[Bioperl-l] Question about embl format
Hilmar Lapp
hlapp at gnf.org
Sun Apr 20 10:52:41 EDT 2003
On Saturday, April 19, 2003, at 10:56 AM, Ewan Birney wrote:
>
> On Fri, 18 Apr 2003, Lincoln Stein wrote:
>
>> The SO (sequence ontology) terms tend to be very long, although the most
>> common ones have short synonyms that often (but not always) match the
>> GenBank/EMBL feature table tags. What I *could* do is to replace the SO
>> type tags with their accession numbers (SO:XXXXXX) and place the full
>> name in a /note qualifier as you suggest.
>
> I suspect we should be very smart in the system with logic as follows:
>
>
> - if there is a shortname, use that
>
> - if not, use the SO-identifier, potentially with this * prefix, which
> the docs indicate is how to put in a user-defined case
>
> - alternatively, we walk back up the SO tree until we hit a shortname
> (? EMBL ok shortname) which we can use
>
By doing this you'd supposedly not be misstating the type, but you'd be
missing information. Not very nice.
> - in all cases we put a
>
> /note="SO-term=SOxxxxxxx"
> /note="SO-descriptions=long description of SO term"
>
We should, if possible, use an existing convention (standard) for
identifying the attribute and its resource. Sounds like LSID could
help, but I don't know the matter well enough to judge this. I just
think we shouldn't invent our own convention here.
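
For reference, the fallback logic quoted above could be sketched roughly
as follows. This is only an illustration in Python, not Bioperl code: the
Term record, the TERMS table, the SO:9xxxxxx identifiers and the function
names are all made up, the "*" prefix is applied on the assumption that it
marks a user-defined key, and the /note strings simply copy the format from
the quoted mail rather than any agreed convention.

```python
from typing import Dict, List, NamedTuple, Optional


class Term(NamedTuple):
    """Hypothetical ontology term record, for illustration only."""
    name: str                 # long descriptive term name
    shortname: Optional[str]  # short synonym usable as a feature key, if any
    parent: Optional[str]     # identifier of the parent term, if any


# Made-up terms and identifiers; these are NOT real SO entries.
TERMS: Dict[str, Term] = {
    "SO:9000001": Term("example_region", "misc_feature", None),
    "SO:9000002": Term("example_long_descriptive_subregion", None, "SO:9000001"),
}


def feature_key(term_id: str, terms: Dict[str, Term], walk_up: bool = False) -> str:
    """Pick a feature-table key for a term.

    1. Use the term's own shortname if it has one.
    2. Optionally walk up the parents until one has a shortname
       (this loses information about the more specific type).
    3. Otherwise fall back to the identifier with a '*' prefix.
    """
    term = terms[term_id]
    if term.shortname:
        return term.shortname
    if walk_up:
        ancestor = term.parent
        while ancestor is not None:
            if terms[ancestor].shortname:
                return terms[ancestor].shortname
            ancestor = terms[ancestor].parent
    return "*" + term_id


def so_notes(term_id: str, terms: Dict[str, Term]) -> List[str]:
    """Build the two /note qualifiers that are written in all cases."""
    return [
        f'/note="SO-term={term_id}"',
        f'/note="SO-descriptions={terms[term_id].name}"',
    ]
```

With walk_up=True the second toy term would be written as misc_feature,
which is exactly the loss of information objected to above; the notes
would still carry the precise term.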
>
> [...]
>> This will make a deep change in the API where the primary_tag could be an
>> ontology term object rather than a string. The best way to ensure backward
>> compatibility with other people's code would be to override the string
>> method in the ontology term object in order to produce the term label.
>>
>> Or we could reserve this type of change for bioperl 2.
>>
>
> I think we should look at doing this now in 1.3 - the more magic we can
> build in to make SO practical to use, the more SO will become a
> reality...
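
To make the backward-compatibility idea concrete, here is a minimal sketch
of a term object that behaves like its label wherever a string is expected.
It is written in Python purely for illustration; the OntologyTerm class, its
attributes and the SO:9000003 identifier are hypothetical, and in Perl the
analogous mechanism would presumably be the overload pragma on the
stringification operator.

```python
class OntologyTerm:
    """Hypothetical term object that stringifies to its label."""

    def __init__(self, identifier: str, label: str) -> None:
        self.identifier = identifier
        self.label = label

    def __str__(self) -> str:
        # Old code that prints or interpolates the tag sees the label.
        return self.label

    def __eq__(self, other: object) -> bool:
        # Old code that compares the tag against a plain string keeps working.
        # For simplicity, two terms also compare equal when their labels match.
        if isinstance(other, (str, OntologyTerm)):
            return str(self) == str(other)
        return NotImplemented

    def __hash__(self) -> int:
        # Hash on the label so that equal objects hash equally.
        return hash(self.label)


# Code written for string tags does not need to change:
primary_tag = OntologyTerm("SO:9000003", "exon")  # made-up identifier
if primary_tag == "exon":
    message = f"found feature of type {primary_tag}"
```

New code could still reach the identifier through the object, while
existing callers only ever see the label.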
>
> I might have a look at this. Do we have a SO-lite checked into
> bioperl-live to work with?
The entire SOFA is checked in and used for testing purposes in
t/data/sofa.ontology. At the time I checked it in there was no term
definitions file available from song.sf.net.

BTW this is also the ontology I use for testing in bioperl-db. The
tests in t/ontology.t load the entire SOFA (well, it's not that huge :)
with relationships, retrieve it again, and test a couple of connections.
They also establish the transitive closure and test it. All pass :)
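
In case "transitive closure" is unclear: it is the set of all
(descendant, ancestor) pairs implied by the direct relationships. A minimal
sketch of the idea, in Python and unrelated to how bioperl-db actually
implements it (the function name and the toy edges are made up, and
relationship types are ignored):

```python
from typing import Dict, Iterable, Set, Tuple


def transitive_closure(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return every (descendant, ancestor) pair implied by (child, parent) edges."""
    parents: Dict[str, Set[str]] = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    closure: Set[Tuple[str, str]] = set()
    for start in parents:
        # Depth-first walk up from each term, recording every ancestor reached.
        stack = list(parents[start])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already handled; also guards against cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


# Toy hierarchy: c is a kind of b, b is a kind of a.
pairs = transitive_closure([("c", "b"), ("b", "a")])
```

Here pairs contains the two direct relationships plus the implied
("c", "a"), which is the kind of connection such a test would check.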
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------