[Bioperl-l] Question about embl format

Hilmar Lapp hlapp at gnf.org
Sun Apr 20 10:52:41 EDT 2003


On Saturday, April 19, 2003, at 10:56  AM, Ewan Birney wrote:

>
>
> On Fri, 18 Apr 2003, Lincoln Stein wrote:
>
>> The SO (sequence ontology) terms tend to be very long, although the most
>> common ones have short synonyms that often (but not always) match the
>> GenBank/EMBL feature table tags.  What I *could* do is to replace the SO
>> type tags with their accession numbers (SO:XXXXXX) and place the full name
>> in a qualifiers /note as you suggest.
>
> I suspect we should be very smart in the system with logic as follows:
>
>
>   - if there is a shortname, use that
>
>   - if not, use the SO-identifier, potentially with this * prefix which in
> the docs indicates how to put in a user-defined case
>
>   - alternatively, we walk back up the SO tree until we hit a shortname
> (? EMBL ok shortname) which we can use
>

By doing this you'd supposedly not be misstating the type, but you'd be 
missing information. Not very nice.

>   - in all cases we put a
>
>    /note="SO-term=SOxxxxxxx"
>    /note="SO-descriptions=long description of SO term"
>

If possible, we should use an existing convention (standard) for 
identifying the attribute and its resource. LSID sounds like it could 
help, but I don't know it well enough to judge. I just think we 
shouldn't invent our own convention here.
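
To make the fallback order discussed above concrete, here is a minimal 
sketch. It is in Python rather than Perl purely for illustration, and 
it is not BioPerl code: the term table, the placeholder identifiers, and 
the function names feature_key and note_qualifiers are all assumptions 
made up for this example.

```python
# Hypothetical mini-ontology for illustration only.
# Maps term id -> (long name, short name or None, parent id or None).
# The ids and names are placeholders, not real SO accessions.
TERMS = {
    "SO:A": ("gene", "gene", None),
    "SO:B": ("some_very_long_specialised_gene_term", None, "SO:A"),
    "SO:C": ("isolated_term_without_short_name", None, None),
}


def feature_key(term_id, walk_up=False):
    """Pick a feature-table key for a term.

    1. Use the term's short name if it has one.
    2. Optionally walk up the parents until one has a short name.
    3. Otherwise fall back to the term identifier itself.
    """
    _name, short, parent = TERMS[term_id]
    if short is not None:
        return short
    if walk_up:
        current = parent
        while current is not None:
            _anc_name, anc_short, anc_parent = TERMS[current]
            if anc_short is not None:
                return anc_short
            current = anc_parent
    return term_id


def note_qualifiers(term_id):
    """Always record the exact term, in the /note layout proposed above."""
    name, _short, _parent = TERMS[term_id]
    return [
        '/note="SO-term=%s"' % term_id,
        '/note="SO-descriptions=%s"' % name,
    ]
```

With walk_up=True the key for "SO:B" becomes "gene", which is exactly 
the loss of information mentioned above; the /note qualifiers are then 
the only place where the precise term survives.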


>
> [...]
>> This will make a deep change in the API where the primary_tag could be an
>> ontology term object rather than a string.  The best way to ensure backward
>> compatibility with other people's codes would be to override the string
>> method in the ontology term object in order to produce the term label.
>>
>> Or we could reserve this type of change to bioperl 2.
>>
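
The backward-compatibility idea can be sketched roughly as follows. 
This is Python, not Perl, and the OntologyTerm class here is 
hypothetical; it only shows the principle that a term object which 
stringifies to its label keeps working in code that expects a plain 
string. In Perl the analogous mechanism would be overloading the 
stringification operator.

```python
class OntologyTerm:
    """Hypothetical term object; not the BioPerl class."""

    def __init__(self, identifier, label):
        self.identifier = identifier
        self.label = label

    def __str__(self):
        # Old code that prints or interpolates the tag sees the label.
        return self.label

    def __eq__(self, other):
        # Old code that compares the tag against a string keeps working.
        if isinstance(other, str):
            return self.label == other
        return NotImplemented

    def __hash__(self):
        # Keep hashing consistent with string equality.
        return hash(self.label)


# Placeholder identifier, not a real SO accession.
primary_tag = OntologyTerm("SO:X", "exon")
print("feature type: %s" % primary_tag)
```

Callers that only ever treat primary_tag as text would not notice the 
change, while newer code can reach the identifier through the object.
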
>
> I think we should look at doing this now in 1.3 - the more magic we can
> build in to make a SO a reality to use the more SO will become a
> reality...
>
>
>
> I might have a look at this. Do we have a SO-lite checked into
> bioperl-live to work with?

The entire SOFA is checked in as t/data/sofa.ontology and used for 
testing purposes. At the time I checked it in there was no term 
definitions file available from song.sf.net.

BTW this is also the ontology I use for testing in bioperl-db. The 
tests in t/ontology.t load the entire SOFA (well, it's not that huge :) 
with relationships, re-retrieve it, and test a couple of connections. 
They also establish the transitive closure and test it. All pass :)
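
For anyone unfamiliar with the term, the transitive closure means that 
every term ends up linked to all of its ancestors, not just its direct 
parents. A minimal sketch of the idea, in Python and unrelated to how 
bioperl-db actually computes or stores it; the function name and the 
example edges are made up for illustration:

```python
def transitive_closure(edges):
    """Map every term to the set of all its direct and indirect ancestors.

    edges is an iterable of (child, parent) pairs.
    """
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())

    closure = {}
    for term in parents:
        reached = set()
        stack = list(parents[term])
        while stack:
            node = stack.pop()
            if node not in reached:  # also guards against cycles
                reached.add(node)
                stack.extend(parents[node])
        closure[term] = reached
    return closure


# Illustrative relationships only, not copied from the SOFA file.
edges = [
    ("exon", "transcript_region"),
    ("intron", "transcript_region"),
    ("transcript_region", "region"),
]
closure = transitive_closure(edges)
print(sorted(closure["exon"]))
```

A test along these lines then checks that an indirect ancestor such as 
"region" shows up for "exon" even though no direct edge connects them.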

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------