[Bioperl-l] Question about embl format
Ewan Birney
birney at ebi.ac.uk
Sat Apr 19 19:56:19 EDT 2003
On Fri, 18 Apr 2003, Lincoln Stein wrote:
> The SO (sequence ontology) terms tend to be very long, although the most
> common ones have short synonyms that often (but not always) match the
> GenBank/EMBL feature table tags. What I *could* do is to replace the SO type
> tags with their accession numbers (SO:XXXXXX) and place the full name in a
> /note qualifier as you suggest.
I suspect we should be very smart in the system with logic as follows:
- if there is a shortname, use that
- if not, use the SO identifier, potentially with the * prefix that the
docs indicate is how to mark a user-defined case
- alternatively, walk back up the SO tree until we hit a shortname
(an EMBL-acceptable shortname?) that we can use
- in all cases we add
/note="SO-term=SOxxxxxxx"
/note="SO-descriptions=long description of SO term"
Though perhaps the description is too much of a denormalisation.
- when we re-read EMBL/GenBank, if we spot a /note="SO-term=SOxxxxxxx",
that overrides all other magic for the FT key -> SO mapping
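The write and read rules above can be sketched as follows. This is a minimal illustration, not Bioperl code: the SO fragment, its identifiers (SO:9999001 and so on), names and descriptions are made-up placeholders, the FT key -> SO default table is hypothetical, and the functions `ft_key_for`, `write_feature` and `read_so_id` are invented names. The `walk_up` flag chooses between the two alternatives in the list (walk up to an ancestor's shortname, or fall straight back to the *-prefixed identifier).

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class Term(NamedTuple):
    name: str
    parent: Optional[str]   # parent SO id, or None at the root
    short: Optional[str]    # EMBL/GenBank-compatible shortname, if any
    description: str


# Hypothetical SO fragment; identifiers and text are placeholders, not real SO data.
SO_FRAGMENT: Dict[str, Term] = {
    "SO:9999001": Term("region", None, "misc_feature", "placeholder description"),
    "SO:9999002": Term("exon", "SO:9999001", "exon", "placeholder description"),
    "SO:9999003": Term("noncoding_exon", "SO:9999002", None, "placeholder description"),
    "SO:9999004": Term("orphan_term", None, None, "placeholder description"),
}

# Hypothetical default FT key -> SO mapping, used only when no SO-term note exists.
DEFAULT_KEY_TO_SO = {"misc_feature": "SO:9999001", "exon": "SO:9999002"}

Qualifier = Tuple[str, str]  # (qualifier name, value), e.g. ("note", "SO-term=...")


def ft_key_for(so_id: str, walk_up: bool = True) -> str:
    """Own shortname, else (optionally) nearest ancestor's shortname, else '*' + id."""
    term = SO_FRAGMENT[so_id]
    if term.short:
        return term.short
    parent = term.parent
    while walk_up and parent is not None:
        ancestor = SO_FRAGMENT[parent]
        if ancestor.short:
            return ancestor.short
        parent = ancestor.parent
    # '*' prefix marks a user-defined key, as proposed in the message.
    return "*" + so_id


def write_feature(so_id: str, with_description: bool = True) -> Tuple[str, List[Qualifier]]:
    """Return the FT key plus /note qualifiers recording the exact SO term."""
    quals = [("note", f"SO-term={so_id}")]
    if with_description:  # optional, since it denormalises the ontology
        quals.append(("note", f"SO-descriptions={SO_FRAGMENT[so_id].description}"))
    return ft_key_for(so_id), quals


SO_TERM_NOTE = re.compile(r"SO-term=(\S+)$")


def read_so_id(ft_key: str, quals: List[Qualifier]) -> Optional[str]:
    """An explicit SO-term note overrides the default FT key -> SO mapping."""
    for name, value in quals:
        match = SO_TERM_NOTE.match(value) if name == "note" else None
        if match:
            return match.group(1)
    return DEFAULT_KEY_TO_SO.get(ft_key)
```

For example, `write_feature("SO:9999003")` emits the key `exon` (borrowed from the parent term), and reading that feature back recovers `SO:9999003` from the note rather than the default `SO:9999002` that the bare `exon` key would map to. That round trip is the point of always writing the SO-term note.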
>
> This will make a deep change in the API where the primary_tag could be an
> ontology term object rather than a string. The best way to ensure backward
> compatibility with other people's code would be to override the string
> method in the ontology term object so that it produces the term label.
>
> Or we could reserve this type of change to bioperl 2.
>
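The backward-compatibility idea quoted above, a term object that still behaves like the old string primary_tag, can be sketched as below. In Perl this would typically be done with the `overload` pragma's `""` operator; this Python sketch uses `__str__` as the analogue, and the `OntologyTerm` class and its fields are hypothetical, not Bioperl's actual ontology classes.

```python
class OntologyTerm:
    """Hypothetical term object standing in for a string primary_tag."""

    def __init__(self, identifier: str, name: str):
        self.identifier = identifier  # e.g. an SO accession (placeholder here)
        self.name = name              # the human-readable term label

    def __str__(self) -> str:
        # Stringify to the label, so code expecting a plain tag string keeps working.
        return self.name

    def __eq__(self, other):
        # Let old-style comparisons such as `tag == "exon"` compare against the label.
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, OntologyTerm):
            return self.identifier == other.identifier
        return NotImplemented

    # Defining __eq__ makes instances unhashable; this sketch accepts that trade-off.
```

Old code that printed or compared the tag as a string (`f"{tag}"`, `tag == "exon"`) keeps working, while new code can reach `tag.identifier` for the ontology link.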
I think we should look at doing this now in 1.3: the more magic we can
build in to make SO practical to use, the more SO will become a
reality...
I might have a look at this. Do we have a SO-lite checked into
bioperl-live to work with?