[emboss-dev] Mapping feature types to Sequence Ontology (SO)

Peter Cock p.j.a.cock at googlemail.com
Tue Aug 16 15:03:26 UTC 2011

Dear Peter R. (et al.),

I recall from one of our chats in person that EMBOSS has some
mapping tables to convert the various data file formats'
feature names into a common standard (the Sequence Ontology?),
for the purpose of inter-converting files, e.g. converting a
UniProt/SwissProt plain text protein file into a GenPept protein
file or GFF3.

Is that a fair summary?

It seems to match the minutes of this meeting (found with
Google) http://emboss.sourceforge.net/meetings/2009-02-16.html

> DASGFF requires a sequence ontology (or BioSapiens
> ontology) tag for protein features. Peter has updated the
> Efeatures definitions for proteins to use GFF3 sequence
> ontology codes as internal identifiers, and to use GFF3
> as the principal definitions for all protein features. All
> SwissProt feature types (36 in the current Swissprot
> release) are also defined with the closest possible match
> to the sequence ontology. Where there is no exact match,
> an EMBOSS internal type is defined using the closest SO
> code and the original feature type as a suffix. For SwissProt
> output this is converted back to the swissprot feature type.
> For GFF3 output the internal type is an alias for the closest
> (more general) SO term.
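
To make that description concrete, here is a minimal Python sketch
of how I read the scheme. It is not taken from the EMBOSS source:
the table entries, the "SO-code underscore feature-type" suffix
format, and the function names are all assumptions made for
illustration, and the NON_TER row in particular is a made-up
"no exact match" case.

```python
# Illustrative table only; NOT copied from the EMBOSS Efeatures files.
# SwissProt feature key -> (SO accession, is it an exact match?)
SWISSPROT_TO_SO = {
    "SIGNAL": ("SO:0000418", True),    # signal_peptide
    "DOMAIN": ("SO:0000417", True),    # polypeptide_domain
    "NON_TER": ("SO:0000001", False),  # assumed: falls back to "region"
}

# Reverse lookup for exact matches (SO accession -> SwissProt key).
SO_TO_SWISSPROT = {
    so: key for key, (so, exact) in SWISSPROT_TO_SO.items() if exact
}


def to_internal(swissprot_key):
    """Return the internal type: the SO code, plus the original
    feature key as a suffix when the match is not exact."""
    so, exact = SWISSPROT_TO_SO[swissprot_key]
    return so if exact else f"{so}_{swissprot_key}"


def to_gff3(internal_type):
    """For GFF3 output, an internal type is an alias for the
    closest (more general) SO term, so drop any suffix."""
    return internal_type.split("_", 1)[0]


def to_swissprot(internal_type):
    """For SwissProt output, convert back to the original key."""
    if "_" in internal_type:
        # Suffixed internal type: the suffix is the SwissProt key.
        return internal_type.split("_", 1)[1]
    return SO_TO_SWISSPROT[internal_type]


if __name__ == "__main__":
    for key in SWISSPROT_TO_SO:
        internal = to_internal(key)
        print(key, internal, to_gff3(internal), to_swissprot(internal))
```

The point of the suffix is that the round trip back to SwissProt
stays lossless even when the GFF3 view has to fall back to a more
general SO term.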

Can you point me at these mapping tables in the EMBOSS
source code please?

I'm particularly interested in the SwissProt to SO mapping
right now.


Peter C.
