[emboss-dev] Mapping feature types to Sequence Ontology (SO)

Peter Rice pmr at ebi.ac.uk
Wed Aug 17 15:38:23 UTC 2011

On 16/08/2011 16:36, Peter Cock wrote:
> Interestingly EMBOSS includes the sequence at the bottom
> (using the FASTA directive) and has generated unique ID tags
> for each feature. It has also added more note tags.

The sequence is included whenever you are writing sequence data. GFF3 
allows the sequence to be included in the same file, so we add it. Writing 
features to a separate file is supported, but is always awkward for users.
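
For anyone reading such a combined file, here is a minimal sketch of 
separating the feature lines from the trailing sequence. It assumes only 
that everything after the ##FASTA directive is FASTA text. The function 
name and the sample record are made up for illustration and are not taken 
from EMBOSS.

```python
def split_gff3(text):
    """Split GFF3 text into feature rows and trailing FASTA lines."""
    features, fasta = [], []
    in_fasta = False
    for line in text.splitlines():
        if in_fasta:
            # Everything after the directive is sequence data.
            fasta.append(line)
        elif line.strip() == "##FASTA":
            in_fasta = True
        elif line and not line.startswith("#"):
            # Feature lines have nine tab-separated columns.
            features.append(line.split("\t"))
    return features, fasta


# Hypothetical record, for illustration only.
example = "\n".join([
    "##gff-version 3",
    "seq1\tEMBOSS\tpolypeptide\t1\t10\t.\t+\t.\tID=seq1.1",
    "##FASTA",
    ">seq1",
    "MKVLAAGIVG",
])

features, fasta = split_gff3(example)
print(len(features), "feature(s);", len(fasta), "FASTA line(s)")
```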

> Unfortunately this also failed the GFF3 validation. The EMBOSS
> output does a lot better (e.g. "cleaved_initiator_methionine" is
> valid while "Initiator methionine" in the UniProt file was not)
> However, some of the terms in column 3 are apparently out of
> date - but http://www.sequenceontology.org does list them as
> synonyms:

Thanks. I'll update the table, but synonyms should be acceptable.

> Finally protein_modification_categorized_by_chemical_process
> does not seem to be valid (I failed to find it in the ontology).

That term is not in SO; it comes from a separate ontology (MOD). I believe 
it should also be valid in GFF, but perhaps the validator insists on SO 
terms and excludes related ontologies.

> Additionally the validator complained about some of the note
> in Line 15, probably due to the %3B escaped semi-colon,
> but that may be a bug in the validator.

Interesting. Let me know if we are not escaping the right characters, 
but I believe we are supposed to escape ';' in those positions.
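
To make the escaping concrete, here is a rough sketch of percent-encoding 
an attribute value for column 9. It assumes the reserved set is ';', '=', 
'&' and ',' plus '%' and control characters, which is my reading of the 
GFF3 specification. This is not the EMBOSS code, and the function names 
and the sample note are made up.

```python
import re

# Assumed reserved set for column 9 values: ; = & , plus % and control chars.
_RESERVED = re.compile(r"[;=&,%\x00-\x1f\x7f]")
_ESCAPED = re.compile(r"%([0-9A-Fa-f]{2})")


def escape_gff3_value(value):
    """Percent-encode reserved characters in an attribute value."""
    return _RESERVED.sub(lambda m: "%{:02X}".format(ord(m.group())), value)


def unescape_gff3_value(value):
    """Reverse escape_gff3_value()."""
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), value)


note = "Removed; by similarity"  # hypothetical note text
print("note=" + escape_gff3_value(note))  # note=Removed%3B by similarity
```

Spaces are left as they are here, so only the semicolon changes in the 
sample note.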


Peter Rice
