[emboss-dev] Mapping feature types to Sequence Ontology (SO)

Peter Rice pmr at ebi.ac.uk
Thu Aug 18 12:28:28 UTC 2011


On 08/16/2011 04:36 PM, Peter Cock wrote:
> I will report this to UniProt later. However, first I thought
> I would try converting one of the other files provided into
> GFF3 using EMBOSS seqret for an alternative, e.g. the
> plain text "swiss" format: http://www.uniprot.org/uniprot/P99999.txt
> 
> I can convert this using seqret as follows:
> 
> ========================================
> $ seqret -feature -osformat=gff3 -sformat=swiss -sequence P99999.txt

> However, some of the terms in column 3 are apparently out of
> date - but http://www.sequenceontology.org does list them as
> synonyms:
> 
> It looks like the EMBOSS sequence ontology table may need
> updating for at least these three cases.
> 
> Finally protein_modification_categorized_by_chemical_process
> does not seem to be valid (I failed to find it in the ontology).

That was a name from the MOD ontology. GFF3 output now uses an SO term,
but SO is lacking detail for MOD_RES, having only:

id: SO:0001089
name: post_translationally_modified_region

and

id: SO:0001700
name: histone_modification

... and then more descendants of histone_modification. SO is still
showing its DNA-only roots.

EMBOSS internally uses MOD terms for MOD_RES features. The details are
in the note tag in GFF3 output.

> Additionally the validator complained about some of the note
> in Line 15, probably due to the %3B escaped semi-colon,
> but that may be a bug in the validator.

Worked for me. Perhaps it was confused by the term name errors (or
perhaps the validator has been fixed).
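
For reference, a minimal Python sketch of how a reader could decode the
%3B escape in GFF3 column 9. This is not EMBOSS or validator code: the
function name parse_gff3_attributes is hypothetical and the sample note
text is invented for illustration. It only relies on GFF3 using
URL-style percent escapes, with ";" separating tag=value pairs and ","
separating multiple values.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Split a GFF3 attributes column into {tag: [values]} (hypothetical helper)."""
    attrs = {}
    for field in column9.split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        tag, _, value = field.partition("=")
        # Split on the structural commas first, then decode escapes such
        # as %3B, so an escaped separator inside a value stays intact.
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


# Invented sample value, not taken from the P99999 output.
example = parse_gff3_attributes("ID=P99999.1;note=first part%3B second part")
```

Here the escaped semicolon is kept inside the single note value rather
than being treated as a separator between tags.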

However, one nasty bug ... EMBOSS was so careful to only read real GFF3
format that the EMBOSS comment "#!Type Protein" was ignored and features
were read into EMBOSS as nucleotide.

I suspect there is no way in GFF3 to identify a protein file. In the
next patch we can parse the EMBOSS comment again but that will not help
with non-EMBOSS protein GFF3 files.
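
A rough Python sketch of that check, for illustration only (it is not
the EMBOSS implementation). The function name is hypothetical, and it
assumes the "#!Type" comment appears before the first feature line and
that a file without it should fall back to nucleotide.

```python
import io


def guess_gff3_molecule_type(handle, default="nucleotide"):
    """Return the lower-cased value of an EMBOSS-style '#!Type' comment, else default."""
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if not line.startswith("#"):
            break  # first feature line reached: stop scanning (assumption)
        if line.startswith("#!Type"):
            parts = line.split()
            if len(parts) >= 2:
                return parts[1].lower()
    return default


with_comment = io.StringIO("##gff-version 3\n#!Type Protein\nP99999\t.\tregion\t1\t105\n")
without_comment = io.StringIO("##gff-version 3\nP99999\t.\tregion\t1\t105\n")
detected = guess_gff3_molecule_type(with_comment)
fallback = guess_gff3_molecule_type(without_comment)
```

As noted, the fallback is all a reader can do for protein GFF3 files
written by other tools.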

Is there some official distinction between protein and nucleotide GFF3
files?

regards,

Peter Rice
EMBOSS Team


