[emboss-dev] Mapping feature types to Sequence Ontology (SO)

Peter Cock p.j.a.cock at googlemail.com
Wed Aug 17 15:48:32 UTC 2011


On Wed, Aug 17, 2011 at 4:38 PM, Peter Rice <pmr at ebi.ac.uk> wrote:
> On 16/08/2011 16:36, Peter Cock wrote:
>>
>> Interestingly EMBOSS includes the sequence at the bottom
>> (using the FASTA directive) and has generated unique ID tags
>> for each feature. It has also added more note tags.
>
> The sequence is included if you are writing sequence data. GFF3 allows
> sequence to be included, so we add it. Using a separate feature file is
> always awkward for users, but is supported.

See also the discussion today on gmod-gbrowse / song-devel where
it sounds like GFF3 should have a single block of FASTA embedded
sequence at the end of the file, rather than interleaved. As I suggest
on that thread, the practical solution for EMBOSS seqret might be to
omit the FASTA sequence altogether. Or cache the sequences in
memory/on disk and write them out at the very end, after all the
features?

http://generic-model-organism-system-database.450254.n5.nabble.com/Mailing-list-for-GFF3-specification-discussion-td4707740.html
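
To illustrate the "cache and write at the end" option, here is a minimal
sketch in Python. It is not how EMBOSS seqret works internally; the function
name write_gff3 and the (seq_id, sequence, feature_lines) record layout are
made up for this example. Feature lines are written as they arrive, the
sequences are held back, and a single ##FASTA block is emitted last.

```python
import io


def write_gff3(records, handle, width=60):
    """Write features for all records first, then one trailing ##FASTA block.

    `records` is an iterable of (seq_id, sequence, feature_lines) tuples.
    This layout is an assumption made for the sketch, not an EMBOSS structure.
    """
    handle.write("##gff-version 3\n")
    cached = []  # sequences held back until every feature has been written
    for seq_id, sequence, feature_lines in records:
        for line in feature_lines:
            handle.write(line.rstrip("\n") + "\n")
        cached.append((seq_id, sequence))
    if cached:
        handle.write("##FASTA\n")  # emitted once, after all feature lines
        for seq_id, sequence in cached:
            handle.write(f">{seq_id}\n")
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")


if __name__ == "__main__":
    out = io.StringIO()
    write_gff3(
        [
            ("seqA", "ACGT", ["seqA\t.\tgene\t1\t4\t.\t+\t.\tID=g1"]),
            ("seqB", "GG", ["seqB\t.\tgene\t1\t2\t.\t+\t.\tID=g2"]),
        ],
        out,
    )
    print(out.getvalue(), end="")
```

Caching in memory is the simplest choice; for very large inputs the same
idea would work with a temporary file in place of the list.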

>> Unfortunately this also failed the GFF3 validation. The EMBOSS
>> output does a lot better (e.g. "cleaved_initiator_methionine" is
>> valid while "Initiator methionine" in the UniProt file was not)
>>
>> However, some of the terms in column 3 are apparently out of
>> date - but http://www.sequenceontology.org does list them as
>> synonyms:
>
> Thanks. I'll update the table, but synonyms should be acceptable.

I can see plus points for either view; certainly the validator could
downgrade that error to a warning.

>> Finally protein_modification_categorized_by_chemical_process
>> does not seem to be valid (I failed to find it in the ontology).
>
> Not in SO, but in a separate ontology (MOD). Should also be valid
> in GFF I believe, but perhaps the parser insists on using SO and
> excluding related ontologies.

OK, but in that case shouldn't you then be declaring this with a
##feature-ontology directive?
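
For illustration, a small Python sketch of what declaring an extra ontology
in the header could look like. The ##feature-ontology directive takes a URI,
one directive per line; the URI below is a placeholder and not the real
location of the MOD ontology, and the helper name gff3_header is
hypothetical.

```python
def gff3_header(ontology_uris):
    """Return GFF3 header lines declaring each ontology URI in turn."""
    lines = ["##gff-version 3"]
    for uri in ontology_uris:
        # One ##feature-ontology directive per ontology in use.
        lines.append(f"##feature-ontology {uri}")
    return lines


if __name__ == "__main__":
    # Placeholder URI only; substitute the actual ontology location.
    for line in gff3_header(["http://example.org/ontologies/mod.obo"]):
        print(line)
```

Whether a given validator then accepts terms from the declared ontology is
a separate question and would need testing.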

>> Additionally the validator complained about some of the note
>> in Line 15, probably due to the %3B escaped semi-colon,
>> but that may be a bug in the validator.
>
> Interesting. Let me know if we are not escaping the right characters, but I
> believe we are supposed to escape ';' in those positions.

I haven't checked this aspect carefully (since this is fiddly).
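
As a reference point for checking this, here is a minimal sketch of
attribute-value escaping in Python. It assumes the usual reading of the GFF3
rules for column 9: ';', '=', '&' and ',' are reserved there, and '%' and
control characters are always percent-encoded. The function name is
hypothetical and this is not the EMBOSS code.

```python
from urllib.parse import unquote

# Characters with reserved meaning in the GFF3 attributes column (column 9).
RESERVED = ";=&,"


def escape_attribute_value(text):
    """Percent-encode reserved, '%' and control characters in a value."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in RESERVED or ch == "%" or code < 0x20 or code == 0x7F:
            out.append(f"%{code:02X}")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    note = "Removed; by similarity"
    escaped = escape_attribute_value(note)
    print(f"Note={escaped}")
    # Standard URL-style unescaping recovers the original value.
    assert unquote(escaped) == note
```

Under these assumptions a ';' inside a note value comes out as %3B, so a
validator that rejects it may be worth a closer look.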

Peter


