[emboss-dev] Mapping feature types to Sequence Ontology (SO)
Peter Cock
p.j.a.cock at googlemail.com
Wed Aug 17 15:48:32 UTC 2011
On Wed, Aug 17, 2011 at 4:38 PM, Peter Rice <pmr at ebi.ac.uk> wrote:
> On 16/08/2011 16:36, Peter Cock wrote:
>>
>> Interestingly EMBOSS includes the sequence at the bottom
>> (using the FASTA directive) and has generated unique ID tags
>> for each feature. It has also added more note tags.
>
> The sequence is included if you are writing sequence data. GFF3 allows
> sequence to be included, so we add it. Using a separate feature file is
> always awkward for users, but is supported.
See also the discussion today on gmod-gbrowse / song-devel where
it sounds like GFF3 should have a single block of embedded FASTA
sequence at the end of the file, rather than interleaved. As I suggest
on that thread, the practical solution for EMBOSS seqret might be to
omit the FASTA sequence altogether. Or it could cache the sequences
in memory/on disk and write them out at the very end, after all the
features?
http://generic-model-organism-system-database.450254.n5.nabble.com/Mailing-list-for-GFF3-specification-discussion-td4707740.html
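
To illustrate the caching idea, here is a rough sketch in Python (not EMBOSS code, which is C). The function name write_gff3 and the (seq_id, sequence, feature_lines) tuple layout are made up for this example, and it assumes holding the sequences in memory is acceptable:

```python
import io


def write_gff3(records, handle):
    """Write every feature line first, then a single trailing ##FASTA block.

    records: iterable of (seq_id, sequence, feature_lines) tuples.
    This layout is hypothetical, chosen only for the illustration.
    """
    handle.write("##gff-version 3\n")
    cached = []  # sequences held back until all features are written
    for seq_id, sequence, feature_lines in records:
        for line in feature_lines:
            handle.write(line.rstrip("\n") + "\n")
        cached.append((seq_id, sequence))
    if cached:
        handle.write("##FASTA\n")
        for seq_id, sequence in cached:
            handle.write(">%s\n" % seq_id)
            # Wrap the sequence at 60 columns (an arbitrary choice here).
            for i in range(0, len(sequence), 60):
                handle.write(sequence[i:i + 60] + "\n")


out = io.StringIO()
write_gff3(
    [
        ("seqA", "ACGT", ["seqA\t.\tgene\t1\t4\t.\t+\t.\tID=g1"]),
        ("seqB", "TTTT", ["seqB\t.\tgene\t1\t4\t.\t+\t.\tID=g2"]),
    ],
    out,
)
```

For very large inputs the cached list could be swapped for a temporary file, which is the "on disk" variant.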
>> Unfortunately this also failed the GFF3 validation. The EMBOSS
>> output does a lot better (e.g. "cleaved_initiator_methionine" is
>> valid while "Initiator methionine" in the UniProt file was not)
>>
>> However, some of the terms in column 3 are apparently out of
>> date - but http://www.sequenceontology.org does list them as
>> synonyms:
>
> Thanks. I'll update the table, but synonyms should be acceptable.
I can see plus points for either view; certainly the validator could
downgrade that error to a warning.
>> Finally protein_modification_categorized_by_chemical_process
>> does not seem to be valid (I failed to find it in the ontology).
>
> Not in SO, but in a separate ontology (MOD). Should also be valid
> in GFF I believe, but perhaps the parser insists on using SO and
> excluding related ontologies.
OK, but in that case shouldn't you then be declaring this with a
##feature-ontology directive?
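
Something like the following sketch shows what I mean. As I read the GFF3 specification, the directive takes a URI pointing at the ontology. The helper name gff3_header is hypothetical, and the URI below is only a placeholder, not the real location of the MOD ontology:

```python
def gff3_header(feature_ontology_uris=()):
    """Build GFF3 header lines, declaring any extra feature ontologies."""
    lines = ["##gff-version 3"]
    for uri in feature_ontology_uris:
        # One directive line per ontology used for column 3 terms.
        lines.append("##feature-ontology " + uri)
    return "\n".join(lines) + "\n"


# Placeholder URI for illustration only.
header = gff3_header(["http://example.org/placeholder-mod.obo"])
```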
>> Additionally the validator complained about some of the note
>> in Line 15, probably due to the %3B escaped semi-colon,
>> but that may be a bug in the validator.
>
> Interesting. Let me know if we are not escaping the right characters, but I
> believe we are supposed to escape ';' in those positions.
I haven't checked this aspect carefully (since this is fiddly).
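
For reference, my understanding of the GFF3 specification is that in column 9 the characters ';', '=', '&' and ',' have reserved meanings and must be percent-encoded inside values, along with '%', tab, newline, carriage return and other control characters. A minimal sketch of that rule (the function name is hypothetical, and this is not how EMBOSS or the validator implements it):

```python
def escape_gff3_attribute(value):
    """Percent-encode characters reserved in GFF3 column 9 values."""
    reserved = ";=&,%\t\n\r"
    out = []
    for ch in value:
        code = ord(ch)
        if ch in reserved or code < 0x20 or code == 0x7F:
            out.append("%%%02X" % code)  # e.g. ';' becomes %3B
        else:
            out.append(ch)  # spaces and other characters pass through
    return "".join(out)


escaped = escape_gff3_attribute("Cleaved; by similarity")
```

On that reading, a %3B inside a note value is valid output.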
Peter