[emboss-dev] Mapping feature types to Sequence Ontology (SO)

Peter Rice pmr at ebi.ac.uk
Wed Aug 17 15:55:53 UTC 2011


On 17/08/2011 16:48, Peter Cock wrote:
> On Wed, Aug 17, 2011 at 4:38 PM, Peter Rice<pmr at ebi.ac.uk>  wrote:
>> On 16/08/2011 16:36, Peter Cock wrote:
>>>
>>> Interestingly EMBOSS includes the sequence at the bottom
>>> (using the FASTA directive) and has generated unique ID tags
>>> for each feature. It has also added more note tags.
>>
>> The sequence is included if you are writing sequence data. GFF3 allows
>> sequence to be included, so we add it. Using a separate feature file is
>> always awkward for users, but is supported.
>
> See also the discussion today on gmod-gbrowse / song-devel where
> it sounds like GFF3 should have a single block of FASTA embedded
> sequence at the end of the file, rather than interleaved. As I suggest
> on that thread, the practical solution for EMBOSS seqret might be to
> omit the FASTA sequence altogether. Or cache them in memory/on
> disk to write out at the very end, after all the features?

Thanks. We already save sequences and write them at the end for some
formats, so I'll add that for GFF3. Reading GFF3 input will need more
work, but it may not be too bad.
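
To illustrate the write side, here is a minimal sketch in Python (not
EMBOSS code, which is C). It assumes each record arrives as a
hypothetical (seqid, sequence, features) tuple, with each feature
already split into its nine GFF3 columns. Features go out as they
arrive; sequences are held back and written in one ##FASTA block at
the end.

```python
import io


def write_gff3(records, out):
    """Write features first, then one ##FASTA block holding every sequence.

    records: iterable of (seqid, sequence, features) tuples, where each
    feature is a sequence of the nine GFF3 columns. This layout is an
    assumption made for the sketch, not an EMBOSS data structure.
    """
    out.write("##gff-version 3\n")
    saved = []  # sequences held back until all features are written
    for seqid, sequence, features in records:
        for feature in features:
            out.write("\t".join(str(col) for col in feature) + "\n")
        saved.append((seqid, sequence))
    if saved:
        out.write("##FASTA\n")  # single directive, only at the end
        for seqid, sequence in saved:
            out.write(f">{seqid}\n")
            for i in range(0, len(sequence), 60):
                out.write(sequence[i:i + 60] + "\n")
```

The saved list could equally be a temporary file on disk if the
sequences are too large to hold in memory.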

If we are reading it as feature input, we don't look for the sequence.

If we are reading it as sequence input, we need to read all the
sequences into memory and then go back to read the features. For
streamed input we can buffer the data to make the rewind work.
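
A rough sketch of that two-pass read, again in Python rather than
EMBOSS's C. The function name and the returned dictionary layout are
made up for illustration, and every FASTA header is assumed to start
with a non-empty ID. Non-seekable input (a pipe, say) is copied into a
memory buffer first so that the rewind is possible.

```python
import io


def read_gff3_as_sequences(stream):
    """Return {seqid: {"sequence": str, "features": [column lists]}}.

    Hypothetical helper for illustration; not an EMBOSS function.
    """
    # Streamed input cannot seek, so buffer it in memory to allow a rewind.
    if not stream.seekable():
        stream = io.StringIO(stream.read())
    start = stream.tell()

    # Pass 1: collect every sequence from the ##FASTA section.
    entries = {}
    in_fasta = False
    seqid = None
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            seqid = line[1:].split()[0]  # assumes a non-empty header
            entries[seqid] = {"sequence": [], "features": []}
        elif in_fasta and seqid is not None and line:
            entries[seqid]["sequence"].append(line)
    for entry in entries.values():
        entry["sequence"] = "".join(entry["sequence"])

    # Pass 2: rewind and attach each feature to its sequence.
    stream.seek(start)
    for line in stream:
        if line.startswith("##FASTA"):
            break  # features end where the sequence block begins
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[0] in entries:
            entries[cols[0]]["features"].append(cols)
    return entries
```

Features whose first column names a sequence that is not in the FASTA
block are silently skipped here; a real reader would have to decide
what to do with them.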

regards,

Peter Rice
EMBOSS Team


