[Biopython-dev] EMBL flatfile parsing

Peter biopython-dev at maubp.freeserve.co.uk
Mon Feb 19 11:24:29 UTC 2007


Peter wrote:
>>> Does this sound like a sensible way to include EMBL support?
>>>
>>> ...
> 
> This took longer than I expected, but it's done now.

Has anyone had a chance to try out the revised EMBL/GenBank parser?

I could ask on the main list, but as testing the EMBL parsing would 
require installing the CVS release (or updating just Bio/GenBank and 
Bio/SeqIO by hand) that seems a bit much to ask.

There are three main things I would like feedback on:

(a) Has any existing code using Bio.GenBank been affected at all?

(b) Does Bio.SeqIO read your favourite EMBL/GenBank files?

(c) How does parsing the file as "genbank-cds" and "embl-cds" look?

That is, these formats return each CDS feature, with its stated amino 
acid translation, as a SeqRecord.  Does anyone else think getting the 
genes themselves in this way is a useful option?  I'm not sure about the 
simplistic code that chooses the SeqRecord id/name/description; this is 
difficult as there is a lot of variation in annotation conventions.
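
To make the id/name/description question concrete, here is a rough 
sketch of the kind of fallback logic I mean.  This is NOT the code in 
Bio.GenBank; the function names, the preference order of the qualifier 
keys, and the placeholder strings are all assumptions chosen for 
illustration, and the example values are made up.  It only assumes that 
CDS qualifiers arrive as a dictionary mapping each key to a list of 
strings.

```python
# Hypothetical sketch only; not the actual Bio.GenBank implementation.
# `qualifiers` maps a qualifier key to a list of string values.

ID_KEYS = ("protein_id", "locus_tag", "gene")  # assumed preference order
NAME_KEYS = ("gene", "locus_tag")              # assumed preference order
DESCRIPTION_KEYS = ("product", "note")         # assumed preference order


def first_value(qualifiers, keys):
    """Return the first non-empty value found under any of `keys`, else None."""
    for key in keys:
        values = qualifiers.get(key)
        if values and values[0].strip():
            return values[0].strip()
    return None


def choose_identifiers(qualifiers, fallback="<unknown id>"):
    """Pick (id, name, description) for one CDS feature's qualifiers."""
    record_id = first_value(qualifiers, ID_KEYS) or fallback
    # Fall back to the id when no gene name or locus tag is present.
    name = first_value(qualifiers, NAME_KEYS) or record_id
    description = (first_value(qualifiers, DESCRIPTION_KEYS)
                   or "<unknown description>")
    return record_id, name, description


# Made-up qualifier sets showing a well annotated and a sparse CDS.
well_annotated = {
    "protein_id": ["CAA12345.1"],
    "gene": ["abcD"],
    "product": ["hypothetical protein"],
}
sparse = {"locus_tag": ["XYZ_0001"]}

print(choose_identifiers(well_annotated))
print(choose_identifiers(sparse))
```

Any fixed preference order like this will suit some annotation 
conventions better than others, which is exactly why feedback on real 
files would help.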

> I should probably add some EMBL examples to the SeqIO unit test...

I have added a single record EMBL file to the test suite.

Peter
