[Biopython-dev] Reading sequences: FormatIO, SeqIO, etc

Peter (BioPython Dev) biopython-dev at maubp.freeserve.co.uk
Wed Aug 16 16:05:12 UTC 2006


Albert Krewinkel wrote:
>>> The _parse_genbank_features function could also be used to parse EMBL
>>> or DDBJ features, therefore I think it should be named differently.

Peter wrote:
>> First of all, that bit of code is for a new feature which I personally
>> wanted - to be able to iterate over CDS features in a genbank file.
>>
>> But yes, I did have in mind that it (and the GenBank parser) could be
>> re-used to deal with EMBL files.  I have not yet taken the time to
>> learn the EMBL file format and how it corresponds to the GenBank file
>> format - but I agree a lot of the code could be shared.

Albert Krewinkel wrote:
> I will try to build something similar for EMBL files within the next
> days.  This should be easy, since features really should look the same
> in both formats:
> 
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
> 

Oh - you meant just adding EMBL feature iteration.  I was thinking 
about the larger task of full EMBL file reading.

Doing just the features is very easy; here you go:

http://bugzilla.open-bio.org/show_bug.cgi?id=2059#c2
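To illustrate why the features-only case is easy, here is a minimal sketch. It assumes the standard feature-table column layout, where the feature key starts in column 6 and the location and qualifiers start in column 22. GenBank pads columns 1-5 with spaces, while EMBL puts "FT" there. Once those five columns are dropped, one parser can handle both formats. The function name iter_features and its embl flag are hypothetical. This is not the patch attached to the bug report above, and it is not the File2SequenceIterator API.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical helper for illustration only (not Biopython's API, not the
# bug 2059 patch). Assumes the shared feature-table layout: key from
# column 6, location/qualifiers from column 22. Only columns 1-5 differ:
# spaces in GenBank, "FT   " in EMBL.

Feature = Tuple[str, str, List[str]]


def iter_features(lines: Iterable[str], embl: bool = False) -> Iterator[Feature]:
    """Yield (key, location, qualifiers) for each feature.

    `lines` should hold only the feature-table part of a record (no
    sequence data). Multi-line qualifier values are joined with a single
    space, which is a simplification of the real rules.
    """
    prefix = "FT   " if embl else "     "
    key: Optional[str] = None
    location = ""
    qualifiers: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.startswith(prefix):
            continue  # e.g. "FEATURES"/"FH" headers or EMBL "XX" spacers
        body = line[5:]  # from here on both formats are laid out identically
        new_key = body[:16].strip()  # columns 6-21
        content = body[16:].strip()  # columns 22 onwards
        if new_key:
            # A feature key marks the start of a new feature.
            if key is not None:
                yield key, location, qualifiers
            key, location, qualifiers = new_key, content, []
        elif content.startswith("/"):
            qualifiers.append(content)  # start of a new qualifier
        elif qualifiers:
            qualifiers[-1] += " " + content  # qualifier value continues
        else:
            location += content  # location continues on the next line
    if key is not None:
        yield key, location, qualifiers


if __name__ == "__main__":
    def genbank_line(key: str, content: str) -> str:
        # Five blank columns, a 16-character key field, then the content.
        return " " * 5 + key.ljust(16) + content

    genbank = [
        genbank_line("gene", "1..9"),
        genbank_line("", '/gene="abc"'),
        genbank_line("CDS", "join(1..3,"),
        genbank_line("", "7..9)"),
        genbank_line("", '/gene="abc"'),
    ]
    # The EMBL version differs only in its first two columns.
    embl = ["FT" + line[2:] for line in genbank]

    for key, location, quals in iter_features(genbank):
        if key == "CDS":
            print(key, location)  # prints: CDS join(1..3,7..9)
    assert list(iter_features(genbank)) == list(iter_features(embl, embl=True))
```

Filtering on the key, as in the demo, gives the "iterate over CDS features" use case. The qualifier handling is deliberately crude. A real parser would also need to deal with quoted values and format-specific line wrapping.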

Any more feedback is very welcome.  Are you using the iterators 
directly, or via the helper function File2SequenceIterator?

Are you using just the sequence iterators, or the dictionary and list 
versions too?

Peter


