[Biopython-dev] Reading sequences: FormatIO, SeqIO, etc

Peter (BioPython Dev) biopython-dev at maubp.freeserve.co.uk
Mon Jul 31 17:41:49 UTC 2006


Peter wrote:
>>In the short term, maybe we should just replace the internals of the
>>current Bio.Fasta module with a pure Python implementation like
>>that in Bio.SeqIO.FASTA - good idea?  Bad idea?

Marc wrote:
> I would keep them separate but change the documentation on the how-to
> site to point to using Bio.SeqIO.FASTA, since that is where I
> think we want people to start going. The code change to Bio.Fasta
> should be to add a deprecation warning.

Certainly, in the long term we could do that.  However, the current, very
flexible Bio.Fasta code may have advantages that the SeqIO replacement
would not offer (e.g. if we focus on just parsing into SeqRecords).
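Marc's suggested deprecation warning is cheap to add with Python's standard `warnings` module. A minimal sketch, assuming a hypothetical decorator `deprecated` and a hypothetical legacy function `old_fasta_parser`; this is not the actual Bio.Fasta code:

```python
import functools
import warnings


def deprecated(replacement):
    """Decorator factory: warn callers that a function is deprecated.

    `replacement` is only the text naming what to use instead.
    """
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn(
                "%s is deprecated; use %s instead." % (func.__name__, replacement),
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's code
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("Bio.SeqIO.FASTA")
def old_fasta_parser(text):
    # Hypothetical legacy function: returns the '>' title lines only.
    return [line for line in text.splitlines() if line.startswith(">")]
```

Existing callers keep working unchanged; they just see a `DeprecationWarning` (which Python may hide by default, depending on version and warning filters).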

Short Term
----------
Right now I guess most people dealing with FASTA files will be using
Bio.Fasta, which is very slow; hence bug 2058:

http://bugzilla.open-bio.org/show_bug.cgi?id=2058

According to my tests (on modest-sized files), my patch makes Bio.Fasta
almost as fast as Bio.SeqIO.FASTA.
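For reference, this is roughly what a "pure Python" FASTA reader looks like. It is a minimal sketch only, not the code in the bug 2058 patch or in Bio.SeqIO.FASTA; the name `fasta_records` is hypothetical, and it yields plain `(title, sequence)` tuples rather than SeqRecord objects:

```python
import io


def fasta_records(handle):
    """Yield (title, sequence) pairs from a FASTA-format text handle.

    A record starts at a line beginning with '>'; all following lines
    up to the next '>' are joined into one sequence string.
    Blank lines are skipped.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is None:
            raise ValueError("Sequence data found before first '>' title line")
        else:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)  # emit the final record


handle = io.StringIO(">seq1 demo\nACGT\nGGCC\n>seq2\nTTAA\n")
for title, seq in fasta_records(handle):
    print(title, len(seq))  # prints "seq1 demo 8", then "seq2 4"
```

Collecting each record's lines in a list and calling `"".join` once avoids building the sequence by repeated string concatenation, which can be slow for long sequences.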

Could any of you try this patch on your machines, on the off chance
that it breaks existing code?  It does pass test_Fasta.py and
test_Fasta2.py, on Windows at least.

Medium/Long Term
----------------
We need to sort out what to do with Bio.SeqIO, as the existing code in
Bio/SeqIO/generic.py and Bio/SeqIO/FASTA.py currently uses different
interfaces.  But I do agree that something like that should be OK.
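One way to avoid per-module interfaces is a single entry point that dispatches on a format name, with every parser yielding the same record type. A minimal sketch, assuming a hypothetical `parse(handle, format_name)` function, a hypothetical `Record` tuple, and a made-up tab-separated format used only to show a second parser plugging in; none of this is the existing code in Bio/SeqIO/generic.py or Bio/SeqIO/FASTA.py:

```python
import io
from collections import namedtuple

# Hypothetical common record type that every format parser yields.
Record = namedtuple("Record", ["id", "seq"])

# Hypothetical registry mapping a lower-case format name to a parser.
_PARSERS = {}


def register(format_name):
    """Decorator that stores a parser under a lower-case format name."""
    def decorator(func):
        _PARSERS[format_name.lower()] = func
        return func
    return decorator


@register("fasta")
def _iter_fasta(handle):
    # Minimal FASTA reader; the id is the first word of the '>' line.
    # Any sequence lines before the first '>' are silently ignored.
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield Record(record_id, "".join(chunks))
            words = line[1:].split()
            record_id, chunks = (words[0] if words else ""), []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield Record(record_id, "".join(chunks))


@register("tab")
def _iter_tab(handle):
    # Hypothetical format: one record per line, "<id><TAB><sequence>".
    for line in handle:
        if line.strip():
            record_id, seq = line.rstrip("\n").split("\t", 1)
            yield Record(record_id, seq.strip())


def parse(handle, format_name):
    """Single entry point: look up the format and return an iterator."""
    try:
        parser = _PARSERS[format_name.lower()]
    except KeyError:
        raise ValueError("Unknown format: %r" % format_name)
    return parser(handle)


for rec in parse(io.StringIO(">s1 demo\nAC\nGT\n"), "fasta"):
    print(rec.id, rec.seq)  # prints "s1 ACGT"
```

Because `parse` itself is not a generator, an unknown format name fails immediately rather than on first iteration.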

I have been working on a possible replacement, but my post about it
doesn't seem to have made it to the mailing list yet (I must check my
recent email).

Peter


