[Biopython-dev] Performance of Bio.File.UndoHandle

Jeffrey Chang jchang at jeffchang.com
Fri Oct 17 15:03:10 EDT 2003


On Thursday, October 16, 2003, at 05:45  AM, Michael Hoffman wrote:

> On Wed, 15 Oct 2003, Jeffrey Chang wrote:
>
>> That is a nice implementation.  However, Biopython already has at
>> least 3 Fasta parsers!
>>    Bio/Fasta
>>    Bio/SeqIO/FASTA
>>    Bio/expressions/fasta
>
> There sure are. We should probably be cutting them rather than adding
> them I suppose. :-) Have you thought of deprecating Bio.Fasta since it
> is the slowest?

Yes, that will probably be done eventually.   However, it does have a 
nice interface that's consistent with the other parsers, e.g. for 
GenBank, and it's documented.  We'd be deprecating the best documented 
parser for faster ones that aren't documented.  (As you noticed, not 
even docstrings.)  It's a trade-off.  The decision would be much clearer 
if the other parsers had better documentation!  ;)



> I know that the official path is to get people towards FormatIO but
> Bio.expressions.fasta is more than 12x slower than my
> implementation/Bio.SeqIO.FASTA (comparable as you predicted)! For one
> test:
>
> FormatIO: 3.085s/3.094s/3.154s
> LightIterator: 0.246s/0.243s/0.245s

Yikes!  Your code is correct.  In fairness, though, the fasta parser 
that FormatIO uses is doing more work, such as trying to detect 
database IDs (GenBank, EMBL, DDBJ, NBRF) in the description line.  If 
that's not generally needed, perhaps that functionality should be off 
by default, so that the parser would be faster.  Everybody likes that, 
right?
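
To make "off by default" concrete, here is a rough sketch of the idea. 
It is not the FormatIO or Bio.expressions code; the names iter_fasta, 
_make_record and detect_ids are made up for illustration, and the ID 
detection is a toy that only handles NCBI-style "gb|...", "emb|..." 
and "dbj|..." fields in the first word of the description line.

```python
import io

# Hypothetical database tags recognised by this toy detector.
_KNOWN_TAGS = ("gb", "emb", "dbj")


def _make_record(title, chunks, detect_ids):
    """Build a (title, sequence, ids) tuple for one record."""
    ids = {}
    if detect_ids and title.strip():
        # The extra work only happens when the caller asks for it.
        fields = title.split()[0].split("|")
        for tag, value in zip(fields[0::2], fields[1::2]):
            if tag in _KNOWN_TAGS and value:
                ids[tag] = value
    return title, "".join(chunks), ids


def iter_fasta(handle, detect_ids=False):
    """Yield (title, sequence, ids) for each FASTA record in handle.

    With detect_ids=False (the default) the title is returned as-is
    and ids is always an empty dict.
    """
    title, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield _make_record(title, chunks, detect_ids)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield _make_record(title, chunks, detect_ids)


sample = ">gi|1|gb|U00096.3| E. coli\nACGT\nTTAA\n>plain\nGG\n"
fast = list(iter_fasta(io.StringIO(sample)))
full = list(iter_fasta(io.StringIO(sample), detect_ids=True))
```

Callers who just want titles and sequences get the cheap path without 
asking for it, and anyone who needs the IDs passes detect_ids=True.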

Jeff



