[Bioperl-l] SearchIO speed up

Chris Fields cjfields at uiuc.edu
Fri Aug 11 13:48:59 UTC 2006


> Anyway, in your spare time, maybe you could do similar speedups for other
> pieces of Bioperl? My personal favorite would be the GenBank/EMBL
> parsers. The fungal genome ORF files I'm working with are only 20M or
> so, but using Bioperl to work with them takes so much longer than with
> non-Bioperl on the 6M FASTA files for other genomes. I have to imagine
> it's mostly creating objects for the gazillion tags, 90% of which I
> never peek at.

I agree completely.  Swissknife (lazy parsing of Swiss-Prot) was mentioned
here yesterday.  We could use something similar for GenBank/EMBL.  The code
for Swissknife was quite extensive but, really, so is SeqIO::genbank!  
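
For illustration only, here is a minimal sketch of the lazy-parsing idea: keep each record as raw text and parse a section only when it is first asked for. It is written in Python to stay self-contained and is not Swissknife or Bioperl code. The LazyRecord class, the read_records function, and the toy sample records are all hypothetical, and the format handled is a tiny GenBank-like subset (keyword lines, indented continuation lines, "//" terminators).

```python
import io


class LazyRecord:
    """Holds the raw text of one flat-file record; parses sections on demand."""

    def __init__(self, raw):
        self.raw = raw        # unparsed record text
        self._cache = {}      # keyword -> parsed value, filled on first access

    def _section(self, keyword):
        # Scan the raw text only the first time a keyword is requested.
        if keyword not in self._cache:
            lines, grabbing = [], False
            for line in self.raw.splitlines():
                if not grabbing and line.startswith(keyword):
                    grabbing = True
                    lines.append(line[len(keyword):].strip())
                elif grabbing and line.startswith(" "):
                    lines.append(line.strip())   # indented continuation line
                elif grabbing:
                    break                        # next keyword: section is over
            self._cache[keyword] = " ".join(lines)
        return self._cache[keyword]

    @property
    def definition(self):
        return self._section("DEFINITION")

    @property
    def accession(self):
        return self._section("ACCESSION")


def read_records(handle):
    """Yield one LazyRecord per '//'-terminated record, without parsing bodies."""
    chunk = []
    for line in handle:
        if line.startswith("//"):
            if chunk:
                yield LazyRecord("".join(chunk))
            chunk = []
        else:
            chunk.append(line)
    if any(part.strip() for part in chunk):      # record missing its terminator
        yield LazyRecord("".join(chunk))


# Toy input, invented for this sketch.
SAMPLE = """LOCUS       DEMO0001     12 bp    DNA
DEFINITION  Hypothetical demo record,
            second line of definition.
ACCESSION   DEMO0001
//
LOCUS       DEMO0002     8 bp    DNA
DEFINITION  Another demo record.
ACCESSION   DEMO0002
//
"""

records = list(read_records(io.StringIO(SAMPLE)))
print(records[0].definition)
```

Splitting the file into records is cheap; the per-tag work is paid only for the sections a caller actually reads, which is the case described above where 90% of the tags are never looked at.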

I also wanted to see how much using bioperl's _readline() method slows
things down (my guess is not too dramatically, but for 20 MB files it may be
a problem).
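
One rough way to measure that kind of overhead is to time a plain line loop against the same loop routed through a wrapper method. The sketch below does this in Python. It assumes the extra cost is mostly one method call plus a pushback-buffer check per line, which is a guess and not a profile of Bioperl's _readline(). WrappedReader, count_direct, and count_wrapped are hypothetical names, and Python timings say nothing about the actual numbers in Perl.

```python
import io
import timeit


class WrappedReader:
    """Line reader with a pushback buffer, checked on every call."""

    def __init__(self, handle):
        self.handle = handle
        self.pushback = []

    def readline(self):
        if self.pushback:                 # return pushed-back lines first
            return self.pushback.pop()
        return self.handle.readline()


def count_direct(text):
    # Baseline: iterate over the handle directly.
    return sum(1 for _ in io.StringIO(text))


def count_wrapped(text):
    # Same work, but every line goes through the wrapper method.
    reader = WrappedReader(io.StringIO(text))
    n = 0
    while reader.readline():
        n += 1
    return n


# Synthetic input: 20000 lines of 60 bases each.
TEXT = ("ACGT" * 15 + "\n") * 20000

direct = timeit.timeit(lambda: count_direct(TEXT), number=3)
wrapped = timeit.timeit(lambda: count_wrapped(TEXT), number=3)
print(f"direct:  {direct:.4f} s")
print(f"wrapped: {wrapped:.4f} s")
```

The same comparison could be run in Perl against a real 20 MB file to see whether the per-line overhead matters next to the cost of building objects.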

> I know, you folks are busy, and I should be volunteering to do it
> myself. But you can at least consider it a user request.

We can't promise anything!  If you want, add a bit to the Bioperl release
page:

http://www.bioperl.org/wiki/Bioperl_Release

I would hold that request off until post-1.6.  Lots of other priorities
popping up.

Chris

> - Amir Karger
> Research Computing
> Bauer Center for Genomics Research
> Harvard University

...



