[Bioperl-l] SearchIO Performance
Sendu Bala
bix at sendu.me.uk
Fri Mar 21 23:17:59 UTC 2008
Jason Stajich wrote:
>
> On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:
>
>> Hi. I am pretty new to BioPerl, and have a question about performance
>> with regard to Blast (nucleotide) file parsing.
[...]
>> What is substantially longer? Well, the existing code takes about 0.25
>> seconds, and the BioPerl call takes about 4.5 seconds. I find that to
>> be a dramatic difference, and that kind of time difference becomes
>> significant when I have to parse 30 Blast files in a row. I understand
>> that SearchIO is parsing the entire file and storing it all for easy
>> retrieval later, and maybe this time penalty is what I have to pay for
>> that convenience and organization.
[...]
> Sendu has written a pull parser that
> doesn't require creation of all the objects until the user requests them.
> As I've said in the past, if someone wrote SearchIO event-listener that
> created lightweight objects (or just hashes) instead this would also
> provide a substantial speedup.
Yeah, you'll need BioPerl 1.5.2 (or the latest from svn) and to set the
format to 'blast_pull'. Depending on the circumstances, and with
thoughtful usage, you can see an orders-of-magnitude speedup.
http://doc.bioperl.org/bioperl-live/Bio/SearchIO/blast_pull.html
The only disadvantage compared to the normal parser is that the pull
parser currently only supports NCBI BLASTN and BLASTP.
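
To illustrate why a pull parser can be so much faster, here is a minimal
sketch of the general idea in Python. It is not the blast_pull
implementation: the input is a toy format, and the names LazyResult and
pull_results are hypothetical. The reader only splits the file into
per-query chunks; the hit list for a chunk is parsed the first time
someone asks for it.

```python
import io
from typing import Iterator, List, Optional, TextIO


class LazyResult:
    """Holds the raw text of one report; parses fields only on request.

    Hypothetical class for illustration, not part of BioPerl.
    """

    def __init__(self, raw: str) -> None:
        self._raw = raw
        self._hits: Optional[List[str]] = None  # filled in on first access

    @property
    def query_name(self) -> str:
        # Cheap: only the first line of the chunk is inspected.
        first_line = self._raw.split("\n", 1)[0]
        return first_line[len("Query= "):].strip()

    @property
    def hits(self) -> List[str]:
        # The more expensive scan is deferred until hits are requested,
        # and the result is cached for later calls.
        if self._hits is None:
            self._hits = [
                line[1:].split()[0]
                for line in self._raw.splitlines()
                if line.startswith(">") and line[1:].strip()
            ]
        return self._hits


def pull_results(handle: TextIO) -> Iterator[LazyResult]:
    """Yield one LazyResult per 'Query= ' block without parsing its body."""
    chunk: List[str] = []
    for line in handle:
        if line.startswith("Query= "):
            if chunk:
                yield LazyResult("".join(chunk))
            chunk = [line]
        elif chunk:
            # Lines before the first 'Query= ' line are ignored.
            chunk.append(line)
    if chunk:
        yield LazyResult("".join(chunk))


if __name__ == "__main__":
    # Toy input, assumed for this sketch; real BLAST reports are richer.
    sample = io.StringIO("Query= q1\n>seqA desc\n>seqB\nQuery= q2\n>seqC\n")
    for result in pull_results(sample):
        # Only the query name is read here; no hit list is ever built.
        print(result.query_name)
```

If you only need, say, the query names or the top hit of each report, most
of the parsing work is never done, which is where the speedup comes from.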