[Bioperl-l] SearchIO Performance
Albion Baucom
baucom at msg.ucsf.edu
Fri Mar 21 20:13:00 UTC 2008

Hi. I am fairly new to BioPerl and have a question about the
performance of BLAST (nucleotide) report parsing. My BLAST result
files usually contain around 100 or more sequence hits, and each
sequence is about 1400 nucleotides long.

After profiling my code, I find that calling next_result() on a
SearchIO object takes substantially longer than the non-OO,
quick-and-dirty code I currently use to parse the same BLAST files.
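
A timing comparison of this kind could be set up roughly as in the
sketch below. It is only an illustration, not the code that was
actually profiled: time_call and parse_with_searchio are hypothetical
helper names, the report file is whatever is passed on the command
line, and Bio::SearchIO is loaded at run time so that the timing
helper itself needs nothing beyond core Perl.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code reference once and return the
# elapsed wall-clock time in seconds.
sub time_call {
    my ($code) = @_;
    my $start = [gettimeofday];
    $code->();
    return tv_interval($start);
}

# Hypothetical helper: parse one BLAST report with Bio::SearchIO,
# walking every result and every hit, and return the hit count.
sub parse_with_searchio {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded lazily; requires BioPerl
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    my $hits = 0;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            $hits++;
        }
    }
    return $hits;
}

# Only runs when a report file is given on the command line.
if (@ARGV) {
    my $hits;
    my $secs = time_call(sub { $hits = parse_with_searchio($ARGV[0]) });
    printf "SearchIO: %d hits in %.3f s\n", $hits, $secs;
}
```

The same time_call wrapper can be put around the hand-written parser
so that both numbers are measured in the same way.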

What is substantially longer? The existing code takes about 0.25
seconds, and the BioPerl call takes about 4.5 seconds, roughly an
18-fold difference. That becomes significant when I have to parse 30
BLAST files in a row. I understand that SearchIO parses the entire
file and stores it all for easy retrieval later, and maybe this time
penalty is what I have to pay for that convenience and organization.

I am wondering whether there is any way to speed this up short of
writing custom code based on BioPerl: something I might not be aware
of that I can do ahead of time, or during parsing, to limit what is
parsed or otherwise ease the parsing process. For instance, is there
a way to "look ahead" and parse only the alignments that meet a
specific expectation value (E-value) cutoff?
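
To make the "look ahead" idea concrete, the sketch below pre-scans the
one-line hit summary of a plain-text BLAST report and returns the IDs
of hits at or below an E-value cutoff, so that only those would need
full parsing. hits_below_cutoff is a hypothetical helper, not a
BioPerl function, and the sample report lines are made up for
illustration. It assumes the usual text layout, in which each summary
row ends with a bit score followed by an E-value.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the IDs of hits in
# the summary table whose E-value is at or below $cutoff.
sub hits_below_cutoff {
    my ($report, $cutoff) = @_;
    my @keep;
    my $in_summary = 0;
    for my $line (split /\n/, $report) {
        if ($line =~ /^Sequences producing significant alignments/) {
            $in_summary = 1;
            next;
        }
        next unless $in_summary;
        last if $line =~ /^>/;    # first alignment block ends the summary
        # A summary row is: ID, optional description, bit score, E-value.
        next unless $line =~ /^(\S+)\s+(?:.*\s)?\d+(?:\.\d+)?\s+(\S+)\s*$/;
        my ($id, $evalue) = ($1, $2);
        # Some BLAST versions print "e-110" rather than "1e-110".
        $evalue = "1$evalue" if $evalue =~ /^e/i;
        next unless $evalue =~ /^\d+(?:\.\d+)?(?:e[+-]?\d+)?$/i;
        push @keep, $id if $evalue <= $cutoff;
    }
    return @keep;
}

# Made-up sample data, for illustration only.
my $sample = <<'END';
Sequences producing significant alignments:                      (bits) Value

hitA  made-up 16S ribosomal RNA gene                              2500   0.0
hitB  made-up 16S ribosomal RNA gene                              400    e-110
hitC  made-up 16S ribosomal RNA gene                              120    2e-30

>hitA  made-up 16S ribosomal RNA gene
END

print join(", ", hits_below_cutoff($sample, 1e-50)), "\n";
```

A pre-scan like this only decides which hits are worth a closer look;
it does not read the alignments themselves.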

I confess I have not read the documentation thoroughly (although
obviously enough to make it do what I want), but I am certainly
willing to do so if someone can point me in the right direction.

Thanks
Albion