[Biojava-dev] blast parsing slowness
Doug Rusch
drusch@tcag.org
Fri, 6 Dec 2002 16:20:38 -0500
I have tested the fix and it looks good. Parsing is now much faster, and most of the time is spent handling the regexes.
Thanks a lot!
Doug
-----Original Message-----
From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
Sent: Wed 12/4/02 6:17 PM
To: Doug Rusch
Cc: biojava-dev@biojava.org
Subject: Re: [Biojava-dev] blast parsing slowness
Doug,
Could you try again now? Thomas has committed a fix to the event
meta-data. We'd kind of mucked some of the plumbing up.
Matthew
Doug Rusch wrote:
> This is a good topic for consideration with BioJava2.
>
> The circumstances are these: I have my BLAST parser working in my personal experimental biojava package. The BLAST data I am parsing was generated by blasting 1 Mb human genomic chunks against small sequences (basically ESTs), so one query against many different subjects. I compared the Java code against a home-brewed Perl BLAST parser. The BioJava parser was much slower (at least an order of magnitude) than the Perl code. This isn't quite a fair test because the designs of the two parsers are completely different, but if anything I would still expect Java to be faster than Perl.
>
> I profiled the code and found that the vast majority of the processing time was being spent in org.biojava.utils.ChangeSupport.growIfNecessary. Every time it creates an alignment (org.biojava.bio.program.ssbind.BlastLikeSearchBuilder.makeSubHit), it adds a changeListener to the generic alphabet it uses for alignments (org.biojava.bio.symbol.SimpleSymbolList.addListener). So it is adding many thousands of change listeners to the alphabet and, to add insult to injury, the listeners are all ALWAYS_VETO. This poor alphabet has thousands of listeners telling it not to change.
>
> Is this really what was intended? I get the impression that the ALWAYS_VETO changeListener is a special case. Perhaps ALWAYS_VETO listeners should just be tracked with a counter? Should alphabets be changeable at all? I do not know what use cases prompted this design, but is there any consensus on a fix?
>
> Doug Rusch
> drusch@tcag.org
>
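For illustration only, the counter idea raised in the quoted message could be sketched roughly as below. This is not the fix that was committed (that one touched the event meta-data), and it does not use the BioJava API: VetoableListener, VetoCountingSupport, isChangeAllowed and storedListenerCount are hypothetical names invented for this sketch. It assumes the always-veto listener is a single shared instance that can be recognised by identity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener interface; NOT the BioJava ChangeListener API. */
interface VetoableListener {
    /** Return false to veto the proposed change. */
    boolean allowChange(String changeDescription);
}

/** Hypothetical support class that counts always-veto registrations instead of storing them. */
class VetoCountingSupport {
    /** Shared sentinel standing in for an "always veto" listener (assumed to be a singleton). */
    static final VetoableListener ALWAYS_VETO = change -> false;

    private final List<VetoableListener> listeners = new ArrayList<>();
    private int alwaysVetoCount = 0;

    void addListener(VetoableListener listener) {
        if (listener == ALWAYS_VETO) {
            alwaysVetoCount++;          // constant time, no listener storage to grow
        } else {
            listeners.add(listener);
        }
    }

    void removeListener(VetoableListener listener) {
        if (listener == ALWAYS_VETO) {
            if (alwaysVetoCount > 0) {
                alwaysVetoCount--;
            }
        } else {
            listeners.remove(listener);
        }
    }

    /** A change is allowed only if no always-veto registration exists and no stored listener objects. */
    boolean isChangeAllowed(String changeDescription) {
        if (alwaysVetoCount > 0) {
            return false;
        }
        for (VetoableListener listener : listeners) {
            if (!listener.allowChange(changeDescription)) {
                return false;
            }
        }
        return true;
    }

    /** Number of listeners actually held in the list (always-veto registrations excluded). */
    int storedListenerCount() {
        return listeners.size();
    }
}
```

With this shape, registering the shared always-veto instance thousands of times only increments an integer, so the cost per alignment no longer depends on how many registrations came before it.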
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk