[Biojava-l] Stop condition for blast parser

mark.schreiber at novartis.com mark.schreiber at novartis.com
Tue Mar 10 02:36:50 UTC 2009


Hi -

There are many ways to stop the parsing, but which one fits really depends 
on how you have set the program up.  Notably, there is no way for the 
BioJava Blast parsing system to shut itself down, but that kind of control 
probably shouldn't happen at that level anyway.

A crude but effective procedure is to write out the results as soon as you 
find the hit of interest and then simply call System.exit().
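The crude approach can be sketched as a filtering handler that keeps only the target contig's hits, writes them out as soon as that contig's record ends, and then stops. This is a minimal sketch under assumptions: the callback names startQuery, hit and endOfInput are hypothetical stand-ins, not the actual BlastEcho callbacks, so you would map them onto whichever events your customized handler receives. The stop action is passed in as a Runnable so that it can be System.exit(0) in the real program.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical filtering handler. startQuery/hit/endOfInput are illustrative
 * names only; they are NOT BioJava's BlastEcho API.
 */
public class StopOnTargetHandler {
    private final String targetQuery;
    private final PrintStream out;
    private final Runnable stop;              // e.g. () -> System.exit(0)
    private final List<String> hits = new ArrayList<>();
    private boolean inTarget = false;
    private boolean finished = false;

    public StopOnTargetHandler(String targetQuery, PrintStream out, Runnable stop) {
        this.targetQuery = targetQuery;
        this.out = out;
        this.stop = stop;
    }

    /** A new query (contig) record begins. */
    public void startQuery(String queryId) {
        if (finished) return;
        if (inTarget) {
            // The target's record has ended, so its hit list is complete.
            finish();
            return;
        }
        inTarget = queryId.equals(targetQuery);
    }

    /** A hit was reported for the current query; keep it only for the target. */
    public void hit(String hitId) {
        if (!finished && inTarget) hits.add(hitId);
    }

    /** End of the BLAST output; covers a target that is the last record. */
    public void endOfInput() {
        if (!finished && inTarget) finish();
    }

    private void finish() {
        finished = true;
        for (String h : hits) out.println(targetQuery + "\t" + h);
        out.flush();   // make sure the results are written before stopping
        stop.run();    // the crude stop: System.exit(0) in the real program
    }

    public static void main(String[] args) {
        StopOnTargetHandler h =
            new StopOnTargetHandler("contig_2", System.out, () -> System.exit(0));
        h.startQuery("contig_1"); h.hit("geneA");
        h.startQuery("contig_2"); h.hit("geneB"); h.hit("geneC");
        h.startQuery("contig_3");  // prints contig_2's hits, then exits the JVM
        h.hit("geneD");            // never reached
    }
}
```

The hits for the target are only known to be complete when the next query starts (or the input ends), which is why the write-and-exit happens there rather than on the first matching hit.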

Another approach would be to spawn Tasks to parse each record and have 
each one signal the main thread when it finds the hit of interest, so the 
main thread can shut the remaining Tasks down.  If you are using Java 1.4 
or earlier then you would need to do this with plain Threads. From Java 1.5 
on you can use the java.util.concurrent packages, which are much nicer to 
deal with.
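With the concurrent packages, the "signal the main thread, then shut the rest down" pattern maps naturally onto an ExecutorCompletionService: the main thread takes results in the order tasks finish and, once one task reports the hit, cancels the others. This is a minimal sketch assuming the records have already been split out as strings and that each starts with a line of the form "Query= <id>"; ParallelRecordSearch and parseRecord are hypothetical placeholders for the real per-record parse. It uses lambdas, so it needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRecordSearch {

    /** Hypothetical stand-in for parsing one record; returns it if the query id matches. */
    static String parseRecord(String record, String targetQuery) {
        // Assumption: the first line of a record is "Query= <id>".
        String firstLine = record.split("\n", 2)[0];
        return firstLine.equals("Query= " + targetQuery) ? record : null;
    }

    /** Parses records in parallel and returns the first matching one, or null. */
    public static String findFirst(List<String> records, String targetQuery, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);
        List<Future<String>> futures = new ArrayList<>();
        try {
            for (String r : records) {
                futures.add(done.submit(() -> parseRecord(r, targetQuery)));
            }
            for (int i = 0; i < futures.size(); i++) {
                // Blocks until the next task finishes (the "signal" to the main thread).
                String result = done.take().get();
                if (result != null) {
                    return result; // found it; the finally block stops the others
                }
            }
            return null;
        } finally {
            for (Future<String> f : futures) f.cancel(true); // no-op for finished tasks
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> records = List.of(
            "Query= contig_1\nhit A",
            "Query= contig_2\nhit B",
            "Query= contig_3\nhit C");
        System.out.println(findFirst(records, "contig_2", 2));
    }
}
```

For a check this cheap the threads buy nothing; the point is the shape of the control flow, with the stopping decision made by the main thread rather than inside the parser.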

One thing I don't understand is why you don't blast each contig 
separately; then each result would only contain the hits for that one 
contig.  That means 90K separate blasts, but there are versions of blast 
that run on clusters, and the database (3 million genes) is not huge, so 
shouldn't this be an embarrassingly parallel problem?
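One way to set up those separate blasts is to split the contig FASTA file and deal the contigs out into a fixed number of batches, one per cluster job. This is only a sketch under assumptions: ContigBatcher, splitFasta and batch are hypothetical names, the input is assumed to be a multi-FASTA string, and how each batch is submitted to BLAST on the cluster is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

public class ContigBatcher {

    /** Splits multi-FASTA text into records, each starting with a '>' header line. */
    public static List<String> splitFasta(String fasta) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : fasta.split("\\R")) {     // \R matches any line break
            if (line.startsWith(">")) {
                if (current != null) records.add(current.toString());
                current = new StringBuilder();
            }
            if (current != null) current.append(line).append('\n'); // skip text before first header
        }
        if (current != null) records.add(current.toString());
        return records;
    }

    /** Deals records round-robin into nBatches groups, e.g. one per cluster job. */
    public static List<List<String>> batch(List<String> records, int nBatches) {
        if (nBatches < 1) throw new IllegalArgumentException("nBatches must be >= 1");
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < nBatches; i++) batches.add(new ArrayList<>());
        for (int i = 0; i < records.size(); i++) {
            batches.get(i % nBatches).add(records.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        String fasta = ">contig_1\nACGT\n>contig_2\nGGTT\n>contig_3\nTTAA\n";
        List<List<String>> jobs = batch(splitFasta(fasta), 2);
        for (int i = 0; i < jobs.size(); i++) {
            System.out.println("job " + i + ": " + jobs.get(i).size() + " contig(s)");
        }
    }
}
```

Because each batch is independent, the jobs need no coordination, and each one's output only ever contains hits for its own contigs.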

- Mark

biojava-l-bounces at lists.open-bio.org wrote on 03/10/2009 03:00:36 AM:

> Hi Mark!
> 
> Mark Schreiber wrote:
> > You could just customize BlastEcho to pass on the events of interest,
> > ignore those that are not interesting. 
> That's what I am doing right now. But I don't know how to tell my
> customized BlastEcho to stop when a certain condition is met during a
> particular event call. What's the command for stopping there?
> 
> > It could also exit if a certain
> > event occurs.
> How?
> 
> > Remember it costs almost nothing to read the file, so you
> > save time by only sending interesting events for parsing.
> Hmm, I am not sure it's really almost nothing when I have about 90,000
> contigs that were blasted against a database of maybe 3,000,000
> genes. The blast output that I am parsing is about 13 GB, and in every
> cycle I am looking for the results of one particular contig out of these
> 90,000 contigs. So I have definitely seen the time add up a lot
> when each of these 90,000 cycles runs over the whole file,
> even though the contig I am looking for was right at the beginning of the file.
> 
> 
> Cheers,
> Marcel
> 
> > 
> >     On 7 Mar 2009, 12:01 PM, "Marcel Huntemann"
> >     <marcel.huntemann at gmail.com> wrote:
> > 
> >     But where? I can't do it in my customized handler, can I?
> > 
> >     Mark Schreiber wrote: > Because the blast parser uses event based
> >     parsing you should be able to > c...
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
