[Biojava-l] Stop condition for blast parser

Marcel Huntemann marcel.huntemann at gmail.com
Thu Mar 12 03:00:38 UTC 2009


Hi Mark!

The BLAST runs etc. are parallelized. The contigs are split into groups of
1000, and I have also modified my program so that it now works with all
those separate files. Nevertheless, I also have a program that works on the
concatenated BLAST output. The parser with my customized handler always
looks for the results of one particular contig, compares these results to
something else, does some other work in between to calculate some
statistics, and then creates a new parser to get the results for the next
contig. So System.exit() is not an option, since it would stop my whole
program (in which I am using the parser). I also don't want to start
working with threads here. I was just hoping there would be a way to tell
the handler that, when a certain condition is met, it should signal the
parser to stop parsing (and maybe even to reset itself to the first line).
But I guess there's no way to do it in the customized handler...
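Assuming the customized BlastEcho is a SAX ContentHandler driven by BioJava's SAX-based BLAST parser, the usual SAX idiom for stopping early is for the handler to throw an exception from a callback and for the caller to catch that specific exception around parse(). This sketch assumes BioJava's parser lets such an exception propagate out of parse() the way a standard SAX XMLReader does. To stay self-contained it uses the JDK's built-in XML parser on a toy document. The element names (`results`, `query`, `hit`), the `id` attribute, `StopParsingException` and `countHits` are all hypothetical, not BioJava's actual event names or API. There is no "reset to the first line": the caller simply builds a new parser on a fresh stream, as the program above already does.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StopEarlyDemo {

    /** Hypothetical marker exception, thrown only to abort parsing on purpose. */
    static class StopParsingException extends SAXException {
        StopParsingException() { super("target found, stopping parse"); }
    }

    /** Counts hits for one query and aborts as soon as that query's element closes. */
    static class TargetHandler extends DefaultHandler {
        private final String targetId;
        private boolean inTarget = false;
        int hitsSeen = 0;

        TargetHandler(String targetId) { this.targetId = targetId; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("query".equals(qName) && targetId.equals(atts.getValue("id"))) {
                inTarget = true;
            } else if (inTarget && "hit".equals(qName)) {
                hitsSeen++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (inTarget && "query".equals(qName)) {
                // Unwinds out of parse(); nothing after this point is read.
                throw new StopParsingException();
            }
        }
    }

    /** Returns the number of hits recorded for targetId (0 if it never appears). */
    static int countHits(String xml, String targetId) throws Exception {
        TargetHandler handler = new TargetHandler(targetId);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsingException stop) {
            // Expected early exit. Any other SAXException is a real error and still propagates.
        }
        return handler.hitsSeen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<results>"
            + "<query id='contig1'><hit/><hit/></query>"
            + "<query id='contig2'><hit/></query>"
            + "<query id='contig3'><hit/><hit/><hit/></query>"
            + "</results>";
        System.out.println(countHits(xml, "contig2")); // prints 1
    }
}
```

Catching only the dedicated exception type keeps genuine parse errors visible, while the deliberate stop is treated as normal control flow by the caller.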

Thanks,
Marcel


mark.schreiber at novartis.com wrote:
> 
> Hi -
> 
> There are many ways to stop the parsing, but it really depends on how you
> have set the program up. Notably, there is no way for BioJava's BLAST
> parsing system to shut itself down, but control probably shouldn't
> happen at that level.
> 
> A crude but effective procedure is to write out the results when you
> find the hit of interest and then simply call System.exit().
> 
> Another approach would be to spawn tasks to parse each record and then
> have them signal the main thread when they are complete, so that it can
> shut the others down. If you are using Java 1.4 or earlier, you would
> need to do this with Threads. If you have Java 5 or later, you can use
> the java.util.concurrent package, which is much nicer to deal with.
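The task-based approach above can be sketched with java.util.concurrent: ExecutorService.invokeAny() runs a collection of Callables, returns the result of one that completes successfully, and cancels the rest. This assumes each of the 1000-contig chunk files can be searched independently. To keep the sketch self-contained, each "chunk" is an in-memory list of query IDs standing in for a parsed chunk file, and `findChunk` is a hypothetical helper, not a BioJava API. The lambda syntax needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FirstMatchSearch {

    /**
     * Searches all chunks in parallel and returns the index of a chunk that
     * contains targetId. Throws ExecutionException if no chunk contains it.
     */
    static int findChunk(List<List<String>> chunks, String targetId)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                final int index = i;
                final List<String> chunk = chunks.get(i);
                tasks.add(() -> {
                    for (String queryId : chunk) {
                        // Cooperative cancellation: stop scanning once invokeAny has a winner.
                        if (Thread.currentThread().isInterrupted()) {
                            throw new InterruptedException("cancelled");
                        }
                        if (queryId.equals(targetId)) {
                            return index;
                        }
                    }
                    // Not found here: fail, so invokeAny ignores this task's result.
                    throw new IllegalStateException("not in chunk " + index);
                });
            }
            // Blocks until one task succeeds; the remaining tasks are cancelled.
            return pool.invokeAny(tasks);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> chunks = Arrays.asList(
            Arrays.asList("contig1", "contig2"),
            Arrays.asList("contig3", "contig4"),
            Arrays.asList("contig5"));
        System.out.println(findChunk(chunks, "contig4")); // prints 1
    }
}
```

A task that does not find the target throws, which invokeAny treats as a failure; only if every task fails does invokeAny itself throw ExecutionException. The main thread never has to poll the workers.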
> 
> One thing I don't understand is why you don't BLAST each contig
> separately; in that case the results would only contain your hit of
> interest. That means 90K separate BLAST runs, but there are versions of
> BLAST that run on clusters, and the database (3 million genes) is not
> huge, so it should be an embarrassingly parallel problem?
> 
> - Mark
> 
> biojava-l-bounces at lists.open-bio.org wrote on 03/10/2009 03:00:36 AM:
> 
>> Hi Mark!
>>
>> Mark Schreiber wrote:
>> > You could just customize BlastEcho to pass on the events of interest
>> > and ignore those that are not interesting.
>> That's what I am doing right now. But I don't know how to tell my
>> customized BlastEcho to stop when a certain condition is met during a
>> particular event call. What's the command for stopping there?
>>
>> > It could also exit if a certain
>> > event occurs.
>> How?
>>
>> > Remember it costs almost nothing to read the file, so you
>> > save time by only sending interesting events for parsing.
>> Hmm, I am not sure it's really almost nothing when I have about 90,000
>> contigs that were blasted against a database of maybe 3,000,000
>> genes. The BLAST output that I am parsing is about 13 GB, and in every
>> cycle I am looking for the results of one particular contig out of these
>> 90,000 contigs. So I definitely found that the time adds up a lot when
>> the parser runs over the whole file in each of these 90,000 cycles,
>> even though the contig I am looking for was already at the beginning
>> of the file.
>>
>>
>> Cheers,
>> Marcel
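The cumulative cost Marcel describes can be made concrete with the numbers from this thread, taking N = 90,000 lookups and S ≈ 13 GB of BLAST output. The middle line is a rough estimate that assumes the lookups follow file order, so the k-th lookup reads roughly the first k/N of the file before stopping early:

```latex
\begin{align*}
\text{full scan per lookup:}\quad & N S = 90\,000 \times 13\ \text{GB} \approx 1.17 \times 10^{6}\ \text{GB} \approx 1.17\ \text{PB} \\
\text{stop right after the target:}\quad & \sum_{k=1}^{N} \frac{k}{N}\, S = \frac{N+1}{2}\, S \approx 45\,000 \times 13\ \text{GB} \approx 585\ \text{TB} \\
\text{one streaming pass:}\quad & S = 13\ \text{GB}
\end{align*}
```

Under that assumption, stopping early only halves the total, because each lookup still starts from the top of the file; handling each contig's results as they stream past in a single pass brings the cost down to reading the file once.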


