[Bioperl-l] Remote blast fork errors / Process limit restrictions

Jason Stajich jason at bioperl.org
Mon Dec 7 21:24:54 UTC 2009


Robert -

You seem to be mixing up two separate problems: the remote BLAST and
the remote retrieval of sequences. These error messages come from the
remote sequence retrieval, not from the BLAST.
  It is hard to tell from your message exactly which modules you
are using or how you are querying NCBI - there are several ways to do
this, either with the NCBI tools or with Bio::DB::GenBank.
  If you are using Bio::DB::Query::GenBank, that allows for async
access and has built-in controls to respect the wait between requests
that NCBI asks for, but I don't think the Bio::DB::GenBank
get_Seq_by_acc method does anything of the sort (at least not when it
was originally written).
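
As a rough sketch of what batched retrieval could look like (not tested
against your script): the accessions are grouped into batches, and each
batch is fetched with one Bio::DB::Query::GenBank query and one stream,
rather than one get_sequence() call per hit. The helper names
(batches, accession_query, fetch_descriptions), the batch size, the
pause length and the choice of the 'nucleotide' database are all
assumptions for illustration, not part of BioPerl.

```perl
use strict;
use warnings;

# Split a list of accessions into batches of at most $size entries.
sub batches {
    my ($size, @ids) = @_;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Build an Entrez query string that matches any of the given accessions.
sub accession_query {
    return join ' OR ', map( $_ . '[ACCN]', @_ );
}

# Fetch descriptions batch by batch: one query and one stream per batch.
# BioPerl is loaded only here, so the helpers above work without it.
sub fetch_descriptions {
    my ($accessions, $batch_size, $pause) = @_;
    require Bio::DB::GenBank;
    require Bio::DB::Query::GenBank;
    my $gb = Bio::DB::GenBank->new;
    my %desc;
    for my $batch (batches($batch_size, @$accessions)) {
        my $query = Bio::DB::Query::GenBank->new(
            -db    => 'nucleotide',               # assumption: nucleotide hits
            -query => accession_query(@$batch),
        );
        my $stream = $gb->get_Stream_by_query($query);
        while (my $seq = $stream->next_seq) {
            $desc{ $seq->accession_number } = $seq->desc;
        }
        sleep $pause if $pause;                   # pause between batches
    }
    return \%desc;
}
```

Something like fetch_descriptions(\@accessions, 50, 3) would then make
one remote request per 50 accessions instead of one per accession.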

I always advocate that if you want highly available and reliable
access to sequences, you should download nr (or whichever DB you need)
and use the local indexing tools for the retrieval.  Once you start
doing hundreds of queries I don't see any good reason to query NCBI
directly, given the unreliability of the web and of remote services.
Local databases are faster and more reliable for most people, so I
urge you to take advantage of the tools which provide local database
access with the same APIs.
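
To illustrate the "same APIs" point, here is a minimal sketch of a
lookup routine that only relies on get_Seq_by_id, so it can be handed
either a remote Bio::DB::GenBank object or a local Bio::DB::Fasta
index. The function name sequence_lengths and the file path in the
usage comment are hypothetical.

```perl
use strict;
use warnings;

# Look up each ID in any database object that provides get_Seq_by_id
# and record the sequence length. IDs that cannot be retrieved are
# skipped instead of aborting the whole run.
sub sequence_lengths {
    my ($db, @ids) = @_;
    my %len;
    for my $id (@ids) {
        # Some database classes throw on a missing ID, others return undef.
        my $seq = eval { $db->get_Seq_by_id($id) };
        next unless $seq;
        $len{$id} = $seq->length;
    }
    return \%len;
}

# Usage with a local index (assumes BioPerl is installed; the path is
# a placeholder for wherever you downloaded the FASTA file):
#   require Bio::DB::Fasta;
#   my $db  = Bio::DB::Fasta->new('/path/to/nr.fasta');
#   my $len = sequence_lengths($db, @hit_ids);
```

Bio::DB::Fasta builds its index the first time it opens the file, and
the IDs you pass in have to match the IDs in the FASTA headers.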


I would like to comment that the tone of your posts to the list is
not particularly helpful.   I wonder if you are actually asking for
help or are just interested in complaining when things don't work as
you expect? This is a collaborative and volunteer-only project, built
on the principle of working together to make a useful toolkit.  We
encourage you to build programs and applications from this base that
suit your needs, but not all things will be directly implemented in
the toolkit if they aren't generic enough (at least that is my
feeling; the other Core devs help with these decisions).
   If there is a useful, generic, and reusable part, we would like that
to be part of the API. Otherwise, an application that fits one
developer's vision is better written (and published) separately, and
we encourage you to do that. We certainly welcome bug reports (and
fixes) and also code contributions for new features where they can be
seen as generally useful.

-jason
On Dec 7, 2009, at 12:41 PM, Robert Bradbury wrote:

> This comment could also have the subject line: "Why does
> Bioperl/get_sequence fork at all?"  Why are not all operations
> sequential?  And if this is a "default" mode that I'm unaware of --
> how do I ever write a reliable BioPerl script if I have little or no
> control over what the program uses when it runs?  I may have days, so
> I can bear the burden of relatively slow results (and so can use
> sequential processing rather than parallel).
>
> I've got a perl script that uses remote blast to blast a sequence  
> against a
> subset of the NCBI sequences.  It "mostly" works, in that it returns a
> seemingly complete .bls result file but when attempting to look at the
> sequences (so it can more accurately summarize the information from  
> the
> results than a standard blast report allows) it terminates  
> prematurely with
> errors.
>
> The error is:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Couldn't fork: Resource temporarily unavailable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::WebDBSeqI::_open_pipe
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:722
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:463
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/NCBIHelper.pm:479
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/WebDBSeqI.pm:186
> STACK: Bio::Perl::get_sequence
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Perl.pm:520
> STACK: main::acc_2_desc /home/bradbury/Genomes/bin/RB.pl:182
> STACK: /home/bradbury/Genomes/bin/RB.pl:155
> -----------------------------------------------------------
>
> The precise line (in my code) which appears to be generating the
> error is:
>    $seq = get_sequence('GenBank', $accsn);
>
> Now this can be a problem if NCBI/GenBank fails due to load
> conditions, but this specific failure is repeatable and is most
> likely due to hitting the user process limit.  Small blast results
> work fine -- it's only if the Blast has returned several hundred hits
> that it runs into this problem.
>
> Now what it sounds like to me is an attempt to do multiple
> asynchronous NCBI queries (to get a sequence) with complete disregard
> of the environment (process limits, NCBI limits, etc.).  But I do not
> know enough about how this works to point a finger at some specific
> function.  As a result, get_sequence results are accumulated,
> summarized, etc. without ever having issued the "wait-variant()"
> calls needed to collect former children.  [This IMO would clearly be
> a bug.]
>
> It could be adjusted by allowing the BioPerl library to run in 3
> modes.
> (1) completely synchronous -- if you fork you wait until it's done --
> and you collect "it"; if any fork fails then one either collects the
> process or switches to the non-conservative mode.
>
> Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org