[Bioperl-l] Not catching an error in EUtilities

Mon Oct 29 15:42:43 UTC 2007

On Oct 28, 2007, at 11:03 PM, Warren Gallin wrote:

> I've been having an intermittent problem, when the NCBI service that
> is accessed by EUtilities (in particular efetch) is not available.
>
> I have the calls enclosed in eval statements, but an exception is not
> thrown.  Instead I find that the file that should contain the
> results of the efetch only has the following text:
>
> Error: The resource is temporarily unavailable
>
> So it appears that this is not generating an exception (maybe that is
> not desirable in general, but it would be useful in my case).
>
> The result is that my script tries to access the file using
> Bio::SeqIO, and the expected text is not there.
>
> The relevant snippet of code and the output are copied below.
>
> The only way that I can think of to catch this outcome is to open the
> file and check the first three lines for this text, and then go back
> and redo the efetch if this particular text is found.
>
> Is this behaviour considered a bug, or just an outcome that needs to
> be checked for when code is written using efetch?
...
> If so, is there a standard way of checking for this?

I may have already mentioned this (it is mentioned in the POD), but
it's worth repeating: if you are running a script that posts more than
100 requests, you should run it between 9pm and 5am ET, per NCBI's rules:

http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

I have built in a way to pass in an LWP::UserAgent-related callback  
to get_Response() using the -cb parameter, primarily to allow piping  
data to a child process for instance.  This could also be used for  
checking the initial data retrieved and throwing if an error is  
returned.  I haven't extensively tested this yet; I can try running a  
stress test to see if I can trigger an efetch error and catch it in  
the callback.  If I can get something working I'll post it later  
today and may incorporate it into EUtilities in CVS.
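
A minimal sketch of the callback idea described above, not a tested
EUtilities feature.  It assumes the -cb callback receives the usual
LWP::UserAgent content-callback arguments (data chunk, response object,
protocol object) and that an efetch failure shows up as body text
starting with "Error:", as in Warren's file.  The helper name
make_checking_callback is hypothetical.  LWP::UserAgent traps a die
raised inside a content callback and reports it in the X-Died response
header, so the sketch records the error in a variable and leaves the
throwing to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns an LWP-style content
# callback that copies every chunk to $out_fh and, if the first non-empty
# chunk starts with "Error:", stores the rest of that line in $$error_ref.
sub make_checking_callback {
    my ($out_fh, $error_ref) = @_;
    my $seen_first = 0;
    return sub {
        my ($chunk, $response, $protocol) = @_;   # LWP callback arguments
        if (!$seen_first && defined $chunk && length $chunk) {
            $seen_first = 1;
            if ($chunk =~ /\A\s*Error:\s*(.*)/) {
                $$error_ref = $1;                 # e.g. "The resource is ..."
            }
        }
        print {$out_fh} $chunk if defined $chunk;
        return;
    };
}

# Assumed usage, following the -cb parameter described in this message:
#
#   my $error;
#   open my $fh, '>', 'proteins.gb' or die "Cannot write: $!";
#   $prot_eutil->get_Response( -cb => make_checking_callback($fh, \$error) );
#   close $fh;
#   die "efetch failed: $error\n" if defined $error;
```

Because the final die happens outside the callback, an enclosing eval{}
sees it like any other EUtilities exception.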

A bit of background: efetch is the only eutil that doesn't have a
specific parser (Bio::Tools::EUtilities) attached to it, primarily
because the data retrieved is very diverse (seq/pubmed/snp/etc data in
XML, ASN.1, text, HTML).  All other eutils besides efetch report errors
via the HTTP::Response header (the norm) or in the XML returned;
EUtilities catches and throws both of those error types, so an eval{}
works.  efetch errors seem to be atypical and may be related to server
load or specific database availability.
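
Since an efetch failure of this kind arrives as ordinary body text, the
check Warren describes (look at the first three lines of the output
file) can be sketched as below.  The helper name efetch_file_error is
hypothetical, and the "Error:" prefix is assumed from the single example
quoted in this thread; other failure messages may look different.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the error text if one of the first
# $max_lines lines of $path starts with "Error:", otherwise undef.
sub efetch_file_error {
    my ($path, $max_lines) = @_;
    $max_lines = 3 unless defined $max_lines;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($found, $count) = (undef, 0);
    while (defined(my $line = <$fh>)) {
        last if ++$count > $max_lines;
        if ($line =~ /^\s*Error:\s*(.*?)\s*$/) {
            $found = $1;
            last;
        }
    }
    close $fh;
    return $found;
}

# Assumed usage after get_Response( -file => ">$file" ) has returned;
# note that the helper takes the plain file name, without the ">":
#
#   if (defined(my $msg = efetch_file_error($file))) {
#       die "efetch returned an error page: $msg\n";
#   }
```

Throwing from the check lets the existing eval{}/retry logic treat this
case the same way as the errors EUtilities already throws.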

chris

> Thanks,
>
> Warren Gallin
>
> [Code that successfully executes an epost and retrieves the history]
>
>      # Initialise these before the retry label so that a retry neither
>      # resets the counter nor prepends another ">" to the file name.
>      $file1 = ">" . $file1;
>      $retry = 0;
>
>    RETRIEVE_LIST: eval {$prot_eutil->reset_parameters(
>          -eutil   => 'efetch',
>          -rettype => 'genbank',
>          -db      => 'protein',
>          -history => $history
>      );};
>      if ($@) {
>          print "efetch error trapped\n$@\n";
>          goto RETRIEVE_LIST;
>      }
>
>      eval { $prot_eutil->get_Response( -file => $file1 ); };
>      if ($@) {
>          # Give up after five retries; otherwise wait and start over.
>          die "Server error: $@.  Try again later" if $retry == 5;
>          print STDERR "$@\n";
>          print STDERR "Server error, redo #$retry\n";
>          $retry++;
>          sleep(5);
>          goto RETRIEVE_LIST;
>      }
>      else {
>          print "efetch ran on $loop_bottom through $loop_top.\n";
>      }
>
> The output to the terminal from this part of the code is:
>
> efetch ran on 0 through 300.
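
The retry logic in the snippet above can also be written without goto.
The following is only a sketch: with_retries is a hypothetical helper,
and the code reference passed to it is assumed to die on any failure,
whether thrown by EUtilities or raised by a check on the output file.

```perl
use strict;
use warnings;

# Hypothetical helper: calls $attempt up to $max_tries times, sleeping
# $delay seconds between failures.  Returns the number of the attempt
# that succeeded, or dies with the last error once the tries run out.
sub with_retries {
    my ($attempt, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $ok = eval { $attempt->(); 1 };
        return $try if $ok;
        my $err = $@ || "unknown error\n";
        die "Giving up after $try tries: $err" if $try == $max_tries;
        print STDERR "Attempt $try failed: $err";
        sleep $delay if $delay;
    }
    return;
}

# Assumed usage with the objects from the snippet above:
#
#   with_retries(sub {
#       $prot_eutil->reset_parameters( -eutil => 'efetch', -rettype => 'genbank',
#                                      -db => 'protein', -history => $history );
#       $prot_eutil->get_Response( -file => ">$file1" );
#   }, 5, 5);
```

Keeping the counter inside the helper means it cannot be reset by
accident when the request is repeated.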