[Bioperl-l] how to prevent forced exit?

Jim Hu jimhu at tamu.edu
Tue Mar 15 15:25:36 UTC 2011


Hi Chris,

A version of this admonition should be on every wiki HOWTO that involves retrieving records from external sources, and in the docs for the relevant modules.  Speaking as someone who has used BioPerl intermittently for years, and who has a Sisyphus-like relationship with the learning curve, I think the docs could use more discussion of when to use particular modules, in addition to the details of how to use them provided in the perldocs. I realize this is hard, given the Perl "more than one way to do it" world view, but that's my $0.02.

Since BioPerl.org is a wiki, I suppose I should do that admonition edit myself... especially since I already know the wiki markup to transclude the same text into multiple pages.

Jim


On Mar 15, 2011, at 9:44 AM, Chris Fields <cjfields at illinois.edu> wrote:

> Ross,
> 
> Hope you're exaggerating; you really shouldn't use this service for retrieving 1 million records, as you'll likely find your IP banned by NCBI; they are starting to enforce stricter web-based access to their server now.  Bio::DB::GenBank uses a GET HTTP request with URI-based parameters, which effectively limits the length of the query to around 200-300 IDs per request, so you would have to split one large request into many.  Thousands of repeated requests, even with a timeout, may flag your IP as 'spam'.  You can use something like Bio::DB::EUtilities to grab larger groups of seqs (~1000 IDs) because the latest EUtilities uses POST requests instead of GET for a large number of IDs, but you are still effectively limited by the number of requests.
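
The batching described above can be sketched in plain Perl. This is only an illustration under stated assumptions: the accession list is made up, `fetch_batch` is a hypothetical stand-in for the real retrieval call (for example, handing the array reference to `get_Stream_by_acc`, which I believe accepts one), and the batch size of 200 and the one-second pause are taken loosely from the figures in this message, not from any NCBI rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size elements each.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" if $size < 1;
    my @out;
    while (@ids) {
        # splice removes up to $size elements from the front of @ids
        push @out, [ splice(@ids, 0, $size) ];
    }
    return @out;
}

# Hypothetical stand-in for the real fetch (an assumption, not BioPerl API).
# Here it only reports how many IDs it was given.
sub fetch_batch {
    my ($batch) = @_;
    return scalar @$batch;
}

my @accessions = map { sprintf 'ACC%05d', $_ } 1 .. 450;   # made-up IDs
my $fetched = 0;

for my $batch ( batches(200, @accessions) ) {
    $fetched += fetch_batch($batch);
    sleep 1;    # pause between requests to go easy on the server
}

print "processed $fetched IDs\n";
```

Even with batching, a million records still means thousands of requests, so this only addresses the URL-length limit and not the request-count concern raised above.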
> 
> Frankly, there are much better/faster ways to do this, not least of which is to just download a GenBank section and parse it directly, or use a BLAST-formatted database and fastacmd to get the seqs of interest in FASTA format.  Any reason why you are not doing this?
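
As a rough illustration of the "parse it locally" route, the sketch below pulls a few records of interest out of FASTA data without touching the network. It is deliberately plain Perl so it runs anywhere; in practice Bio::SeqIO would be the natural parser. The `extract_fasta` helper, the sequence names, and the in-memory sample data are all made up for the example; a real run would open the downloaded file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle and return a hash reference of
# id => sequence for the IDs present as keys in %$want.
sub extract_fasta {
    my ($fh, $want) = @_;
    my (%found, $id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # New record: remember its ID only if it is one we want.
            $id = $want->{$1} ? $1 : undef;
            $found{$id} = '' if defined $id;
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;          # drop stray whitespace
            $found{$id} .= $line;       # sequences may span several lines
        }
    }
    return \%found;
}

# Made-up sample data held in memory; replace with a real file path.
my $fasta = ">seq1 first\nACGT\nACGT\n>seq2 second\nTTTT\n>seq3 third\nGGCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory data: $!";
my $hits = extract_fasta($fh, { seq1 => 1, seq3 => 1 });
close $fh;

print "$_\t$hits->{$_}\n" for sort keys %$hits;
```

The whole file is read once regardless of how many IDs are wanted, which is why this scales better than one web request per accession.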
> 
> chris
> 
> On Mar 15, 2011, at 9:16 AM, Ross KK Leung wrote:
> 
>> While the complete code is as follows, the real problem is that get_Stream_by_acc cannot be used repeatedly: when I feed a list of accession numbers (e.g. 1 million records) to the Perl script, the program exits with code 255 (likely equivalent to -1). I wonder whether anybody has encountered a similar problem and solved it...
>> 
>> 
>> 
>> 
>> 
>> #!/usr/bin/perl
>> use warnings;
>> 
>> use Bio::DB::GenBank;
>> 
>> 
>> 
>> $gb = new Bio::DB::GenBank(-retrievaltype => 'tempfile', -format =>'Fasta');
>> 
>> 
>> $allseqobj = $gb->get_Stream_by_acc("A3ZI37");
>> 
>> 
>> print "HEELO";
>> 
>> while ($seqobj = $allseqobj->next_seq) {
>>                      #$seqobj = $allseqobj->next_seq;
>>                      $seq=$seqobj->seq;
>> }
>> print "222   HEELO";
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> From: Dave Messina [mailto:David.Messina at sbc.su.se] 
>> Sent: 15 March 2011 17:02
>> To: Ross KK Leung
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] how to prevent forced exit?
>> 
>> 
>> 
>> Hi Ross,
>> 
>> 
>> 
>> Your code is incomplete and you didn't provide the output from running it, so it's not easy to figure out where you're going wrong.
>> 
>> 
>> 
>> Try copying the example code directly from here
>> 
>> 
>> 
>>   http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/GenBank.html
>> 
>> 
>> 
>> and making sure that works first before modifying it.
>> 
>> 
>> 
>> 
>> 
>> More documentation and examples here:
>> 
>> http://www.bioperl.org/wiki/HOWTO:Beginners
>> 
>> http://www.bioperl.org/wiki/Bioperl_scripts
>> 
>> 
>> 
>> 
>> 
>> Dave
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Tue, Mar 15, 2011 at 06:54, Ross KK Leung <ross at cuhk.edu.hk> wrote:
>> 
>> $gb = new Bio::DB::GenBank(-retrievaltype => 'tempfile', -format =>
>> 'Fasta');
>> $allseqobj = $gb->get_Stream_by_acc("A3ZI37");
>> 
>> print "HEELO";
>> while ($seqobj = $allseqobj->next_seq) {
>>                      #$seqobj = $allseqobj->next_seq;
>> 
>>                      $seq=$seqobj->seq;
>> 
>>                      }
>> 
>>                  print "222   HEELO";
>> 
>> 
>> 
>> I find that the 1st HEELO is printed while the 2nd one isn't. Googling did
>> not turn up a way to check success/failure, or whether the Seq object is
>> null or exists. Since the 1st HEELO is printed, no throw/exception occurs in
>> get_Stream_by_acc. So what can I do? The real case does not hard-code
>> A3ZI37 but reads a file that may contain a lot of these "illegitimate"
>> accession numbers.
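
One general Perl technique that fits the symptom described above is wrapping each lookup in `eval { ... }`, so that an exception on one bad accession is caught and the loop continues. A Perl script that dies on an uncaught exception typically exits with status 255, which matches the exit code reported in this thread. The sketch below is not BioPerl code: `fetch_seq` is a hypothetical stand-in that simply dies for IDs starting with "A3Z", and the other accessions are invented. In the real script, the `eval` block would presumably enclose both the `get_Stream_by_acc` call and the `next_seq` loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the retrieval calls (an assumption for the demo):
# it dies for "bad" accessions, the way a thrown exception would.
sub fetch_seq {
    my ($acc) = @_;
    die "no record found for $acc\n" if $acc =~ /^A3Z/;
    return 'ACGT';
}

my (@ok, @failed);

for my $acc (qw(X00001 A3ZI37 X00002)) {    # invented list; A3ZI37 from the post
    my $seq;
    # eval traps the die; the trailing 1 makes success unambiguous.
    my $success = eval { $seq = fetch_seq($acc); 1 };
    if (!$success) {
        my $err = $@ || "unknown error\n";
        chomp $err;
        warn "skipping $acc: $err\n";
        push @failed, $acc;
        next;
    }
    push @ok, $acc;
}

# Execution reaches this point even though one accession failed.
print "222   HEELO\n";
printf "%d fetched, %d skipped\n", scalar @ok, scalar @failed;
```

Collecting the failed accessions in a list, rather than only warning, makes it easy to write them to a file and inspect them afterwards.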
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> 
>> 
> 
> 



