[Bioperl-l] how to prevent forced exit?
Chris Fields
cjfields at illinois.edu
Tue Mar 15 14:44:23 UTC 2011
Ross,
I hope you're exaggerating. You really shouldn't use this service to retrieve 1 million records, as you'll likely find your IP banned by NCBI; they are starting to enforce stricter limits on web-based access to their servers. Bio::DB::GenBank issues an HTTP GET request with URI-based parameters, which effectively limits a query to around 200-300 IDs per request, so you would have to split one large request into many. Thousands of repeated requests, even with a pause between them, may get your IP flagged as 'spam'. You can use something like Bio::DB::EUtilities to grab larger groups of seqs (~1000 IDs), because the latest EUtilities uses POST rather than GET requests for a large number of IDs, but you are still effectively limited by the number of requests.
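For a modest list of IDs, the splitting described above could look roughly like the sketch below. The helper names chunk_ids and fetch_in_batches are made up for illustration and are not part of BioPerl, and the batch size and pause are guesses rather than NCBI-approved values. Each batch runs inside an eval, on the assumption that the forced exit comes from an uncaught exception (exit code 255 is consistent with an uncaught die), so one bad batch is recorded instead of ending the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size IDs each.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

# Call $fetch (a code ref taking an array ref of IDs) once per batch.
# A batch that dies is recorded in the failed list instead of killing the script.
# Returns (array ref of results, array ref of IDs from failed batches).
sub fetch_in_batches {
    my ($fetch, $size, $pause, @ids) = @_;
    my (@results, @failed);
    my @batches = chunk_ids($size, @ids);
    for my $i (0 .. $#batches) {
        my $batch = $batches[$i];
        my @got;
        my $ok = eval { @got = $fetch->($batch); 1 };
        if ($ok) {
            push @results, @got;
        }
        else {
            warn "batch starting at $batch->[0] failed: $@";
            push @failed, @$batch;
        }
        # Pause between batches only, not after the last one.
        sleep $pause if $pause && $i < $#batches;
    }
    return (\@results, \@failed);
}

# Hypothetical usage with BioPerl (needs network access; values are guesses):
#
#   use Bio::DB::GenBank;
#   my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile', -format => 'Fasta');
#   my $fetch = sub {
#       my ($batch) = @_;
#       my $stream = $gb->get_Stream_by_acc($batch);   # accepts an array ref
#       my @seqs;
#       while (my $seq = $stream->next_seq) { push @seqs, $seq }
#       return @seqs;
#   };
#   my ($seqs, $failed) = fetch_in_batches($fetch, 200, 3, @accessions);
```

A failed batch tells you only that at least one ID in it caused trouble; re-running the failed IDs with a batch size of 1 would narrow it down, at the cost of more requests. None of this makes the approach reasonable for a million records.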
Frankly, there are much better/faster ways to do this, not least of which is to just download a GenBank section and parse it directly, or use a BLAST-formatted database and fastacmd to get the seqs of interest in FASTA format. Any reason why you are not doing this?
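As a rough illustration of the local approach, here is a minimal sketch that streams through an already-downloaded FASTA file and keeps only the records you want. It assumes each header line starts with the accession (">ACCESSION description"); real NCBI headers may carry extra fields, so the regex would need adjusting. The sub name filter_fasta is made up for this example. For GenBank flat files you would more likely go through Bio::SeqIO than parse by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy FASTA records from $in to $out when the first token of the header
# (the text right after '>' up to the first whitespace) is a key in %$wanted.
# Returns the number of records written.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my ($keep, $count) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {
            # A header line decides whether this whole record is kept.
            $keep = exists $wanted->{$1};
            $count++ if $keep;
        }
        print {$out} $line if $keep;
    }
    return $count;
}

# Small offline demo using in-memory filehandles instead of real files.
my $fasta = ">A1 first\nACGT\nGGCC\n>B2 second\nTTTT\n>C3 third\nCCCC\n";
open my $in, '<', \$fasta or die "cannot open input: $!";
my $picked = '';
open my $out, '>', \$picked or die "cannot open output: $!";
my $n = filter_fasta($in, $out, { A1 => 1, C3 => 1 });
close $out;
print "kept $n records\n";
```

The wanted accessions sit in a hash, so the file is read once no matter how long the ID list is, and IDs that are not in the file are simply never matched rather than causing an error.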
chris
On Mar 15, 2011, at 9:16 AM, Ross KK Leung wrote:
> While the complete code is as follows, the real problem is that get_Stream_by_acc cannot be used repeatedly: when I feed a list of accession numbers (e.g. 1 million records) to the Perl script, the program exits with code 255 (likely equivalent to -1). I wonder whether anybody has encountered a similar problem and solved it...
>
>
>
>
>
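> #!/usr/bin/perl
> use strict;
> use warnings;
>
> use Bio::DB::GenBank;
>
> # Fetch sequences in FASTA format, buffering the response in a temp file.
> my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile', -format => 'Fasta');
>
> my $allseqobj = $gb->get_Stream_by_acc("A3ZI37");
>
> print "HEELO\n";
>
> # Read each returned sequence; the second print below is never reached.
> while (my $seqobj = $allseqobj->next_seq) {
>     #$seqobj = $allseqobj->next_seq;
>     my $seq = $seqobj->seq;
> }
>
> print "222 HEELO\n";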
>
>
>
>
>
>
>
>
> From: Dave Messina [mailto:David.Messina at sbc.su.se]
> Sent: 15 March 2011 17:02
> To: Ross KK Leung
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] how to prevent forced exit?
>
>
>
> Hi Ross,
>
>
>
> Your code is incomplete and you didn't provide the output from running it, so it's not easy to figure out where you're going wrong.
>
>
>
> Try copying the example code directly from here
>
>
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/GenBank.html
>
>
>
> and making sure that works first before modifying it.
>
>
>
>
>
> More documentation and examples here:
>
> http://www.bioperl.org/wiki/HOWTO:Beginners
>
> http://www.bioperl.org/wiki/Bioperl_scripts
>
>
>
>
>
> Dave
>
>
>
>
>
>
>
> On Tue, Mar 15, 2011 at 06:54, Ross KK Leung <ross at cuhk.edu.hk> wrote:
>
> $gb = new Bio::DB::GenBank(-retrievaltype => 'tempfile', -format => 'Fasta');
> $allseqobj = $gb->get_Stream_by_acc("A3ZI37");
> print "HEELO";
> while ($seqobj = $allseqobj->next_seq) {
> #$seqobj = $allseqobj->next_seq;
> $seq=$seqobj->seq;
> }
> print "222 HEELO";
>
>
>
> I find that the 1st HEELO is printed while the 2nd one is not. Googling did
> not turn up a way to check success/failure, or whether the Seq object is
> null or exists. Since the 1st HEELO is printed, get_Stream_by_acc itself
> throws no exception. So what can I do? The real case does not hard-code
> A3ZI37 but reads a file that may contain a lot of these "illegitimate"
> accession numbers.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l