[Bioperl-l] Should be simple accession number script, but it's not.

Lincoln Stein lstein@cshl.org
Fri, 13 Dec 2002 19:15:56 -0500


It should be more stable in version 1.2 (or in the CVS "live" code).  However, 
NCBI's servers are acting up more than usual this week, and it might be bad 
in the 1.2 code as well.
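
Until the servers settle down, one workaround is to check the return value
and retry a few times before giving up.  The following is only a rough
sketch, not anything shipped with BioPerl: fetch_with_retry is a
hypothetical helper, and the 3 attempts and 5 second pause are arbitrary
assumptions rather than NCBI-recommended values.  It assumes that a failed
get_Seq_by_acc() either returns undef (as in the output quoted below) or
dies, and treats both cases as "try again".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): call $fetch->($acc) up to
# $tries times, sleeping $delay seconds between attempts.  Returns the
# first defined result, or undef if every attempt fails.
sub fetch_with_retry {
    my ($fetch, $acc, $tries, $delay) = @_;
    for my $attempt (1 .. $tries) {
        # eval guards against a fetcher that dies instead of returning undef
        my $seq = eval { $fetch->($acc) };
        return $seq if defined $seq;
        sleep $delay if $attempt < $tries;
    }
    return;
}

# Main part: only runs when an accession file is named on the command line.
if (my $file = shift @ARGV) {
    require Bio::DB::GenBank;    # loaded lazily; needs BioPerl installed
    my $db = Bio::DB::GenBank->new;

    open my $in, '<', $file or die "Cannot open $file: $!";
    while (my $acc = <$in>) {
        chomp $acc;
        next unless length $acc;    # skip blank lines

        my $seq = fetch_with_retry(sub { $db->get_Seq_by_acc($_[0]) },
                                   $acc, 3, 5);
        if ($seq) {
            print "$acc : ", $seq->id, "\n";
        } else {
            print "$acc : FAILED after 3 attempts\n";
        }
    }
    close $in;
}
```

Run it as "perl retry.pl accession.txt" (the script name is made up).  The
fetcher is passed in as a code reference so the retry logic can be tried
out without touching the network.  Note that an accession that genuinely
does not exist will also be retried, so keep the attempt count small.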

Lincoln

On Friday 13 December 2002 04:48 pm, Jamie Hatfield wrote:
> 1.0.2 code.  It's fairly consistent (like 9/10 hits work).  I don't use
> it for anything production.  And you're right, users would be better
> off incrementally downloading what they need.  We've done that for a while.
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason@cgt.mc.duke.edu]
> Sent: Friday, December 13, 2002 1:59 PM
> To: Jamie Hatfield
> Cc: bioperl-l@bioperl.org
> Subject: RE: [Bioperl-l] Should be simple accession number script, but
> it's not.
>
>
> The hit or miss with many requests is what is to be expected from the
> ncbi server at this point - I'm not sure if you're using the latest
> code in CVS or the 1.0.2 code which uses a different web interface.
>
> The new Bio::DB::GenBank uses a different interface and attempts to wait
> between requests as that is the 'nice' thing to do according to NCBI's
> remote user protocol - All in all we can't do much better than what the
> webservice provides other than try to cache and retry things.
>
> NCBI expects that people who are doing lots of requests for accession
> numbers will find better success in downloading the sequence database
> files locally.  You can do this with our modules and index them with
> Bio::Index::Fasta or Bio::Index::GenBank depending on the format.
>
> -jason
>
> On Fri, 13 Dec 2002, Jamie Hatfield wrote:
> > Is '@anum = "<SEQ>";' a legal construct?  I've personally never seen
> > that before.  You can do '@anum = <SEQ>;', though, and it will work.
> > Or, just use the <SEQ> down in your loop,
> >
> > e.g.,
> > foreach my $accession ( <SEQ> ) {
> >
> > Should work also.  (or 'while (my $accession = <SEQ>)', etc.  TMTOWTDI,
> > IMHO, YMMV)  :-)
> >
> > Sorry.  It's Friday.  Blame it on that.
> >
> > Ok, but in testing his script, I came up with the following:  I print
> > out the requested accession number, then print out the id returned from
> > the DB lookup IF it was successful.
> > 	my $seq = $db->get_Seq_by_acc($accession);
> > 	print "$accession : ";
> > 	print $seq->id() if ($seq);
> > 	print "\n";
> >
> > Funny thing is, on two separate invocations, I get different
> > successes/failures!!
> >
> > accession.txt
> > =============================
> > AP005198
> > AP005244
> > AP005246
> > AP005247
> > AP005252
> > AP005258
> > AP005292
> > AP005296
> > AP005301
> > =============================
> >
> > Testprog.pl
> > =============================
> > #!/usr/local/bin/perl -w
> >
> > use Bio::DB::GenBank;
> >
> >
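> > # read every accession number (one per line) into @anum
> > open SEQ, "<accession.txt" or die "Cannot open accession.txt: $!";
> > @anum = <SEQ>;
> > close SEQ;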
> >
> > my $db = new Bio::DB::GenBank;
> >
> > foreach my $accession ( @anum ) {
> > 	# just get things by accession number
> > 	chomp($accession);
> > 	my $seq = $db->get_Seq_by_acc($accession);
> > 	print "$accession : ";
> > 	print $seq->id() if ($seq);
> > 	print "\n";
> > 	#print $seq->seq();
> > }
> > =============================
> >
> > Output 1
> > =============================
> > AP005198 :
> > AP005244 : AP005244
> > AP005246 :
> > AP005247 : AP005247
> > AP005252 : AP005252
> > AP005258 : AP005258
> > AP005292 : AP005292
> > AP005296 :
> > AP005301 :
> > =============================
> >
> > Output 2
> > =============================
> > AP005198 : AP005198
> > AP005244 : AP005244
> > AP005246 : AP005246
> > AP005247 :
> > AP005252 : AP005252
> > AP005258 : AP005258
> > AP005292 : AP005292
> > AP005296 : AP005296
> > AP005301 :
> > =============================
> >
> > What's with that??
> >
> >
> > ----------------------------------------------------------------------
> > Jamie Hatfield                              Room 541H, Marley Building
> > Systems Programmer                          University of Arizona
> > Arizona Genomics Computational              Tucson, AZ  85721
> >   Laboratory (AGCoL)                        (520) 626-9598
> >
> > -----Original Message-----
> > From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]
> > On Behalf Of Agrin, Nathan
> > Sent: Friday, December 13, 2002 1:24 PM
> > To: bioperl-l@bioperl.org
> > Subject: [Bioperl-l] Should be simple accession number script, but
> > it's not.
> >
> >
> > I'm writing a simple script:
> > 1) I want it to take a list of accession numbers in a .txt file,
> > 2) pull out the entries from GenBank,
> > 3) and output the names and sequences to a new .txt file.
> >
> > The problem is that when I write something like:
> >
> > $seq = $db->get_Seq_by_acc($accession);
> > print $seq->seq();
> >
> > I get an error saying cannot perform command on an undefined variable.
> > Anyone had similar problems?  What's really weird is that it seems to
> > work some of the time.  Below is the code so far:
> >
> > #!perl
> >
> > use Bio::DB::GenBank;
> >
> >
> > open SEQ, "<accession.txt";
> > @anum = "<SEQ>";
> >
> > my $db = new Bio::DB::GenBank;
> >
> > foreach my $accession ( @anum ) {
> > 	# just get things by accession number
> > 	my $seq = $db->get_Seq_by_acc($accession);
> > 	print $seq->seq();
> > }
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l

-- 
Lincoln Stein
lstein@cshl.org