[Bioperl-l] Should be simple accession number script, but it's not.

Jamie Hatfield jamie@genome.arizona.edu
Fri, 13 Dec 2002 14:48:51 -0700


1.0.2 code.  It's fairly consistent (about 9 out of 10 lookups work).  I
don't use it for anything in production.  And you're right, users would be
better off incrementally downloading what they need.  We've done that for
a while.

-----Original Message-----
From: Jason Stajich [mailto:jason@cgt.mc.duke.edu] 
Sent: Friday, December 13, 2002 1:59 PM
To: Jamie Hatfield
Cc: bioperl-l@bioperl.org
Subject: RE: [Bioperl-l] Should be simple accession number script, but
it's not.


The hit-or-miss behavior with many requests is to be expected from the
NCBI server at this point.  I'm not sure if you're using the latest
code in CVS or the 1.0.2 code, which uses a different web interface.

The new Bio::DB::GenBank uses a different interface and attempts to wait
between requests, as that is the 'nice' thing to do according to NCBI's
remote user protocol.  All in all, we can't do much better than what the
web service provides, other than try to cache and retry things.
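
The wait-and-retry idea can be sketched in plain Perl.  This is only an
illustration and not the code inside Bio::DB::GenBank: fetch_with_retry,
its arguments, and the flaky stub below are hypothetical names made up
for this example.  In a real script the stub would be replaced by
something like sub { $db->get_Seq_by_acc($_[0]) }.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Call $fetch->($accession) up to $tries times, sleeping $delay seconds
# between attempts.  Returns the first defined result, or undef if every
# attempt fails.  An exception thrown by $fetch counts as a failure.
sub fetch_with_retry {
    my ($fetch, $accession, $tries, $delay) = @_;
    for my $attempt (1 .. $tries) {
        my $seq = eval { $fetch->($accession) };
        return $seq if defined $seq;
        sleep $delay if $delay && $attempt < $tries;
    }
    return;
}

# Hypothetical stand-in for an unreliable server: the first request for
# each accession fails, later requests succeed.
my %calls;
my $flaky = sub {
    my ($acc) = @_;
    return $calls{$acc}++ ? "record for $acc" : undef;
};

for my $acc (qw(AP005198 AP005244)) {
    my $seq = fetch_with_retry($flaky, $acc, 3, 0);
    print "$acc : ", (defined $seq ? $seq : 'FAILED'), "\n";
}
```

Checking that the result is defined before calling methods on it also
avoids the "undefined value" error from the original script when a
lookup comes back empty.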

NCBI expects that people who are doing lots of requests for accession
numbers will have more success downloading the sequence database
files locally.  You can do this with our modules and index the files with
Bio::Index::Fasta or Bio::Index::GenBank, depending on the format.
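
As a rough sketch of the local-index route, assuming BioPerl is
installed and a FASTA file has already been downloaded: the file names
and the build_index and lookup helpers are hypothetical, while new,
make_index, and fetch are the Bio::Index::Fasta calls shown in that
module's documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::Fasta;

# Build (or extend) an index over one or more local FASTA files.
sub build_index {
    my ($index_file, @fasta_files) = @_;
    my $index = Bio::Index::Fasta->new(
        -filename   => $index_file,
        -write_flag => 1,              # open the index for writing
    );
    $index->make_index(@fasta_files);
    return;
}

# Look up each ID and return a hash of ID => sequence string.
# IDs that cannot be fetched are stored as undef.
sub lookup {
    my ($index_file, @ids) = @_;
    my $index = Bio::Index::Fasta->new(-filename => $index_file);
    my %found;
    for my $id (@ids) {
        my $seq = $index->fetch($id);  # a sequence object when found
        $found{$id} = $seq ? $seq->seq : undef;
    }
    return %found;
}

# Hypothetical file names; this part runs only if the FASTA file exists.
my ($fasta, $idx) = ('sequences.fasta', 'sequences.idx');
if (-e $fasta) {
    build_index($idx, $fasta);
    my %found = lookup($idx, 'AP005198', 'AP005244');
    for my $id (sort keys %found) {
        my $status = defined $found{$id}
            ? length($found{$id}) . ' bp'
            : 'not in index';
        print "$id : $status\n";
    }
}
```

By default the index key is the first word of each FASTA header line, so
headers that do not start with the bare accession would need a custom
parser set through the module's id_parser method.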

-jason


On Fri, 13 Dec 2002, Jamie Hatfield wrote:

> Is '@anum = "<SEQ>";' a legal construct?  I've personally never seen
> that before.  You can do '@anum = <SEQ>;', though, and it will work.
> Or, just use the <SEQ> down in your loop,
>
> e.g.,
> foreach my $accession ( <SEQ> ) {
>
> Should work also.  (or 'while (my $accession = <SEQ>)', etc.  TMTOWTDI,
> IMHO, YMMV)  :-)
>
> Sorry.  It's Friday.  Blame it on that.
>
> Ok, but in testing his script, I came up with the following:  I print
> out the requested accession number, then print out the id returned from
> the DB lookup IF it was successful.
> 	my $seq = $db->get_Seq_by_acc($accession);
> 	print "$accession : ";
> 	print $seq->id() if ($seq);
> 	print "\n";
>
> Funny thing is, two separate invocations, I get different
> successes/failures!!
>
> Accession.txt
> =============================
> AP005198
> AP005244
> AP005246
> AP005247
> AP005252
> AP005258
> AP005292
> AP005296
> AP005301
> =============================
>
> Testprog.pl
> =============================
> #!/usr/local/bin/perl -w
>
> use Bio::DB::GenBank;
>
>
> open SEQ, "<accession.txt";
> @anum = <SEQ>;
>
> my $db = new Bio::DB::GenBank;
>
> foreach my $accession ( @anum ) {
> 	# just get things by accession number
> 	chomp($accession);
> 	my $seq = $db->get_Seq_by_acc($accession);
> 	print "$accession : ";
> 	print $seq->id() if ($seq);
> 	print "\n";
> 	#print $seq->seq();
> }
> =============================
>
> Output 1
> =============================
> AP005198 :
> AP005244 : AP005244
> AP005246 :
> AP005247 : AP005247
> AP005252 : AP005252
> AP005258 : AP005258
> AP005292 : AP005292
> AP005296 :
> AP005301 :
> =============================
>
> Output 2
> =============================
> AP005198 : AP005198
> AP005244 : AP005244
> AP005246 : AP005246
> AP005247 :
> AP005252 : AP005252
> AP005258 : AP005258
> AP005292 : AP005292
> AP005296 : AP005296
> AP005301 :
> =============================
>
> What's with that??
>
>
> ----------------------------------------------------------------------
> Jamie Hatfield                              Room 541H, Marley Building
> Systems Programmer                          University of Arizona
> Arizona Genomics Computational              Tucson, AZ  85721
>   Laboratory (AGCoL)                        (520) 626-9598
>
> -----Original Message-----
> From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]
> On Behalf Of Agrin, Nathan
> Sent: Friday, December 13, 2002 1:24 PM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] Should be simple accession number script, but it's
> not.
>
>
> I'm writing a simple script:
> 1) I want it to take a list of accession numbers in a .txt file,
> 2) pull out the entries from GenBank,
> 3) and output the names and sequences to a new .txt file.
>
> The problem is that when I write something like:
>
> $seq = $db->get_Seq_by_acc($accession);
> print $seq->seq();
>
> I get an error saying it cannot perform the command on an undefined
> variable.  Has anyone had similar problems?  What's really weird is that
> it seems to work some of the time.  Below is the code so far:
>
> #!perl
>
> use Bio::DB::GenBank;
>
>
> open SEQ, "<accession.txt";
> @anum = "<SEQ>";
>
> my $db = new Bio::DB::GenBank;
>
> foreach my $accession ( @anum ) {
> 	# just get things by accession number
> 	my $seq = $db->get_Seq_by_acc($accession);
> 	print $seq->seq();
> }
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu