[Bioperl-l] get_sequence - acc does not exist
Ewan Birney
birney at ebi.ac.uk
Wed Aug 31 08:50:59 EDT 2005
Paul G Cantalupo wrote:
> Hello,
>
> I discovered that Bio::Perl get_sequence does not handle Genbank GI
> numbers properly due to the following code in get_sequence:
>
> if( $identifier =~ /^\w+\d+$/ ) {
>     $seq = $db->get_Seq_by_acc($identifier);
> } else {
>     $seq = $db->get_Seq_by_id($identifier);
> }
>
> GenBank GI numbers (e.g. 51527264) match the regular expression
> /^\w+\d+$/, so, unsurprisingly, the method get_Seq_by_acc fails (with
> a warning like: MSG: acc (gb|51527264) does not exist). The method
> get_Seq_by_id, however, works when called with GI numbers:
>
> use strict;
> use warnings;
> use Bio::DB::GenBank;
>
> my $genbank_db = Bio::DB::GenBank->new();
> # GI numbers are looked up as IDs, not as accessions
> my $seq = $genbank_db->get_Seq_by_id(51527264);
> print $seq->desc, "\n";
>
> Shouldn't the regular expression in get_sequence be changed to look for
> identifiers that are all digits and then call get_Seq_by_id? Or am I not
> understanding something?
>
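To see why the original test misfires: in Perl, `\w` also matches digits, so `/^\w+\d+$/` accepts any all-digit string of two or more characters, GI numbers included. The minimal sketch below contrasts that pattern with the all-digits check Paul proposes. The helper `choose_method` is hypothetical and not part of Bio::Perl; it only returns the name of the Bio::DB method that would be called, and `J00522` is used purely as an accession-shaped example string.

```perl
use strict;
use warnings;

# The test used in Bio::Perl's get_sequence (quoted above):
# returns 1 if the identifier would be treated as an accession.
sub looks_like_acc_original { return $_[0] =~ /^\w+\d+$/ ? 1 : 0 }

# Hypothetical helper sketching Paul's proposal: all-digit identifiers
# (such as GI numbers) are treated as IDs, everything else as accessions.
sub choose_method {
    my ($identifier) = @_;
    return $identifier =~ /^\d+$/ ? 'get_Seq_by_id' : 'get_Seq_by_acc';
}

for my $id (qw(51527264 J00522)) {
    printf "%-9s original regex matches: %d, proposed call: %s\n",
        $id, looks_like_acc_original($id), choose_method($id);
}
```

Note that the accession-shaped string also matches the original pattern, which is why the original test alone cannot tell the two kinds of identifier apart.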
Traditionally, "GI" numbers are _not_ accession numbers: GI numbers
are internal numbers assigned in-house by NCBI to sequences. However, this
is all about heuristics guessing the right thing, and probably the right thing
to do is to try get_Seq_by_acc first and then, if that returns undef, try get_Seq_by_id.
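The suggested fallback could be sketched roughly as follows. This is a minimal sketch under stated assumptions: `fetch_with_fallback` and the `MockDB` package are hypothetical (MockDB stands in for a Bio::DB::GenBank object so the example runs offline), and the accession lookup is wrapped in `eval` on the assumption that a failed lookup may throw, as the "does not exist" message above suggests, rather than simply return undef.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Perl: try the accession lookup
# first and fall back to the ID lookup. eval catches the case where a
# failed accession lookup dies instead of returning undef.
sub fetch_with_fallback {
    my ($db, $identifier) = @_;
    my $seq = eval { $db->get_Seq_by_acc($identifier) };
    $seq = $db->get_Seq_by_id($identifier) unless defined $seq;
    return $seq;
}

# Hypothetical stand-in for a Bio::DB::GenBank object: it knows one
# accession and one GI number, and returns plain strings as "sequences".
package MockDB;
sub new { return bless {}, shift }
sub get_Seq_by_acc {
    my ($self, $acc) = @_;
    return "seq for acc $acc" if $acc eq 'J00522';
    die "acc ($acc) does not exist\n";    # mimics a thrown lookup failure
}
sub get_Seq_by_id {
    my ($self, $id) = @_;
    return $id eq '51527264' ? "seq for id $id" : undef;
}

package main;

my $db = MockDB->new;
for my $id (qw(J00522 51527264)) {
    my $seq = fetch_with_fallback($db, $id);
    print "$id -> ", (defined $seq ? $seq : 'not found'), "\n";
}
```

With a real Bio::DB::GenBank object, get_Seq_by_id may itself throw for an unknown identifier, so a caller would probably want to wrap the whole call in eval as well.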
> Thank you,
>
> Paul
>
> Paul Cantalupo
> Research Specialist/Systems Programmer
> 559 Crawford Hall
> Department of Biological Sciences
> University of Pittsburgh
> Pittsburgh, PA 15260
> Work: 412-624-4687
> Fax: 412-624-4759
>
> Ask me about Toastmasters: www.toastmasters.org
> Midday Club Treasurer
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l