[Bioperl-l] Get nucleotide sequence when expecting protein from genpept

Chris Fields cjfields at uiuc.edu
Tue Jul 11 22:47:38 UTC 2006


Okay, now try this:

use Bio::DB::GenPept;
use Bio::SeqIO;

my $factory = Bio::DB::GenPept->new(-format => 'fasta');
my $seqin = $factory->get_Stream_by_acc('T16005');
my $seqout = Bio::SeqIO->new(-fh => \*STDOUT,
                             -format => 'fasta');
while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}

This returns both the nucleotide sequence and the correct protein sequence.
For some reason the protein comes back second, so get_Seq_by_acc, which hands
back only the first record, misses it while get_Stream_by_acc doesn't.  I have
notified NCBI about this issue, but they will likely just tell me to search by
GI number, since those are unique.  Probably a good warning for anyone who
relies on accessions for all their work (I use the GI myself).
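
If you want to stick with the accession, one possible workaround is to walk
the stream and keep the first record whose alphabet is 'protein'.  This is
only a rough sketch: first_protein is a made-up helper name, not part of
BioPerl, and it assumes the alphabet guessed for each FASTA record is
correct.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the first sequence
# in a stream whose alphabet is 'protein', or undef if there is none.
# Works on any object offering next_seq(), whose results offer alphabet().
sub first_protein {
    my ($stream) = @_;
    while (my $seq = $stream->next_seq) {
        return $seq if $seq->alphabet eq 'protein';
    }
    return;
}

# Only contact NCBI when an accession is given on the command line,
# e.g.  perl first_protein.pl T16005
if (@ARGV) {
    require Bio::DB::GenPept;
    my $factory = Bio::DB::GenPept->new(-format => 'fasta');
    my $seq = first_protein($factory->get_Stream_by_acc($ARGV[0]));
    die "No protein record returned for $ARGV[0]\n" unless $seq;
    print $seq->seq, "\n";
}
```

Fetching by GI instead (get_Seq_by_gi with 7498730 for this record) avoids
the ambiguity altogether, which is why I'd still recommend that where you
can.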

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, July 11, 2006 5:05 PM
> To: 'Frederick Partridge'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Get nucleotide sequence when expecting
> protein from genpept
> 
> It's an imported PIR record, so there is probably no accession recorded in
> the database.  I think NCBI falls back to nucleotide if it can't find a
> particular accession via protein.  Using the primary ID (the GI#, 7498730)
> works.
> 
> Chris
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Frederick Partridge
> > Sent: Tuesday, July 11, 2006 4:23 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Get nucleotide sequence when expecting protein
> > from genpept
> >
> >
> >
> > I am trying to retrieve various protein sequences from genpept using
> > get_Seq_by_acc. All of them work OK except one, T16005.
> >
> >
> > If I try and retrieve it with a reduced program:
> >
> >
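> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > use Bio::Perl;
> > use Bio::DB::GenPept;   # load the class used below explicitly
> > use Bio::SeqIO;
> >
> > my $genpept = Bio::DB::GenPept->new;
> >
> > my $seq = $genpept->get_Seq_by_acc('T16005');
> >
> > # double quotes, so that \n prints a newline rather than a literal \n
> > print $seq->seq(), "\n";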
> >
> >
> >
> > I get back a nucleotide sequence, which is another sequence at NCBI with
> > the same accession number. (I thought these were meant to be unique? but
> > evidently not.)
> >
> >
> > I am using bioperl 1.5.1, perl 5.8.1, Mac OS 10.3
> >
> >
> > Could anyone help me to get this protein sequence with my program?
> >
> >
> > Many thanks,
> >
> >
> >
> > Freddie Partridge
> >
> > University of Oxford
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l



