[Bioperl-l] Getting coding sequence starting with a protein record

Tue Apr 15 21:55:22 UTC 2014

Warren -

Can you provide a specific accession as an example? The way this code runs,
there shouldn't be any call to the translation function on the object, so I
am guessing the accession number you are pointing to is a protein (though
Bio::DB::GenBank would complain if that were so, so I'm a little confused
about how this would be happening).
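
If the record is a protein (GenPept) entry, its CDS feature is located in
protein coordinates and the nucleotide source is recorded in the feature's
/coded_by qualifier, which may be why spliced_seq hands back protein. As a
minimal sketch of how such a qualifier could be unpacked, here is plain Perl
(not BioPerl API) that splits a coded_by string into segments and joins the
pieces. The helper names parse_coded_by and splice_segments, the XX00000N
accessions and the toy sequences are all made up for illustration, and a
complement() wrapped around a whole join() is deliberately not handled.

```perl
use strict;
use warnings;

# Split a /coded_by string into segments, in the order they are joined.
# Handles "ACC:start..end" spans, each optionally wrapped in complement(),
# inside an optional join(). Hypothetical helper, not part of BioPerl.
sub parse_coded_by {
    my ($coded_by) = @_;
    die "complement(join(...)) is not handled by this sketch\n"
        if $coded_by =~ /^\s*complement\(\s*join/;
    my @segments;
    while ( $coded_by =~ /(complement\()?([A-Za-z_0-9]+(?:\.\d+)?):<?(\d+)\.\.>?(\d+)/g ) {
        push @segments, {
            acc    => $2,
            start  => $3,
            end    => $4,
            strand => defined $1 ? -1 : 1,
        };
    }
    return @segments;
}

# Concatenate the segments, given a hash of accession => nucleotide string.
# Coordinates are 1-based and inclusive, as in GenBank feature tables.
sub splice_segments {
    my ( $segments, $seq_for ) = @_;
    my $cds = '';
    for my $seg (@$segments) {
        my $source = $seq_for->{ $seg->{acc} };
        die "no sequence for $seg->{acc}\n" unless defined $source;
        my $piece = substr( $source, $seg->{start} - 1,
                            $seg->{end} - $seg->{start} + 1 );
        if ( $seg->{strand} < 0 ) {
            # Reverse-complement pieces that lie on the minus strand.
            $piece = reverse $piece;
            $piece =~ tr/ACGTacgt/TGCAtgca/;
        }
        $cds .= $piece;
    }
    return $cds;
}

# Toy data standing in for three fetched nucleotide records (made up).
my $coded_by = 'join(XX000001.1:3..5,XX000002.1:1..3,complement(XX000003.1:2..4))';
my %toy = (
    'XX000001.1' => 'ccATGcc',
    'XX000002.1' => 'AAAtt',
    'XX000003.1' => 'gTTAg',
);

my @segments = parse_coded_by($coded_by);
print splice_segments( \@segments, \%toy ), "\n";
```

In real use the %toy hash would be filled from fetched records, for example
with get_Seq_by_acc on a Bio::DB::GenBank object, and the qualifier read with
get_tag_values('coded_by') on the CDS feature. Both steps are left out here so
that the sketch runs offline.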

Jason

Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Tue, Apr 15, 2014 at 2:23 PM, Warren Gallin <wgallin at ualberta.ca> wrote:

> Jason,
>
>         Works almost perfectly, except I am getting back the protein
> sequence rather than the underlying nucleotide sequence.
>
>         My specific code fragment is:
>
>
>
>         my $gb_db = Bio::DB::GenBank->new();
>
>         <...Bunch of code that retrieves a protein GenBank formatted file
> and walks through the features until...>
>
>         my $feature = $feature_object->primary_tag;
>
>         if ( $feature ne "CDS" ) { next; }
>         else {
>                 $spliced_cds = $feature_object->spliced_seq($gb_db);
>                 $na_seq      = $spliced_cds->seq;
>
>         }
>
>         < More code, that leads to printing the value for $na_seq …>
>
>         So somehow the nucleotide sequence is being translated into
> protein sequence - is there some option that needs setting to prevent
> translation?
>
> Warren
>
>
> On Apr 15, 2014, at 1:11 PM, Jason Stajich <jason at bioperl.org> wrote:
>
> > This is supported in BioPerl through the feature objects and the
> Bio::SeqFeatureI method spliced_seq.
> > You just need a Bio::DB::GenBank object, which you pass to the
> method:
> >
> > my $db = Bio::DB::GenBank->new();
> > my $spliced_cds = $feature_with_remote_locations->spliced_seq($db);
> >
> >
> >
> > On Tue, Apr 15, 2014 at 11:39 AM, Warren Gallin <wgallin at ualberta.ca>
> wrote:
> > I am having a problem finding a general method of recovering the
> nucleotide coding sequence for a protein sequence record.
> >
> > Generally tracking the CDS annotation back to the nucleotide sequence
> record using the accession number of the nucleotide sequence is working.
> >
> > One problem arises when the underlying coding sequence is spliced from
> multiple nucleotide records.  Is there a general approach to automatically
> track down and join the different sequence fragments from different
> sequence entries?  An example of the problem can be seen if you start from
> the protein record with GI number 7715882.  It is annotated as coming from
> three different nucleotide records.  Is there an approach in Bioperl that
> will detect and download these three records and splice together the
> appropriate parts to get the coding sequence?
> >
> > The other problem that I am having is the ongoing issue of protein
> records annotated as highly redundant sequences, with WP-XXXXXX accession
> numbers.  Has anyone found a way to retrieve the set of different
> nucleotide sequences that all encode a single WP-annotated protein sequence?
> >
> > Any help would be appreciated,
> >
> > Warren Gallin