[Bioperl-l] Getting coding sequence starting with a protein record

Tue Apr 15 22:26:47 UTC 2014

Jason,

Attached is a minimal script that illustrates my problem - I expect it to print an UPDATE line containing a nucleotide sequence.

I must be missing some BioPerl subtlety, because this happens with every one of the hundred or so GI numbers that I have tried.

Thanks for looking at this - I am sure that I have a blind spot here somewhere.

Warren
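
One possible explanation, offered as an assumption rather than a confirmed diagnosis: in a GenPept (protein) record the CDS feature's own location refers to protein coordinates, while the nucleotide location is stored in the feature's coded_by qualifier. If that is what is happening here, spliced_seq would return protein residues, and the coded_by string would be the thing to follow back to the nucleotide records. The sketch below is plain Perl with no BioPerl dependency. The helper parse_coded_by and the accessions in the example are hypothetical. It only handles join(...), complement(...) and ACCESSION.VERSION:start..end segments, and does not handle a complement nested inside an outer complement.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a feature-table location
# string, such as a coded_by qualifier value, into one hash per segment.
sub parse_coded_by {
    my ($loc) = @_;

    # complement(join(...)) means the whole CDS lies on the minus strand.
    my $outer_minus = $loc =~ /^\s*complement\(\s*join\(/ ? 1 : 0;

    my @segments;
    while ( $loc =~ /(complement\()?\s*([A-Za-z0-9_]+(?:\.\d+)?):<?(\d+)(?:\.\.>?(\d+))?/g ) {
        my ( $comp, $acc, $start, $end ) = ( $1, $2, $3, $4 );
        push @segments, {
            accession => $acc,
            start     => $start,
            end       => defined $end ? $end : $start,    # single-base segment
            strand    => ( $comp || $outer_minus ) ? -1 : 1,
        };
    }

    # On the minus strand the last listed segment is read first.
    @segments = reverse @segments if $outer_minus;
    return @segments;
}

# Made-up accessions and coordinates, for illustration only.
my $coded_by =
  'join(AB000001.1:100..250,AB000002.1:1..90,complement(AB000003.1:10..60))';

for my $seg ( parse_coded_by($coded_by) ) {
    printf "%s\t%d\t%d\t%+d\n", @{$seg}{qw(accession start end strand)};
}
```

Each segment could then be fetched with the get_Seq_by_acc method of Bio::DB::GenBank, cut with trunc, reverse-complemented with revcom when the strand is -1, and concatenated in order. BioPerl's Bio::Factory::FTLocationFactory also has a from_string method that parses the same location syntax into location objects, which may be preferable to a hand-written regex.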

On Apr 15, 2014, at 3:55 PM, Jason Stajich <jason at bioperl.org> wrote:

> Warren -
> 
> Can you provide a specific accession as an example? There shouldn't be any call to the translation function the way this code is running for the object, so I am guessing the accession number you are pointing to is a protein (though Bio::DB::GenBank would complain if that were so, so I'm a little confused about how this would be happening).
> 
> Jason
> 
> Jason Stajich
> jason at bioperl.org
> http://bioperl.org/wiki/User:Jason
> http://twitter.com/hyphaltip
> 
> 
> On Tue, Apr 15, 2014 at 2:23 PM, Warren Gallin <wgallin at ualberta.ca> wrote:
> Jason,
> 
>         Works almost perfectly, except I am getting back the protein sequence rather than the underlying nucleotide sequence.
> 
>         My specific code fragment is:
> 
> 
> 
>         my $gb_db = Bio::DB::GenBank->new();
> 
>         <...Bunch of code that retrieves a protein GenBank formatted file and walks through the features until...>
> 
>         my $feature = $feature_object->primary_tag;
> 
>         if ( $feature ne "CDS" ) { next; }
>         else {
>                 $spliced_cds = $feature_object->spliced_seq($gb_db);
>                 $na_seq      = $spliced_cds->seq;
> 
>         }
> 
>         < More code, that leads to printing the value for $na_seq …>
> 
>         So somehow the nucleotide sequence is being translated into protein sequence - is there some option that needs setting to prevent translation?
> 
> Warren
> 
> 
> On Apr 15, 2014, at 1:11 PM, Jason Stajich <jason at bioperl.org> wrote:
> 
> > This is supported in BioPerl through the feature objects and the Bio::SeqFeatureI method spliced_seq.
> > You just need a Bio::DB::GenBank object, which you pass to the method:
> >
> > my $db = Bio::DB::GenBank->new();
> > my $spliced_cds = $feature_with_remote_locations->spliced_seq($db);
> >
> >
> >
> >
> > Jason Stajich
> > jason at bioperl.org
> > http://bioperl.org/wiki/User:Jason
> > http://twitter.com/hyphaltip
> >
> >
> > On Tue, Apr 15, 2014 at 11:39 AM, Warren Gallin <wgallin at ualberta.ca> wrote:
> > I am having a problem finding a general method of recovering the nucleotide coding sequence for a protein sequence record.
> >
> > Generally, tracking the CDS annotation back to the nucleotide sequence record using the accession number of the nucleotide sequence works.
> >
> > One problem arises when the underlying coding sequence is spliced from multiple nucleotide records.  Is there a general approach to automatically track down and join the sequence fragments from different sequence entries?  An example of the problem can be seen if you start from the protein record with GI number 7715882, which is annotated as coming from three different nucleotide records.  Is there an approach in BioPerl that will detect and download these three records and splice together the appropriate parts to get the coding sequence?
> >
> > The other problem that I am having is the ongoing issue of protein records annotated as highly redundant sequences, with WP-XXXXXX accession numbers.  Has anyone found a way to retrieve the set of different nucleotide sequences that all encode a single WP-annotated protein sequence?
> >
> > Any help would be appreciated,
> >
> > Warren Gallin
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test_Script.pl.pl
Type: text/x-perl-script
Size: 1264 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20140415/7d343171/attachment-0002.bin>