[Bioperl-l] Getting coding sequence starting with a protein record
Jason Stajich
jason at bioperl.org
Tue Apr 15 21:55:22 UTC 2014
Warren -
Can you provide a specific accession as an example? The way this code is
running, there shouldn't be any call to a translation function on the
object, so I am guessing the accession number you are pointing to is a
protein (though Bio::DB::GenBank would complain if that were so, so I'm a
little confused about how this is happening).
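One way to check is to look at the record's alphabet and at any coded_by
tag on its CDS feature. In a GenPept-style record the CDS feature sits in
protein coordinates and the nucleotide location is kept in the coded_by
qualifier, so spliced_seq on that feature would presumably just hand back
protein. A minimal sketch, assuming BioPerl is installed: the in-memory
record, its id, and the accessions in the coded_by string are made up for
illustration, and describe_cds_source is a hypothetical helper, not a
BioPerl function.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Hypothetical helper: return the record's alphabet followed by every
# coded_by value found on its CDS features.
sub describe_cds_source {
    my ($seq) = @_;
    my @coded_by;
    for my $feat ( $seq->get_SeqFeatures ) {
        next unless $feat->primary_tag eq 'CDS';
        push @coded_by, $feat->get_tag_values('coded_by')
            if $feat->has_tag('coded_by');
    }
    return ( $seq->alphabet, @coded_by );
}

# Stand-in for a protein record; the id and the accessions inside the
# coded_by string are placeholders, not real database entries.
my $protein = Bio::Seq->new(
    -id       => 'example_protein',
    -seq      => 'MKVLAAGIVG',
    -alphabet => 'protein',
);
$protein->add_SeqFeature(
    Bio::SeqFeature::Generic->new(
        -start       => 1,
        -end         => 10,
        -primary_tag => 'CDS',
        -tag         => { coded_by => 'join(ACC_A.1:1..12,ACC_B.1:1..18)' },
    )
);

my ( $alphabet, @coded_by ) = describe_cds_source($protein);
print "alphabet: $alphabet\n";
print "coded_by: $_\n" for @coded_by;
```

If the alphabet comes back as protein and a coded_by value is present, the
feature being spliced is probably the protein-coordinate CDS rather than a
nucleotide one.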
Jason
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip
On Tue, Apr 15, 2014 at 2:23 PM, Warren Gallin <wgallin at ualberta.ca> wrote:
> Jason,
>
> Works almost perfectly, except I am getting back the protein
> sequence rather than the underlying nucleotide sequence.
>
> My specific code fragment is:
>
>
>
> my $gb_db = Bio::DB::GenBank->new();
>
> <...Bunch of code that retrieves a protein GenBank formatted file
> and walks through the features until...>
>
> my $feature = $feature_object->primary_tag;
>
> if ( $feature ne "CDS" ) { next; }
> else {
>     $spliced_cds = $feature_object->spliced_seq($gb_db);
>     $na_seq = $spliced_cds->seq;
> }
>
> < More code, that leads to printing the value for $na_seq …>
>
> So somehow the nucleotide sequence is being translated into
> protein sequence - is there some option that needs setting to prevent
> translation?
>
> Warren
>
>
> On Apr 15, 2014, at 1:11 PM, Jason Stajich <jason at bioperl.org> wrote:
>
> > This is supported in BioPerl through the feature objects and the
> > Bio::SeqFeatureI method spliced_seq. You just need a Bio::DB::GenBank
> > object, which you pass to the method:
> >
> > my $db = Bio::DB::GenBank->new();
> > my $spliced_cds = $feature_with_remote_locations->spliced_seq($db);
> >
> >
> > On Tue, Apr 15, 2014 at 11:39 AM, Warren Gallin <wgallin at ualberta.ca> wrote:
> > I am having a problem finding a general method of recovering the
> > nucleotide coding sequence for a protein sequence record.
> >
> > Generally, tracking the CDS annotation back to the nucleotide sequence
> > record using the accession number of the nucleotide sequence is working.
> >
> > One problem arises when the underlying coding sequence is spliced from
> > multiple nucleotide records. Is there a general approach to automatically
> > track down and join the different sequence fragments from different
> > sequence entries? An example of the problem can be seen if you start from
> > the protein record with GI number 7715882. It is annotated as coming from
> > three different nucleotide records. Is there an approach in BioPerl that
> > will detect and download these three records and splice together the
> > appropriate parts to get the coding sequence?
> >
> > The other problem that I am having is the ongoing issue of protein
> > records annotated as highly redundant sequences, with WP-XXXXXX accession
> > numbers. Has anyone found a way to retrieve the set of different
> > nucleotide sequences that all encode a single WP-annotated protein sequence?
> >
> > Any help would be appreciated,
> >
> > Warren Gallin