[Bioperl-l] Getting coding sequence starting with a protein record

Tue Apr 15 19:11:42 UTC 2014

BioPerl supports this through its feature objects and the
Bio::SeqFeatureI method spliced_seq.
You just need a Bio::DB::GenBank object, which you pass to the
method:

use Bio::DB::GenBank;
# Database handle used to fetch the remote records the feature refers to
my $db = Bio::DB::GenBank->new();
my $spliced_cds = $feature_with_remote_locations->spliced_seq($db);
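As a slightly fuller sketch, something like the following might work when you start from one of the nucleotide records that carries the CDS annotation. The command-line accession argument and the FASTA-style output are assumptions for illustration only, and the script needs network access to NCBI:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Assumption: the nucleotide accession is supplied on the command line.
my $accession = shift @ARGV
    or die "Usage: $0 <nucleotide_accession>\n";

# Records are fetched from NCBI over the network.
my $db  = Bio::DB::GenBank->new();
my $seq = $db->get_Seq_by_acc($accession);
die "Could not retrieve $accession\n" unless defined $seq;

for my $feature ( $seq->get_SeqFeatures ) {
    next unless $feature->primary_tag eq 'CDS';

    # Passing $db lets spliced_seq fetch sub-locations that live on
    # other GenBank records and join them with the local ones.
    my $cds = $feature->spliced_seq($db);

    # Label the output with the protein_id tag when the CDS has one.
    my ($protein_id) = $feature->has_tag('protein_id')
        ? $feature->get_tag_values('protein_id')
        : ('unknown');

    print ">$protein_id\n", $cds->seq, "\n";
}
```

Each CDS whose location is a join() across several entries should come back as one contiguous Bio::Seq object, provided the remote entries can be retrieved through $db.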

Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Tue, Apr 15, 2014 at 11:39 AM, Warren Gallin <wgallin at ualberta.ca> wrote:

> I am having a problem finding a general method of recovering the
> nucleotide coding sequence for a protein sequence record.
>
> Generally tracking the CDS annotation back to the nucleotide sequence
> record using the accession number of the nucleotide sequence is working.
>
> One problem arises when the underlying coding sequence is spliced from
> multiple nucleotide records.  Is there a general approach to automatically
> track down and join the different sequence fragments from different
> sequence entries?  An example of the problem can be seen if you start from
> the protein record with GI number 7715882.  It is annotated as coming from
> three different nucleotide records.  Is there an approach in Bioperl that
> will detect and download these three records and splice together the
> appropriate parts to get the coding sequence?
>
> The other problem that I am having is the ongoing issue of protein records
> annotated as highly redundant sequences, with WP_XXXXXX accession numbers.
> Has anyone found a way to retrieve the set of different nucleotide
> sequences that all encode a single WP-annotated protein sequence?
>
> Any help would be appreciated,
>
> Warren Gallin