[Bioperl-l] Getting coding sequence starting with a protein record
Jason Stajich
jason at bioperl.org
Tue Apr 15 19:11:42 UTC 2014
This is supported in BioPerl through the feature objects and the
Bio::SeqFeatureI method spliced_seq.
You just need a Bio::DB::GenBank object, which you pass to the method
so that it can fetch the remote pieces of the location:
use Bio::DB::GenBank;
# the module name is case-sensitive: GenBank, not Genbank
my $db = Bio::DB::GenBank->new();
my $spliced_cds = $feature_with_remote_locations->spliced_seq($db);
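A slightly fuller sketch of the same idea, in case it helps. It assumes you already have the accession of a nucleotide record whose CDS feature is annotated with a join() across several entries; the accession is taken from the command line, and the helper name spliced_cds_seqs is made up for this example, not part of BioPerl. Going from the protein record to that nucleotide accession is not shown here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Hypothetical helper (not a BioPerl function): return the spliced
# sequence object for every CDS feature annotated on $seq.
# $db is handed to spliced_seq so it can fetch sub-locations that
# live on other records; it may be undef when all exons are local.
sub spliced_cds_seqs {
    my ($seq, $db) = @_;
    my @cds;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        push @cds, $feat->spliced_seq($db);
    }
    return @cds;
}

# Only contact NCBI when an accession is given on the command line.
if (@ARGV) {
    my $acc = shift @ARGV;
    my $db  = Bio::DB::GenBank->new();
    my $seq = $db->get_Seq_by_acc($acc)
        or die "Could not retrieve $acc\n";
    for my $cds (spliced_cds_seqs($seq, $db)) {
        print $cds->seq, "\n";
    }
}
```

Run it as "perl script.pl ACCESSION". Each line printed is one CDS with its exons joined in order, including any exons that had to be pulled from other entries.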
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip
On Tue, Apr 15, 2014 at 11:39 AM, Warren Gallin <wgallin at ualberta.ca> wrote:
> I am having a problem finding a general method of recovering the
> nucleotide coding sequence for a protein sequence record.
>
> Generally tracking the CDS annotation back to the nucleotide sequence
> record using the accession number of the nucleotide sequence is working.
>
> One problem arises when the underlying coding sequence is spliced from
> multiple nucleotide records. Is there a general approach to automatically
> track down and join the different sequence fragments from different
> sequence entries? An example of the problem can be seen if you start from
> the protein record with GI number 7715882. It is annotated as coming from
> three different nucleotide records. Is there an approach in Bioperl that
> will detect and download these three records and splice together the
> appropriate parts to get the coding sequence?
>
> The other problem that I am having is the ongoing issue of protein records
> annotated as highly redundant sequences, with WP-XXXXXX accession numbers.
> Has anyone found a way to retrieve the set of different nucleotide
> sequences that all encode a single WP-annotated protein sequence?
>
> Any help would be appreciated,
>
> Warren Gallin