[Bioperl-l] Get spliced sequences from a DB::SeqFeature::Store database

Tue Jul 2 22:12:32 UTC 2013

I believe this was discussed on-list at one point; the problem with spliced_seq(), IIRC, is that the current API doesn't indicate how to splice the sequence together when sub-features of different types are present (e.g. exons, UTRs, CDS). It could be implemented, but probably not as spliced_seq(), since that method doesn't accept any arguments for this purpose.
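
To illustrate the ambiguity, here is a minimal sketch in plain Perl of what a type-aware splice could look like. This is not existing BioPerl API: the function name splice_by_type, the hash layout of the sub-features, and the toy sequence and coordinates are all made up for illustration. It assumes 1-based inclusive coordinates relative to a top-strand genomic string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): join the sub-features of ONE
# type into a spliced sequence.
#   $genomic  : top-strand sequence of the region, as a plain string
#   $strand   : 1 or -1, strand of the parent transcript
#   $subfeats : array ref of { type, start, end } hashes, with 1-based
#               inclusive coordinates relative to $genomic
#   $type     : which sub-feature type to splice, e.g. 'CDS' or 'exon'
sub splice_by_type {
    my ($genomic, $strand, $subfeats, $type) = @_;

    # Keep only the requested type, ordered along the top strand.
    my @parts = sort { $a->{start} <=> $b->{start} }
                grep { $_->{type} eq $type } @$subfeats;

    # Concatenate the selected pieces.
    my $spliced = join '',
        map { substr($genomic, $_->{start} - 1, $_->{end} - $_->{start} + 1) }
        @parts;

    # Minus-strand transcripts: reverse-complement the joined sequence.
    if ($strand < 0) {
        $spliced = scalar reverse $spliced;
        $spliced =~ tr/ACGTacgt/TGCAtgca/;
    }
    return $spliced;
}

# Toy data, invented for this example: two exons, with the CDS covering
# only part of each.
my $genomic  = 'CCATGGGTAGCATAACCGGG';
my @subfeats = (
    { type => 'exon', start => 1,  end => 6  },
    { type => 'exon', start => 11, end => 20 },
    { type => 'CDS',  start => 3,  end => 6  },
    { type => 'CDS',  start => 11, end => 15 },
);

print splice_by_type($genomic, 1, \@subfeats, 'exon'), "\n";  # CCATGGCATAACCGGG
print splice_by_type($genomic, 1, \@subfeats, 'CDS'),  "\n";  # ATGGCATAA
```

The same parent feature gives two different "spliced" sequences depending on the type requested, which is why an argument-free spliced_seq() has no obvious right answer here. In a real script the sub-feature list would presumably come from the database (for instance via get_SeqFeatures on each mRNA), which is not tested in this sketch.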

chris

On Jun 30, 2013, at 6:01 PM, Darwin Sorento Dichmann <dichmann at berkeley.edu> wrote:

> Greetings, 
> 
> I wish to extract the sequences of all mRNAs in a Bio::DB::SeqFeature::Store database, but I get the entire genomic region covered by a given transcript rather than the spliced sequence. I tried using the method spliced_seq, but it is not supported, and selecting -type=>'CDS' yields the individual exons rather than the full transcript.
> 
> I assume that I am missing something obvious, and any pointers on how to solve this are greatly appreciated. Eventually I would like to get the CDS and the translated amino acid sequence of these transcripts, and comments on how to do that most elegantly would also be very helpful.
> 
> The GFF3 files describing the features should be OK, since they have been used in a GBrowse database that draws the transcripts correctly.
> 
> Best wishes,
> Darwin
> 
> Code:
> 
> #!/usr/bin/perl
> # ==================================================================
> # = extract transcript sequences from Bio::DB::SeqFeature database =
> # ==================================================================
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::SeqFeatureI;
> use Bio::DB::SeqFeature::Store;
> 
> # Connect to the MySQL-backed feature store (placeholder credentials).
> my $db = Bio::DB::SeqFeature::Store->new(
>     -adaptor => 'DBI::mysql',
>     -dsn     => 'DB_NAME',
>     -user    => 'USER',
>     -pass    => 'PASSWORD',
> );
> 
> # Get all mRNAs in the genome.
> my $seq_stream = $db->get_seq_stream(
>     -type => 'mRNA',
> );
> 
> while (my $seq = $seq_stream->next_seq) {
>     my $name = $seq->name;
>     print "This is the name: $name\n";
>     # This prints the entire genomic region covered by the transcript.
>     my $sequence = $seq->dna;
>     print "Sequence: $sequence\n";
> }
> exit;
> 
> 
> 
> 
> -----------------------------------------------
> Darwin Sorento Dichmann, M.S., PhD
> Associate Specialist
> Harland Lab
> University of California, Berkeley
> Molecular and Cell Biology
> 571 Life Sciences Addition
> Berkeley, CA 94720
> Phone# (510) 643-7830
> E-mail: dichmann at berkeley.edu