[Bioperl-l] Pulling exons out of a Genbank mRNA
Brian Osborne
osborne1 at optonline.net
Tue Feb 14 01:38:30 UTC 2006
Amir,
The idea is to look at the sub-locations in the SplitLocation object, this
is discussed in FAQ 5.2:
http://www.bioperl.org/wiki/FAQ#How_do_I_parse_the_CDS_join_or_complement_st
atements_in_GenBank_or_EMBL_files_to_get_the_sub-locations.3F
The sequence of the feature itself can be obtained by using the entire_seq()
method:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_Sequences
Brian O.
On 2/13/06 3:57 PM, "Amir Karger" <akarger at CGR.Harvard.edu> wrote:
> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
> mRNA join(complement(22257..22386),complement(22067..22186),
> complement(16753..17101),complement(13840..13962),
> complement(10649..10820),complement(502..3028))
> /gene="ENSG00000005812"
> /note="transcript_id=ENST00000355619"
> exon complement(13840..13962)
> /note="exon_id=ENSE00000802462"
>
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
>
> my $inseq = Bio::SeqIO->new(-file => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
> my @features = $seq->get_SeqFeatures(); # just top level
> foreach my $feat ( @features ) {
> my $type = $feat->primary_tag;
> if ($type eq "mRNA") {
> print "Feature ",$feat->primary_tag,
> " starts ",$feat->start," ends ", $feat->end,
> " strand ",$feat->strand,"\n";
> my @feats = $feat->get_SeqFeatures();
> print "Found ", scalar @feats, " sub-features\n";
> } elsif ($type eq "exon") {
> print "Feature ",$feat->primary_tag,
> " starts ",$feat->start," ends ", $feat->end,
> " strand ",$feat->strand,"\n";
> }
> }
> }
>
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
>
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list