[Bioperl-l] Pulling exons out of a Genbank mRNA

Hilmar Lapp hlapp at gmx.net
Mon Feb 13 23:58:46 UTC 2006


Why you want subfeatures? This is genbank format you're parsing,
right? Your mRNA features will have a split location. Loop over
$feat->location->each_Location() and get $seq->subseq() with the start
and end of each sublocation. If you don't know how to do this check
out the implementation of $feature->splice_seq().

This should be in the HOWTO. Is it not?

    -hilmar


On 2/13/06, Amir Karger <akarger at cgr.harvard.edu> wrote:
> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
>
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
>
> my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>                 my @feats = $feat->get_SeqFeatures();
>                 print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>         }
>      }
> }
>
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
>
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------




More information about the Bioperl-l mailing list