[Bioperl-l] Parsing individual exons from EMBL file

Gowthaman Ramasamy gowthaman.ramasamy at seattlebiomed.org
Sat Dec 18 00:18:40 UTC 2010


Hi All,
I am trying to find a method to parse the individual exon/CDS features from a multi-exonic gene feature. When I try the following method, it gives me only the outermost boundaries (55387 and 56300 in the example below).

For example, my EMBL file contains:
FT   CDS             complement(join(55387..56181,56187..56300))
FT                   /ID="apidb|cds_LmjF01.0200-1"
FT                   /description="."
FT                   /size="903"
FT                   /Parent="apidb|rna_LmjF01.0200-1"
FT                   /feature_order="115"
FT                   /product="hypothetical+protein%2C+conserved"
FT                   /Name="cds"

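use Bio::SeqIO;
# Open the EMBL file as a sequence stream ('my_file.embl' is a placeholder name)
my $file_io = Bio::SeqIO->new(-file => 'my_file.embl', -format => 'EMBL');
while (my $seqobj = $file_io->next_seq()) {
    my @features = $seqobj->all_SeqFeatures();
    foreach my $feat (@features) {
        # Print the start and end reported for each feature
        print $feat->start, "\t", $feat->end, "\n";
    }
}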

When I use $feat->start it gives me 55387, and $feat->end gives me 56300.

Ideally I would like to get the start and end of the sub-features (exon 1: 55387..56181 and exon 2: 56187..56300). When I tried to use "sub_SeqFeature()", it did not return anything.
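
To make the desired output concrete, here is a minimal sketch in plain Perl (not BioPerl) that pulls the individual ranges straight out of the location string shown above. The helper name exon_ranges is hypothetical, and the sketch assumes the location contains only simple start..end ranges wrapped in join() and/or complement(); it is an illustration of the result I am after, not a general EMBL location parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [start, end] pairs found in an
# EMBL location string. Assumes only plain "start..end" ranges are present.
sub exon_ranges {
    my ($location) = @_;
    my @ranges;
    while ($location =~ /(\d+)\.\.(\d+)/g) {
        push @ranges, [$1, $2];
    }
    return @ranges;
}

my $loc = 'complement(join(55387..56181,56187..56300))';

# A leading complement(...) means the feature is on the reverse strand.
my $strand = $loc =~ /^complement/ ? '-' : '+';

# Print one line per exon: start, end, strand.
for my $r (exon_ranges($loc)) {
    print join("\t", $r->[0], $r->[1], $strand), "\n";
}
```

For the CDS above this prints one line per exon (55387 to 56181 and 56187 to 56300, both on the minus strand), which is the per-exon breakdown I would like to get from the feature object itself.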

Any ideas? I am also not sure whether my EMBL file is correctly formatted. Any suggestions are welcome.

Any suggestions on converting EMBL to GFF3 would also be appreciated. I have a script that does this, but it fuses all the joined segments together and gives me only one GFF line. Basically, I cannot separate the exons.

Thanks,
Gowtham



