[Bioperl-l] Parsing individual exons from EMBL file
Gowthaman Ramasamy
gowthaman.ramasamy at seattlebiomed.org
Sat Dec 18 00:18:40 UTC 2010
Hi All,
I am trying to find a way to parse the individual exon/CDS segments from a multi-exon gene feature. When I try the method below, it gives me only the outermost boundaries (55387 and 56300 in the example below).
For example, my EMBL file contains:
FT   CDS             complement(join(55387..56181,56187..56300))
FT                   /ID="apidb|cds_LmjF01.0200-1"
FT                   /description="."
FT                   /size="903"
FT                   /Parent="apidb|rna_LmjF01.0200-1"
FT                   /feature_order="115"
FT                   /product="hypothetical+protein%2C+conserved"
FT                   /Name="cds"
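use strict;
use warnings;
use Bio::SeqIO;

# $file_io is created earlier in my script; the file name here is only a placeholder.
my $file_io = Bio::SeqIO->new(-file => 'input.embl', -format => 'EMBL');
while (my $seqobj = $file_io->next_seq()) {
    my @features = $seqobj->all_SeqFeatures();
    foreach my $feat (@features) {
        # Prints only the outermost boundaries of a joined location.
        print $feat->start, "\t", $feat->end, "\n";
    }
}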
When I use $feat->start it gives me 55387, and $feat->end gives me 56300.
Ideally I would like to get the start and end of each sub-feature (exon 1: 55387..56181 and exon 2: 56187..56300). When I tried to use sub_SeqFeature(), it did not return anything.
Any ideas? I am also not sure whether my EMBL file is correctly formatted. Any suggestions are welcome.
Any suggestions for converting EMBL to GFF3 would also be appreciated. I have a script that does this, but it fuses all the joined segments together and gives me only one GFF line per feature. Basically, I could not separate the exons.
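For reference, one possible approach is to read the segments from the feature's location object rather than from its sub-features. As far as I understand, BioPerl represents a join(...) as a split location (Bio::Location::SplitLocationI), and its sub_Location() method returns the component ranges. The following is only a sketch under that assumption. The script name and the input file passed on the command line are hypothetical, and it has not been run against the record above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical usage: perl list_exons.pl input.embl
my $path = shift @ARGV or die "usage: $0 file.embl\n";
my $in   = Bio::SeqIO->new(-file => $path, -format => 'EMBL');

while (my $seq = $in->next_seq) {
    for my $feat ($seq->all_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        my $loc = $feat->location;

        # Assumption: a join(...) is parsed into a split location.
        # A simple range is treated as a single segment.
        my @parts = $loc->isa('Bio::Location::SplitLocationI')
                  ? $loc->sub_Location
                  : ($loc);

        # One line per segment: sequence id, start, end, strand.
        for my $part (@parts) {
            print join("\t", $seq->display_id, $part->start,
                       $part->end, $part->strand // 0), "\n";
        }
    }
}
```

If this works as assumed, each segment is printed on its own line, which is also the shape needed to write one GFF3 exon or CDS line per segment.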
Thanks,
Gowtham