[Bioperl-l] Parsing individual exons from EMBL file
Jason Stajich
jason at bioperl.org
Sat Dec 18 00:32:31 UTC 2010
You need to operate on the sub-locations. Basically:
for my $loc ( $feature->location->each_Location ) {
    print $loc->start, "..", $loc->end, "\n";
}
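A self-contained sketch of the same idea, assuming BioPerl is installed. It builds the location from the feature-table string quoted in the message below using Bio::Factory::FTLocationFactory instead of reading a file; the helper name sub_location_coords is made up for illustration. The sub-locations are sorted by start here because the order in which a complemented join is returned should not be relied on.

```perl
use strict;
use warnings;
use Bio::Factory::FTLocationFactory;

# Hypothetical helper: return [start, end] pairs, one per sub-location,
# for an EMBL/GenBank feature-table location string.
sub sub_location_coords {
    my ($location_string) = @_;
    my $location = Bio::Factory::FTLocationFactory->new->from_string($location_string);

    # each_Location returns the individual pieces of a split (join) location;
    # for a simple location it returns the location itself.
    my @coords = map { [ $_->start, $_->end ] } $location->each_Location;

    # Sort by start coordinate so the output order is predictable.
    my @sorted = sort { $a->[0] <=> $b->[0] } @coords;
    return @sorted;
}

for my $exon ( sub_location_coords('complement(join(55387..56181,56187..56300))') ) {
    print $exon->[0], "..", $exon->[1], "\n";
}
```

With a feature read through Bio::SeqIO, the same each_Location call applies to $feature->location.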
But for converting to GFF3 you will want to look at the Unflattener
(Bio::SeqFeature::Tools::Unflattener), which basically does this for you,
and the bp_unflatten_seq.pl script, which implements it. As you may know
by now, EMBL/GenBank records are not all consistent in how things are
annotated (how ID, product, and description are used), so mapping them to
properly formatted GFF3 for GBrowse, etc. can be a tedious process.
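A rough sketch of calling the Unflattener directly, assuming BioPerl is installed. It only rebuilds the feature hierarchy and prints it as an indented tree; it does not write GFF3, and how well the heuristics behave depends on how a given record is annotated. The helper names print_feature_tree and unflatten_file are made up for illustration, and the input path is taken from the command line.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

# Hypothetical helper: print a feature and its nested sub-features,
# indented by depth in the hierarchy.
sub print_feature_tree {
    my ($feature, $depth) = @_;
    print '  ' x $depth,
          join("\t", $feature->primary_tag, $feature->start, $feature->end), "\n";
    print_feature_tree($_, $depth + 1) for $feature->get_SeqFeatures;
}

# Hypothetical helper: unflatten every record in a file and print the result.
sub unflatten_file {
    my ($path, $format) = @_;
    my $in          = Bio::SeqIO->new(-file => $path, -format => $format);
    my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;

    while (my $seq = $in->next_seq) {
        # -use_magic asks the Unflattener to apply its default heuristics
        # when grouping the flat feature list into a hierarchy.
        $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1);
        print_feature_tree($_, 0) for $seq->get_SeqFeatures;
    }
}

# Runs only when a file path is given as the first argument.
unflatten_file($ARGV[0], 'EMBL') if @ARGV;
```

The bp_unflatten_seq.pl script is the packaged way to do this and is probably the better starting point for real conversions.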
FYI -- APIDB also provides GFF3 if you would rather...
http://tritrypdb.org/common/downloads/release-2.5/Lmajor/gff/
-jason
Gowthaman Ramasamy wrote:
> Hi All,
> I am trying to find a method to parse the individual exon/CDS features from a multi-exon gene feature. When I try the following method, it gives me only the outermost boundaries (55387 and 56300 in the example below).
>
> For example...my EMBL contains...
> FT CDS complement(join(55387..56181,56187..56300))
> FT /ID="apidb|cds_LmjF01.0200-1"
> FT /description="."
> FT /size="903"
> FT /Parent="apidb|rna_LmjF01.0200-1"
> FT /feature_order="115"
> FT /product="hypothetical+protein%2C+conserved"
> FT /Name="cds"
>
> use Bio::SeqIO;
> while (my $seqobj = $file_io->next_seq()) {
>     my @features = $seqobj->all_SeqFeatures();
>     foreach my $feat (@features) {
>         $feat->start;
>         $feat->end;
>     }
> }
>
> When I use $feat->start it gives me 55387, and $feat->end gives me 56300.
>
> Ideally I would like to get the start and end of the sub-features (exon 1: 55387..56181 and exon 2: 56187..56300). When I tried to use "sub_SeqFeature()", it did not return anything.
>
> Any ideas? I am also not sure whether my EMBL file is correctly formatted. Any suggestions...
>
> Any suggestions for converting EMBL to GFF3 would be appreciated. I have a script that does that, but it fuses all the joins together and gives me only one GFF line. Basically, I could not separate the exons.
>
> Thanks,
> Gowtham
--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki