[Bioperl-l] Parsing CDS info from GFF file

Gowthaman Ramasamy gowthaman.ramasamy at sbri.org
Fri Mar 2 21:08:27 UTC 2007



Hi List,
I am trying to find a way to get the coordinates of each CDS (from start codon to stop codon) from a GFF file.
However, the GFF file only lists the coordinates of the individual coding exons (CDS features).
Is there any tool, module, or script available for this?
It should handle both multi-exon genes and genes on either the + or - strand.
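The coordinate logic can be sketched roughly as follows. This is a minimal sketch in Python rather than Perl, assuming each CDS segment is an inclusive 1-based (start, end) pair. It also assumes the CDS features include the stop codon: the Sequence Ontology definition of CDS does, but GTF files usually do not. The names `codon_positions` and `_bases` are hypothetical. The overall CDS span is simply the lowest start to the highest end. The code walks the bases in transcript order (low to high on +, high to low on -), so a start or stop codon split across an exon boundary is still reported correctly.

```python
from itertools import islice


def _bases(segments, strand, from_3prime=False):
    """Yield genomic positions of CDS bases, walking from the 5' end of the
    transcript (or from the 3' end if from_3prime is True)."""
    # On '+' the transcript reads low -> high; on '-' it reads high -> low.
    ascending = (strand == "+") != from_3prime
    for start, end in sorted(segments, reverse=not ascending):
        if ascending:
            yield from range(start, end + 1)
        else:
            yield from range(end, start - 1, -1)


def codon_positions(segments, strand):
    """Return the overall CDS span plus the genomic positions of the start
    and stop codons. Segments are inclusive 1-based (start, end) pairs.
    Assumes the CDS segments include the stop codon."""
    if strand not in ("+", "-"):
        raise ValueError(f"unsupported strand: {strand!r}")
    start_codon = sorted(islice(_bases(segments, strand), 3))
    stop_codon = sorted(islice(_bases(segments, strand, from_3prime=True), 3))
    span = (min(s for s, _ in segments), max(e for _, e in segments))
    return {"cds_span": span, "start_codon": start_codon, "stop_codon": stop_codon}


if __name__ == "__main__":
    # CDS segments of 1400.m02876 (minus strand) from the entries in this message.
    segs = [(233339, 233476), (233011, 233182), (232729, 232781)]
    print(codon_positions(segs, "-"))
    # {'cds_span': (232729, 233476), 'start_codon': [233474, 233475, 233476], 'stop_codon': [232729, 232730, 232731]}
```

On the minus strand, the start codon sits at the high end of the span and the stop codon sits at the low end.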

Some example GFF entries are below:


Many thanks in advance,
gowtham
SBRI, Seattle.


1400    TIGR    gene    127456  128386  .       +       .       ID=1400.t00213;Name=hypothetical protein
1400    TIGR    mRNA    127456  128386  .       +       .       ID=1400.m02493;Parent=1400.t00213
1400    TIGR    five_prime_utr  127456  127993  .       +       .       ID=utr5p_of_1400.m02493;Parent=1400.m02493
1400    TIGR    exon    127456  128386  .       +       .       ID=1400.e05831;Parent=1400.m02493
1400    TIGR    CDS     127994  128314  .       +       0       ID=cds_of_1400.m02493;Parent=1400.m02493
1400    TIGR    three_prime_utr 128315  128386  .       +       .       ID=utr3p_of_1400.m02493;Parent=1400.m02493





1400    TIGR    gene    232655  233965  .       -       .       ID=1400.t00271;Name=pleckstrin homology domain protein, putative
1400    TIGR    mRNA    232655  233965  .       -       .       ID=1400.m02876;Parent=1400.t00271
1400    TIGR    five_prime_utr  233477  233965  .       -       .       ID=utr5p_of_1400.m02876;Parent=1400.m02876
1400    TIGR    exon    233339  233965  .       -       .       ID=1400.e05827;Parent=1400.m02876
1400    TIGR    CDS     233339  233476  .       -       0       ID=cds_of_1400.m02876;Parent=1400.m02876
1400    TIGR    exon    233011  233182  .       -       .       ID=1400.e05826;Parent=1400.m02876
1400    TIGR    CDS     233011  233182  .       -       0       ID=cds_of_1400.m02876;Parent=1400.m02876
1400    TIGR    exon    232655  232781  .       -       .       ID=1400.e05825;Parent=1400.m02876
1400    TIGR    CDS     232729  232781  .       -       1       ID=cds_of_1400.m02876;Parent=1400.m02876
1400    TIGR    three_prime_utr 232655  232728  .       -       .       ID=utr3p_of_1400.m02876;Parent=1400.m02876
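The whole task can be sketched on the entries above. This is again Python, and it assumes the real file is tab-delimited as GFF3 requires. Column 9 here contains spaces (for example "hypothetical protein"), so splitting on arbitrary whitespace would break it. CDS lines are grouped by their `Parent` attribute (the mRNA ID), and the CDS span runs from the lowest start to the highest end. The total CDS length is reported as a sanity check, since it should be a multiple of 3. The function names are hypothetical. The attribute parser is deliberately simple: it does not URL-unescape values or split multiple comma-separated parents.

```python
from collections import defaultdict


def parse_attributes(field):
    """Split GFF3 column 9 into a dict (no URL-unescaping, no multi-value handling)."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def cds_spans(lines):
    """Group CDS lines by Parent and return
    {parent: (seqid, cds_start, cds_end, strand, total_cds_length)}."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # skip gene, mRNA, exon, UTR and malformed lines
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        parent = parse_attributes(cols[8]).get("Parent")
        groups[parent].append((seqid, start, end, strand))

    spans = {}
    for parent, segs in groups.items():
        seqids = {s[0] for s in segs}
        strands = {s[3] for s in segs}
        if len(seqids) != 1 or len(strands) != 1:
            raise ValueError(f"CDS segments of {parent} disagree on seqid/strand")
        lo = min(s[1] for s in segs)
        hi = max(s[2] for s in segs)
        length = sum(e - s + 1 for _, s, e, _ in segs)
        spans[parent] = (seqids.pop(), lo, hi, strands.pop(), length)
    return spans


# A subset of the entries in this message, rebuilt with real tab separators.
SAMPLE = "\n".join("\t".join(cols) for cols in [
    ("1400", "TIGR", "mRNA", "127456", "128386", ".", "+", ".", "ID=1400.m02493;Parent=1400.t00213"),
    ("1400", "TIGR", "CDS", "127994", "128314", ".", "+", "0", "ID=cds_of_1400.m02493;Parent=1400.m02493"),
    ("1400", "TIGR", "mRNA", "232655", "233965", ".", "-", ".", "ID=1400.m02876;Parent=1400.t00271"),
    ("1400", "TIGR", "CDS", "233339", "233476", ".", "-", "0", "ID=cds_of_1400.m02876;Parent=1400.m02876"),
    ("1400", "TIGR", "CDS", "233011", "233182", ".", "-", "0", "ID=cds_of_1400.m02876;Parent=1400.m02876"),
    ("1400", "TIGR", "CDS", "232729", "232781", ".", "-", "1", "ID=cds_of_1400.m02876;Parent=1400.m02876"),
])

if __name__ == "__main__":
    for parent, (seqid, lo, hi, strand, length) in sorted(cds_spans(SAMPLE.splitlines()).items()):
        print(parent, seqid, lo, hi, strand, length)
    # 1400.m02493 1400 127994 128314 + 321
    # 1400.m02876 1400 232729 233476 - 363
```

With the span in hand, the start codon is at the low end on the + strand and at the high end on the - strand. The stop codon is at the opposite end.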





