[Biopython] gff3 problem
Brad Chapman
chapmanb at 50mail.com
Tue Apr 5 13:22:47 UTC 2011
Michal;
> I have found http://www.biopython.org/wiki/GFF_Parsing for
> BioPython in order to read GFF3 files.
Thanks for trying out the GFF parser and for the feedback.
> How can I access exon and cds information from gff3 file?
These are stored as sub_features of the features on each record.
The GFF parser does the work of nesting exons and CDSs within their
parent features, using the parent/child relationships in GFF3.
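As a rough illustration of what that nesting amounts to, here is a minimal sketch in plain Python. It is not the parser's actual implementation: the row tuples, the exon IDs and the nest_by_parent helper are all made up for the example, and it only handles one level of parent/child nesting.

```python
from collections import defaultdict

# Simplified stand-ins for GFF3 rows: (type, ID, Parent).
# The exon IDs here are hypothetical.
rows = [
    ("gene", "BC-x.1", None),
    ("exon", "exon1", "BC-x.1"),
    ("exon", "exon2", "BC-x.1"),
]


def nest_by_parent(rows):
    """Attach each child row to the top-level row named in its Parent."""
    children = defaultdict(list)
    top_level = []
    for ftype, fid, parent in rows:
        if parent is None:
            top_level.append((ftype, fid))
        else:
            children[parent].append((ftype, fid))
    # Each result is (type, ID, list of child (type, ID) pairs).
    return [(ftype, fid, children[fid]) for ftype, fid in top_level]


nested = nest_by_parent(rows)
```

The real parser gives you SeqFeature objects rather than tuples, but the idea is the same: children end up in a list hanging off their parent.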
> Why does start position is always one less than in the gff3 file,
> but the end position is the same?
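As Peter mentioned, we convert to standard Python 0-based,
half-open coordinates. GFF3 positions are 1-based and inclusive,
so the start drops by one while the end stays the same. This
helps maintain consistency throughout your code.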
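A small sketch of the conversion, in case it helps. The gff_to_python helper is hypothetical (it is not part of the parser), and the GFF3 start of 2235 is an assumed value chosen for illustration.

```python
def gff_to_python(start, end):
    """Convert a 1-based, inclusive GFF3 interval to 0-based, half-open."""
    return start - 1, end


# Assumed GFF3 columns for illustration: start=2235, end=3344.
py_start, py_end = gff_to_python(2235, 3344)

# With half-open coordinates the length is simply end - start,
# and seq[py_start:py_end] slices out exactly the feature.
feature_length = py_end - py_start
```

A nice side effect is that the coordinates can be used directly as Python slice indices without any off-by-one adjustment.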
> Why do not I get Note=Elongation factor P (EF-P)...?
These are stored in the qualifiers attribute of each feature.
To demonstrate, if we modify your code slightly:
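from BCBio import GFF

# in_file is the path to your GFF3 file, as in your original code
in_handle = open(in_file)
for rec in GFF.parse(in_handle):
    for feature in rec.features:
        # top-level feature: type, location and attributes
        print(feature.type, feature.location)
        print(feature.qualifiers)
        # nested children (exons, CDSs) of this feature
        for sub_feature in feature.sub_features:
            print(" ", sub_feature.type, sub_feature.location)
in_handle.close()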
This will print out details of each feature. For instance, here is
a gene with exon sub_features:
gene [2234:3344]
{'Note': ['Elongation factor P (EF-P) family protein n:2 Tax:Arabidopsis RepID:D7L774_ARALY'],
'source': ['x'], 'ID': ['BC-x.1'], 'Name': ['BC-x.1']}
  exon [2234:2279]
  exon [2422:2535]
  exon [2609:2691]
  exon [2762:2864]
  exon [2971:3049]
  exon [3125:3251]
  exon [3320:3344]
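To pull out just the Note, keep in mind that each qualifier value is a list of strings. The sketch below uses a plain dictionary copied from the output above as a stand-in for feature.qualifiers; with a real feature you would call .get() on feature.qualifiers in the same way.

```python
# Stand-in for feature.qualifiers, copied from the printed output.
qualifiers = {
    "Note": ["Elongation factor P (EF-P) family protein n:2 "
             "Tax:Arabidopsis RepID:D7L774_ARALY"],
    "source": ["x"],
    "ID": ["BC-x.1"],
    "Name": ["BC-x.1"],
}

# Values are lists, so take the first entry; fall back to None
# for features that carry no Note attribute.
notes = qualifiers.get("Note", [])
first_note = notes[0] if notes else None
print(first_note)
```

Using .get() with an empty list as the default avoids a KeyError on features without a Note.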
Hope this helps,
Brad