[Bioperl-l] bioperl + GFF3 audit
Jason Stajich
jason at bioperl.org
Wed Sep 19 00:04:05 UTC 2007
Something to throw out there for discussion with GFF3 gurus. Maybe
we can have a little STATE-OF-GFF3 and compliance at the GMOD
workshop after Genome Informatics in Nov?
I propose after we get the next stable release out we consider doing
a systematic code audit to ensure that we can really generate proper
GFF3 compliant data from all of our parsers. This would include both
good ID/Parent as well as . I'd be happy to also think about making
sure we can generate proper GTF/GFF2.5 - whether this means we have a
translator that works on these objects or we have to code this into
the parser software that creates the sequence features, I'm not sure.
The whole Bio::Tools mishmash is a little unsettling when trying to
generate standardized output. I'm not really clear if Bio::FeatureIO
actually tries to do this properly, but 'gene_id'/'transcript_id' for
GTF and ID/Parent 3-level Features for gene->transcript->exon/CDS
don't really come out properly, and I end up writing workarounds on
the downstream data.
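
To make the GTF side concrete, here is a minimal sketch of deriving
'gene_id'/'transcript_id' from a GFF3-style ID/Parent hierarchy. It is
written in Python only for brevity and is not BioPerl code; the function
name gtf_attributes and the parent_of lookup table are hypothetical, and
it assumes a strict gene -> transcript -> exon/CDS nesting with a single
parent per feature.

```python
def gtf_attributes(feature_id, parent_of):
    """Build the GTF attribute column for an exon/CDS.

    parent_of maps each feature ID to its single Parent ID
    (assumption: exactly three levels, one parent each).
    """
    transcript_id = parent_of[feature_id]   # exon/CDS -> transcript
    gene_id = parent_of[transcript_id]      # transcript -> gene
    return f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'


# Hypothetical identifiers, for illustration only.
parent_of = {"exon1": "mRNA1", "mRNA1": "gene1"}
print(gtf_attributes("exon1", parent_of))
```

A translator along these lines would work on the feature objects after
parsing, rather than being coded into each parser.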
One aspect that is biting us is the flat versus multi-level features
(genes -> transcripts -> exons) and how we handle them. I think this
ought to get fleshed out better so we can really support . A lot of
the Bio::Tools parsers are generally pretty laissez-faire here about
things and we have a variety of non-standard and non-compliant aspects.
For example, I am playing with tRNA parsing and I assume that proper
GFF3 here is three levels of :
gene -> tRNA -> exon
with those being the primary_tag names that correspond to the
Sequence Ontology.
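
As a rough illustration of what that three-level output could look like,
the sketch below formats one tRNA as gene -> tRNA -> exon lines with
explicit ID/Parent attributes. It is plain Python rather than BioPerl,
the function names and the ID naming scheme are assumptions made for the
example, and attribute values are assumed not to need GFF3 escaping.

```python
def gff3_line(seqid, source, ftype, start, end, strand, attrs):
    """Format one nine-column GFF3 line; score and phase are left as '.'."""
    attr_text = ";".join(f"{key}={value}" for key, value in attrs.items())
    return "\t".join([seqid, source, ftype, str(start), str(end),
                      ".", strand, ".", attr_text])


def trna_to_gff3(seqid, source, name, strand, exons):
    """Return gene, tRNA and exon lines for one tRNA given (start, end) exons."""
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)
    gene_id = name                    # hypothetical ID scheme
    trna_id = f"{name}.t1"
    lines = [
        gff3_line(seqid, source, "gene", start, end, strand,
                  {"ID": gene_id, "Name": name}),
        gff3_line(seqid, source, "tRNA", start, end, strand,
                  {"ID": trna_id, "Parent": gene_id}),
    ]
    for i, (s, e) in enumerate(sorted(exons), 1):
        lines.append(gff3_line(seqid, source, "exon", s, e, strand,
                               {"ID": f"{trna_id}.exon{i}", "Parent": trna_id}))
    return lines


# Made-up coordinates for a tRNA with one intron.
for line in trna_to_gff3("chr1", "example", "tRNA-Ala1", "+",
                         [(100, 137), (150, 185)]):
    print(line)
```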
I have modified the code locally to report generic features that
carry sub-features which must be extracted. In addition, the ID/Parent
fields are explicitly filled in, and I wonder if we want to do a
better job of ensuring these are meaningfully entered?
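
One simple check an audit could run on any writer's output is that every
Parent actually refers to an ID in the same file. The following is a
minimal sketch of that check in Python (not BioPerl); the function name
audit_id_parent is hypothetical, and it deliberately does not flag
repeated IDs, since GFF3 allows one ID to span several lines for
discontinuous features.

```python
def audit_id_parent(lines):
    """Return a list of problems found in GFF3 feature lines."""
    ids = set()
    parent_refs = []   # (line number, parent ID) pairs to resolve later
    problems = []
    for number, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue   # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {number}: expected 9 columns, got {len(cols)}")
            continue
        attrs = dict(pair.split("=", 1)
                     for pair in cols[8].split(";") if "=" in pair)
        if "ID" in attrs:
            ids.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parent_refs.append((number, parent))
    # Resolve parents after the full pass so that order does not matter.
    for number, parent in parent_refs:
        if parent not in ids:
            problems.append(f"line {number}: Parent={parent} has no matching ID")
    return problems
```

An empty list means every Parent resolved; anything else is a candidate
for the todo list.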
So if there are interested people out there we can try and hammer out
a todo list on the wiki and see if we're generating proper GFF3 in
the first place, and try to make sure all the features that get fed
out to Bio::FeatureIO or Bio::Tools::GFF can get properly transformed
into GFF3 and GTF output.
Comments/Volunteers?
-jason
--
Jason Stajich
jason at bioperl.org