[Bioperl-l] est2genome
Jason Stajich
jason@cgt.mc.duke.edu
Fri, 11 Oct 2002 10:28:04 -0400 (EDT)
I wrote a very basic est2genome parser in Bio::Tools::Est2Genome and a
test in t/est2genome.
I didn't really do this the way I'd like: the parser returns an array
of either Bio::SeqFeature::SimilarityPair (exons) or Bio::SeqFeature::Generic
(introns), and next_feature isn't supported yet because I don't think the
current gene objects fit this data properly.
They do not allow attachment of evidence, nor do they capture the fact that
an exon might carry a pair of coordinates, one for the genomic sequence and
one for the cDNA/peptide.
Additionally, we don't really seem to do a good job of serializing (GFF,
GAME, GenBank/EMBL/Swissprot) Bio::SeqFeature objects which aren't
Bio::SeqFeature::Generic.
I think we need to add the hooks to make this simpler so one can, for
example, parse with Est2Genome and output as annotation in GFF or
GenBank/EMBL formats. We can use tag/value pairs to output the score and
alignment information in either of these formats, and allow users to
override this if they have a specialized way they want to output it.
The problem comes in the composite objects (FeaturePair, SimilarityPair) -
these can't be properly written out because one never sees the
feature2()/hit() component of the data, nor the extra fields like
significance when being written out by the GenBank/EMBL or GFF writers. So we
need a better way to register what the available outputs are, in a sort
of recursive fashion, so that they are available as tag/values and may have
non-unique tag names.
Does anyone have good ideas of how to structure this? Some sort of 'get
all the tag values and all of your children's tag/values pairs and any
registered data functions'.
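
One possible shape for that idea is sketched below, in Python rather than
Perl so it stays independent of any existing BioPerl class. Everything in it
is an assumption made for illustration: the Feature class, its method names
and the example tag names are hypothetical and do not correspond to the
Bio::SeqFeature API. It only shows a parent collecting its own tag/value
pairs, the output of any registered data functions, and then the same from
each child, with duplicate tag names allowed.

```python
from typing import Callable, Dict, List, Tuple

TagValue = Tuple[str, str]


class Feature:
    """Hypothetical feature node (not a BioPerl class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tags: List[TagValue] = []          # a list, so tag names may repeat
        self.children: List["Feature"] = []     # e.g. the hit side of a pair
        self.data_functions: Dict[str, Callable[["Feature"], object]] = {}

    def add_tag(self, tag: str, value: object) -> None:
        self.tags.append((tag, str(value)))

    def add_child(self, child: "Feature") -> None:
        self.children.append(child)

    def register(self, tag: str, func: Callable[["Feature"], object]) -> None:
        """Register a function whose result is exported under `tag`."""
        self.data_functions[tag] = func

    def all_tag_values(self) -> List[TagValue]:
        """Own tags, then registered functions, then children, recursively."""
        pairs = list(self.tags)
        for tag, func in self.data_functions.items():
            pairs.append((tag, str(func(self))))
        for child in self.children:
            pairs.extend(child.all_tag_values())
        return pairs


# Example: an exon whose hit component carries its own (non-unique) score tag.
exon = Feature("exon")
exon.add_tag("score", 97)
exon.register("significance", lambda f: "1e-20")

hit = Feature("hit")
hit.add_tag("score", 95)
hit.add_tag("name", "est1")
exon.add_child(hit)

pairs = exon.all_tag_values()
```

A writer for GFF or GenBank/EMBL could then walk the returned list without
knowing which concrete feature class produced each pair; a user who wants a
different layout would register their own function under the relevant tag.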
Also, as a final note, Ensembl is starting to standardize their function
names from each_XX to get_all_XX. I think we have an implicit convention
that each_XX returns a list, while next_XX is an iterator method. I don't
think this impacts us too much, but we should try to ensure we are being
consistent across the board so people aren't getting misled.
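
To make the distinction concrete, here is a small sketch of the two styles
side by side, again in Python and with a hypothetical ExonSet class; only the
naming pattern (get_all_XX returns the whole list, next_XX hands back one item
per call) is taken from the discussion above.

```python
from typing import List, Optional


class ExonSet:
    """Hypothetical container illustrating the two accessor styles."""

    def __init__(self, exons: List[str]) -> None:
        self._exons = list(exons)
        self._cursor = 0  # position of the next item for next_exon()

    def get_all_exons(self) -> List[str]:
        """List style: return every item at once (a copy, so callers
        cannot modify the internal list)."""
        return list(self._exons)

    def next_exon(self) -> Optional[str]:
        """Iterator style: return one item per call, None when exhausted."""
        if self._cursor >= len(self._exons):
            return None
        exon = self._exons[self._cursor]
        self._cursor += 1
        return exon


exons = ExonSet(["exon1", "exon2"])
everything = exons.get_all_exons()   # the whole list in one call
first = exons.next_exon()            # one item at a time
second = exons.next_exon()
done = exons.next_exon()             # None once the items run out
```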
-jason
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu