[Bioperl-l] est2genome

Jason Stajich jason@cgt.mc.duke.edu
Fri, 11 Oct 2002 10:28:04 -0400 (EDT)

I wrote a very basic est2genome parser in Bio::Tools::Est2Genome and a
test in t/est2genome.

Now, I didn't really do this the way I'd like: I'm returning an array of
either Bio::SeqFeature::SimilarityPair (exons) or
Bio::SeqFeature::Generic (introns), and next_feature isn't supported yet
because I don't think the current gene objects fit properly with this data.

They do not allow attachment of evidence, nor do they capture the fact
that an exon might carry paired information for the genomic and
cDNA/peptide sequences.

Additionally, we don't really seem to do a good job of serializing (GFF,
GAME, GenBank/EMBL/Swissprot) Bio::SeqFeature objects which aren't

I think we need to add the hooks to make this simpler so one can, for
example, parse with Est2Genome and output as annotation in GFF or
GenBank/EMBL formats.  We can use tag/value pairs to output the score
and alignment information in either of these formats, and allow the user
to override this if they have a specialized way they want to output it.

The problem comes in the composite objects (FeaturePair, SimilarityPair) -
these can't be properly written out because the genbank/embl and gff
writers never see the feature2()/hit() component of the data, nor the
extra fields like significance.  So we need a better way to register
what the available outputs are, in a sort of recursive fashion, so that
they are available as tag/values and may have non-unique tag names.

Does anyone have good ideas of how to structure this? Some sort of 'get
all the tag values and all of your children's tag/values pairs and any
registered data functions'.
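
To make the question concrete, here is a minimal sketch of that recursive
collection.  It is written in Python rather than Perl so that it stays
independent of the current object model, and every name in it (Feature,
add_tag, register, all_tag_values) is hypothetical, not an existing
BioPerl class or method.  It assumes a feature holds its own tag/value
pairs, a list of child features (for instance the hit side of a pair),
and a table of registered functions that compute extra values.

```python
from typing import Callable, Dict, List, Tuple

TagValue = Tuple[str, str]


class Feature:
    """Hypothetical stand-in for a sequence feature; not a BioPerl class."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Kept as a list of pairs, not a dict, so tag names may repeat.
        self.tags: List[TagValue] = []
        # Child features, e.g. the hit/feature2 side of a pair.
        self.children: List["Feature"] = []
        # Registered data functions: tag name -> function computing a value.
        self.data_functions: Dict[str, Callable[["Feature"], object]] = {}

    def add_tag(self, tag: str, value: object) -> None:
        self.tags.append((tag, str(value)))

    def register(self, tag: str, func: Callable[["Feature"], object]) -> None:
        self.data_functions[tag] = func

    def all_tag_values(self) -> List[TagValue]:
        """Own tags, then registered functions, then children, recursively."""
        pairs = list(self.tags)
        for tag, func in self.data_functions.items():
            pairs.append((tag, str(func(self))))
        for child in self.children:
            pairs.extend(child.all_tag_values())
        return pairs


# Usage: an exon with a score, a computed significance, and a hit child.
exon = Feature("exon")
exon.add_tag("score", 98)
exon.register("significance", lambda feature: "1e-20")

hit = Feature("hit")
hit.add_tag("score", 95)
hit.add_tag("seq_id", "cdna1")
exon.children.append(hit)

pairs = exon.all_tag_values()
```

A writer would then only need to call the one collection method and
would see the child's values too, with the duplicate "score" tag
preserved rather than overwritten.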

Also, as a final note, Ensembl is starting to standardize their function
names from each_XX to get_all_XX.  I think we have an implicit convention
that each_XX returns a list, while next_XX is an iterator method.  I don't
think this impacts us too much, but we should try to ensure we are being
consistent across the board so people aren't getting misled.
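
As an illustration of the two conventions, here is a small sketch, again
in Python and with hypothetical names (ExonSet, get_all_exons, next_exon)
that do not correspond to any real Ensembl or BioPerl class.  The
list-style method returns everything at once; the iterator-style method
hands back one item per call and signals the end with None.

```python
from typing import List, Optional


class ExonSet:
    """Hypothetical container showing list-style vs iterator-style access."""

    def __init__(self, exons: List[str]) -> None:
        self._exons = list(exons)
        self._cursor = 0  # position of the next item for next_exon()

    def get_all_exons(self) -> List[str]:
        """List convention (each_XX / get_all_XX): return every item."""
        return list(self._exons)

    def next_exon(self) -> Optional[str]:
        """Iterator convention (next_XX): one item per call, None at the end."""
        if self._cursor >= len(self._exons):
            return None
        exon = self._exons[self._cursor]
        self._cursor += 1
        return exon


exons = ExonSet(["exon1", "exon2"])

# List style: everything in one call.
everything = exons.get_all_exons()

# Iterator style: call until the method reports that nothing is left.
seen = []
exon = exons.next_exon()
while exon is not None:
    seen.append(exon)
    exon = exons.next_exon()
```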


Jason Stajich
Duke University
jason at cgt.mc.duke.edu