[Bioperl-l] Proposal: SemanticMapping and call for info on Gene Objects
Jason Stajich
jason@cgt.mc.duke.edu
Sat, 11 May 2002 18:10:12 -0400 (EDT)
I'm starting to build the semantic mapper for creating
Bio::SeqFeature::Gene objects from a list of Bio::SeqFeatureI objects.
Dave/Hilmar, any chance you could walk us through the ideas behind the
Gene objects and the assumptions that have been made?
I am wondering if we have a rich enough set of objects for truly
representing all the information one might have for a gene.
I think we probably need a CDS object, or a somewhat richer exon object, to
note where translation starts. I'm not sure which approach is appropriate:
build objects that mirror the way data is organized in a GenBank/EMBL file,
or build them more generically and do some acrobatics to go from input
sequence file -> GeneStructure -> output sequence file format.
If you have opinions or ideas here, I encourage you to look over the
existing objects and help propose some directions. I'd also like to adopt
what we can from the Gadfly and Ensembl models, so any guidance and lessons
learned from Ewan/Michele/Chris M. would be great.
As for the actual semantic mapping part, here is a simple interface
I've started.
Bio::SeqFeature::SemanticMapperI
(or should it be a Bio::Factory::SemanticMapperI ???)
(happy to hear better suggestions for names)
=head2 map_from_generic_features

 Title   : map_from_generic_features
 Usage   : my @features = $mapper->map_from_generic_features(-features => \@generic);
 Function: Builds new Bio::SeqFeatureI object(s) from a set of
           Bio::SeqFeatureI objects, according to the logic of the
           implementing class.
 Returns : List of Bio::SeqFeatureI objects
 Args    : -features => \@generic  # array ref of generic features

=head2 map_to_generic_features

 Title   : map_to_generic_features
 Usage   : my @features = $mapper->map_to_generic_features(-features => \@specialized);
 Function: Builds generic Bio::SeqFeature::Generic objects from
           specialized Bio::SeqFeature::* objects, which is useful for
           writing GenBank/EMBL feature tables.
 Returns : List of Bio::SeqFeatureI objects
 Args    : -features => \@specialized  # array ref of features to map to
                                       # generic objects

=cut
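To make the calling convention concrete, here is a rough sketch of how the interface might look as a Perl base class. Nothing here exists in BioPerl: the package name is simply the one proposed above, the `_features_arg` helper and the `PassThroughMapper` class are hypothetical, and the base methods just `croak` until an implementing class overrides them.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed interface; this package does not
# exist in BioPerl. Implementing classes override both mapping methods.
package Bio::SeqFeature::SemanticMapperI;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub map_from_generic_features {
    my ($self) = @_;
    croak ref($self) . ' must implement map_from_generic_features';
}

sub map_to_generic_features {
    my ($self) = @_;
    croak ref($self) . ' must implement map_to_generic_features';
}

# Hypothetical helper shared by implementations: validate and unpack -features.
sub _features_arg {
    my ($self, %args) = @_;
    my $features = $args{-features};
    croak '-features => \@list is required' unless ref $features eq 'ARRAY';
    return @$features;
}

# Trivial hypothetical implementation that passes features through
# unchanged in both directions, only to exercise the calling convention.
package PassThroughMapper;
our @ISA = ('Bio::SeqFeature::SemanticMapperI');

sub map_from_generic_features {
    my ($self, %args) = @_;
    return $self->_features_arg(%args);
}

sub map_to_generic_features {
    my ($self, %args) = @_;
    return $self->_features_arg(%args);
}

package main;

my $mapper  = PassThroughMapper->new;
my @generic = ({ primary_tag => 'exon', start => 100, end => 200 });
my @mapped  = $mapper->map_from_generic_features(-features => \@generic);
print scalar(@mapped), " feature(s) mapped\n";
```

Keeping the argument checking in the base class means each concrete mapper only has to supply the mapping logic itself.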
The first implementing class would be Bio::SeqFeature::GeneSemanticMapper,
which would build Bio::SeqFeature::Gene::GeneStructure objects, or at least
Exon/Intron objects, depending on how richly the data are annotated.
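A minimal sketch of what GeneSemanticMapper might do, assuming the generic features are linked to their gene by a `gene` tag and represented as plain hashes so the example runs without BioPerl. A real implementation would create Bio::SeqFeature::Gene::GeneStructure and Exon objects instead. It also shows one possible way to record where translation starts, by taking the outer bounds of the CDS segments. The function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::SeqFeature::Generic object: a plain
# hash with primary_tag, start, end, strand and the name of its gene.
sub generic_feature { return +{@_} }

# Group exon and CDS features by gene name and build one simple gene
# model per gene. Exons are sorted by start; the outer bounds of the CDS
# segments record where translation starts.
sub map_to_gene_models {
    my (@features) = @_;
    my %genes;
    for my $f (@features) {
        my $name = $f->{gene};
        next unless defined $name;    # features without a gene tag are skipped
        my $g = $genes{$name} ||= { gene => $name, strand => $f->{strand}, exons => [] };
        if ($f->{primary_tag} eq 'exon') {
            push @{ $g->{exons} }, [ $f->{start}, $f->{end} ];
        }
        elsif ($f->{primary_tag} eq 'CDS') {
            # A CDS may be split across several segments; keep its outer bounds.
            $g->{cds_lo} = $f->{start}
                if !defined $g->{cds_lo} || $f->{start} < $g->{cds_lo};
            $g->{cds_hi} = $f->{end}
                if !defined $g->{cds_hi} || $f->{end} > $g->{cds_hi};
        }
    }
    for my $g (values %genes) {
        @{ $g->{exons} } = sort { $a->[0] <=> $b->[0] } @{ $g->{exons} };
        next unless defined $g->{cds_lo};
        # Translation starts at the 5' end of the CDS, which is the
        # highest coordinate on the minus strand.
        $g->{translation_start} = ($g->{strand} // 1) < 0 ? $g->{cds_hi} : $g->{cds_lo};
    }
    return @genes{ sort keys %genes };
}

my @generic = (
    generic_feature(primary_tag => 'exon', start => 500, end => 650, strand => 1,  gene => 'geneA'),
    generic_feature(primary_tag => 'exon', start => 100, end => 200, strand => 1,  gene => 'geneA'),
    generic_feature(primary_tag => 'CDS',  start => 150, end => 200, strand => 1,  gene => 'geneA'),
    generic_feature(primary_tag => 'CDS',  start => 500, end => 600, strand => 1,  gene => 'geneA'),
    generic_feature(primary_tag => 'exon', start => 900, end => 990, strand => -1, gene => 'geneB'),
    generic_feature(primary_tag => 'CDS',  start => 920, end => 980, strand => -1, gene => 'geneB'),
);

my @models = map_to_gene_models(@generic);
for my $m (@models) {
    printf "%s: %d exon(s), translation starts at %d\n",
        $m->{gene}, scalar @{ $m->{exons} }, $m->{translation_start};
}
```

Strand handling matters here: on the minus strand the translation start is the highest CDS coordinate, so the sketch reports 980 rather than 920 for geneB.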
A second implementing class would be
Bio::SeqFeature::AnalysisSemanticMapper (name up for debate!). This would
allow us to expand SeqFeature::Computational/FeaturePair/HSP etc. objects
into a set of SeqFeatureI objects, and to collapse them back again.
This class would also provide a means for simplifying objects from the
high-level BioPerl SeqFeature classes down to Generic objects suitable
for output.
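The expand/collapse idea for analysis features could look roughly like this. It is a sketch under assumptions: a FeaturePair-like object is reduced to plain query/hit hashes plus a score, and the tag names (`hit_name`, `hit_start` and so on) and both function names are invented for illustration rather than taken from any BioPerl class.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for a FeaturePair-style object: a
# query feature, a hit feature and a score, all as plain hashes.

# Collapse each pair into one generic feature on the query sequence,
# carrying the hit coordinates and score as tags so a feature-table
# writer could print it directly.
sub collapse_pairs {
    my (@pairs) = @_;
    my @generic;
    for my $p (@pairs) {
        my ($q, $h) = @{$p}{qw(query hit)};
        push @generic, {
            seq_id      => $q->{seq_id},
            primary_tag => 'similarity',
            start       => $q->{start},
            end         => $q->{end},
            strand      => $q->{strand},
            tags        => {
                hit_name   => $h->{seq_id},
                hit_start  => $h->{start},
                hit_end    => $h->{end},
                hit_strand => $h->{strand},
                score      => $p->{score},
            },
        };
    }
    return @generic;
}

# Expand generic features back into pairs. Features without hit tags did
# not come from a pair, so they are skipped.
sub expand_pairs {
    my (@generic) = @_;
    my @pairs;
    for my $f (@generic) {
        my $t = $f->{tags} || {};
        next unless defined $t->{hit_name};
        push @pairs, {
            query => { seq_id => $f->{seq_id}, start => $f->{start},
                       end => $f->{end}, strand => $f->{strand} },
            hit   => { seq_id => $t->{hit_name}, start => $t->{hit_start},
                       end => $t->{hit_end}, strand => $t->{hit_strand} },
            score => $t->{score},
        };
    }
    return @pairs;
}

my @pairs = ({
    query => { seq_id => 'query_seq', start => 1000, end => 1450, strand => 1 },
    hit   => { seq_id => 'hit_seq',   start => 12,   end => 162,  strand => 1 },
    score => 87,
});
my @generic = collapse_pairs(@pairs);
my @back    = expand_pairs(@generic);
print "$generic[0]{tags}{hit_name} at $generic[0]{start}..$generic[0]{end}\n";
```

The round trip is only lossless for the fields the tags carry; anything else on the original pair would be dropped on the way down to the Generic level.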
I would then propose adding methods to Bio::SeqIO, namely
add_SemanticMapper(), each_SemanticMapper() and remove_SemanticMappers(),
to manage a set of semantic mappers that process sequence features once
they have been created.
We could also add a boolean flag to the SeqIO class controlling whether or
not semantic mapping is used, since there is going to be a serious
performance cost. One can always process features after the sequence is
read in, so we gain flexibility without always paying that cost. By
delegating this to a separate factory, we can still reimplement the
sequence parsing later on without affecting this behavior.
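A rough sketch of the proposed SeqIO hooks, written as a standalone class (`MappingSeqIO`, hypothetical) rather than a patch to Bio::SeqIO. The three method names come from the proposal; `use_semantic_mapping()` stands in for the boolean flag and `process_features()` for whatever the parser would call after building a sequence's features, and both names are assumptions. Mapping is off by default, so nothing is paid unless it is switched on.

```perl
use strict;
use warnings;

# Hypothetical sketch of the mapper-management methods proposed for
# Bio::SeqIO; a standalone class, not the real Bio::SeqIO.
package MappingSeqIO;
use Carp qw(croak);
use Scalar::Util qw(blessed);

sub new {
    my ($class) = @_;
    return bless { _mappers => [], _use_mapping => 0 }, $class;
}

# Register a mapper; it must be an object providing map_from_generic_features().
sub add_SemanticMapper {
    my ($self, $mapper) = @_;
    croak 'mapper must implement map_from_generic_features'
        unless blessed($mapper) && $mapper->can('map_from_generic_features');
    push @{ $self->{_mappers} }, $mapper;
    return scalar @{ $self->{_mappers} };
}

sub each_SemanticMapper { return @{ $_[0]{_mappers} } }

# Remove all mappers, returning the ones that were registered.
sub remove_SemanticMappers {
    my ($self) = @_;
    my @old = @{ $self->{_mappers} };
    $self->{_mappers} = [];
    return @old;
}

# Hypothetical boolean flag: off by default so parsing pays no extra cost.
sub use_semantic_mapping {
    my ($self, $value) = @_;
    $self->{_use_mapping} = $value ? 1 : 0 if defined $value;
    return $self->{_use_mapping};
}

# Hypothetical hook called after a sequence's features are parsed: run each
# mapper in registration order, feeding it the previous mapper's output.
sub process_features {
    my ($self, @features) = @_;
    return @features unless $self->{_use_mapping};
    for my $mapper ($self->each_SemanticMapper) {
        @features = $mapper->map_from_generic_features(-features => \@features);
    }
    return @features;
}

# Toy mapper (hypothetical) that keeps only exon features.
package ExonOnlyMapper;
sub new { return bless {}, shift }
sub map_from_generic_features {
    my ($self, %args) = @_;
    return grep { $_->{primary_tag} eq 'exon' } @{ $args{-features} };
}

package main;

my $io = MappingSeqIO->new;
$io->add_SemanticMapper(ExonOnlyMapper->new);
my @parsed = ({ primary_tag => 'exon' }, { primary_tag => 'misc_feature' });

my @raw    = $io->process_features(@parsed);   # flag off: features unchanged
$io->use_semantic_mapping(1);
my @mapped = $io->process_features(@parsed);   # flag on: only the exon remains
print scalar(@raw), " raw, ", scalar(@mapped), " mapped\n";
```

Chaining the mappers, each fed the previous one's output, is only one design choice; an alternative would be to hand every mapper the original features and merge the results.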
Comments, ideas, & volunteers welcomed.
-jason
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu