[Bioperl-l] Proposal: SemanticMapping and call for info on Gene Objects

Jason Stajich jason@cgt.mc.duke.edu
Sat, 11 May 2002 18:10:12 -0400 (EDT)


I'm starting to build the semantic mapper for building
Bio::SeqFeature::Gene objects from a list of Bio::SeqFeatureI objects.
Dave/Hilmar, is there any chance you can walk us through the ideas behind
the Gene objects and the assumptions that have been made?
I am wondering whether we have a rich enough set of objects to truly
represent all the information one might have for a gene.

I think we probably need a CDS object, or a somewhat richer exon object, to
note where translation starts.  I'm not sure which approach is appropriate:
build the objects to mirror the way data is organized in a GenBank/EMBL
file, or build them more generically and accept some acrobatics to go from
an input sequence file -> GeneStructure -> an output sequence file format.

I would encourage anyone with opinions or ideas here to look over the
existing objects and help propose some directions.  I'd also like to adopt
what we can from the Gadfly and Ensembl models; Ewan/Michele/Chris M., any
guidance and lessons learned would be great.

As for the actual semantic mapping part, here is a simple interface
I've started.

Bio::SeqFeature::SemanticMapperI
(or should it be a Bio::Factory::SemanticMapperI ???)
(happy to hear better suggestions for names)

=head2 map_from_generic_features

 Title   : map_from_generic_features
 Usage   : my @features = $mapper->map_from_generic_features(-features => \@generic);
 Function: Builds new Bio::SeqFeatureI object(s) from a set of
           Bio::SeqFeatureI objects according to the implementing
           class's logic.
 Returns : List of Bio::SeqFeatureI objects
 Args    : -features => \@generic  # array ref of generic features

=head2 map_to_generic_features

 Title   : map_to_generic_features
 Usage   : my @features = $mapper->map_to_generic_features(-features => \@specialized);
 Function: Builds generic Bio::SeqFeature::Generic objects from
           specialized Bio::SeqFeature::* objects; useful for writing
           GenBank/EMBL feature tables.
 Returns : List of Bio::SeqFeatureI objects
 Args    : -features => \@specialized  # array ref of features to map
                                        # to generic objects

=cut
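To make the calling convention concrete, here is a minimal sketch of the interface as plain Perl. The package names SemanticMapperI and IdentityMapper are hypothetical stand-ins, not BioPerl classes. BioPerl interfaces conventionally signal unimplemented methods through Bio::Root::RootI (e.g. throw_not_implemented()); this sketch simply dies so that it runs without BioPerl installed.

```perl
use strict;
use warnings;

# Hypothetical abstract interface mirroring the POD above. The package name
# and the die-based "not implemented" behaviour are assumptions.
package SemanticMapperI;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Subclasses must override both methods.
sub map_from_generic_features {
    my ($self, %args) = @_;
    die ref($self) . " must implement map_from_generic_features\n";
}

sub map_to_generic_features {
    my ($self, %args) = @_;
    die ref($self) . " must implement map_to_generic_features\n";
}

# Trivial pass-through implementation (hypothetical), only to show the
# calling convention: -features => \@list in, a flat list out.
package IdentityMapper;
our @ISA = ('SemanticMapperI');

sub map_from_generic_features {
    my ($self, %args) = @_;
    my $features = $args{-features} or die "-features is required\n";
    return @$features;
}

sub map_to_generic_features {
    my ($self, %args) = @_;
    my $features = $args{-features} or die "-features is required\n";
    return @$features;
}

package main;
```

Keeping both directions on one interface means a single mapper object can be handed to both a reader and a writer.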

The first implementing class would be Bio::SeqFeature::GeneSemanticMapper,
which would build Bio::SeqFeature::Gene::GeneStructure objects, or at
least Exon/Intron objects, depending on the depth of the annotated
data.
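One plausible core for such a mapper is grouping exon features by gene name and deriving introns from the gaps between consecutive exons. The sketch below assumes features are plain hashrefs with primary_tag/start/end/strand/gene keys (standing in for Bio::SeqFeatureI objects and the GenBank /gene qualifier) and returns hashrefs shaped loosely like a GeneStructure; build_gene_structures() is a hypothetical helper, not an existing BioPerl method.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical grouping step for a GeneSemanticMapper. Features are plain
# hashrefs (primary_tag, start, end, strand, gene), an assumed layout.
sub build_gene_structures {
    my (@features) = @_;
    my %by_gene;
    for my $f (@features) {
        next unless $f->{primary_tag} eq 'exon';
        next unless defined $f->{gene};    # cannot place an exon without a gene name
        push @{ $by_gene{ $f->{gene} } }, $f;
    }

    my @genes;
    for my $name (sort keys %by_gene) {
        # Order exons along the sequence.
        my @exons = sort { $a->{start} <=> $b->{start} } @{ $by_gene{$name} };

        # Introns are the non-empty gaps between consecutive exons.
        my @introns;
        for my $i (1 .. $#exons) {
            my ($prev, $next) = @exons[ $i - 1, $i ];
            next unless $next->{start} > $prev->{end} + 1;
            push @introns, {
                start  => $prev->{end} + 1,
                end    => $next->{start} - 1,
                strand => $prev->{strand},
            };
        }

        push @genes, {
            name    => $name,
            start   => $exons[0]{start},
            end     => max(map { $_->{end} } @exons),
            strand  => $exons[0]{strand},
            exons   => \@exons,
            introns => \@introns,
        };
    }
    return @genes;
}
```

Grouping on a /gene qualifier is only one heuristic; records that lack it would need overlap- or order-based grouping instead, which is where the "depth of the annotated data" matters.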

A second implementing class would be
Bio::SeqFeature::AnalysisSemanticMapper (name up for debate!). This would
allow us to expand/collapse SeqFeature::Computational/FeaturePair/HSP etc.
objects to/from a set of SeqFeatureI(s).

This class would also provide a means of simplifying objects from the
high-level bioperl SeqFeature classes down to the Generic object level
suitable for output.
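As a rough illustration of collapse/expand, here is a sketch that flattens a query/hit pair (loosely what a FeaturePair or HSP carries) into a single generic feature whose extra data lives in tag/value pairs, then rebuilds the pair from it. collapse_pair(), expand_pair(), the 'similarity' primary tag, and the hit_* tag names are all hypothetical choices made for this example.

```perl
use strict;
use warnings;

# Hypothetical collapse: a query/hit pair becomes one generic feature on the
# query sequence, with the hit described by tags. Names are assumptions.
sub collapse_pair {
    my ($pair) = @_;
    my ($q, $h) = @{$pair}{qw(query hit)};
    return {
        primary_tag => 'similarity',
        start       => $q->{start},
        end         => $q->{end},
        strand      => $q->{strand},
        tags        => {
            hit_name   => $h->{seq_id},
            hit_start  => $h->{start},
            hit_end    => $h->{end},
            hit_strand => $h->{strand},
            score      => $pair->{score},
        },
    };
}

# Hypothetical expand: rebuild the pair from the generic feature's tags.
sub expand_pair {
    my ($generic) = @_;
    my $t = $generic->{tags};
    return {
        score => $t->{score},
        query => {
            start  => $generic->{start},
            end    => $generic->{end},
            strand => $generic->{strand},
        },
        hit => {
            seq_id => $t->{hit_name},
            start  => $t->{hit_start},
            end    => $t->{hit_end},
            strand => $t->{hit_strand},
        },
    };
}
```

The round trip is lossless here only because every hit field gets its own tag; anything left untagged would be dropped when writing a feature table.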

I would then propose adding methods to Bio::SeqIO (add_SemanticMapper(),
each_SemanticMapper(), remove_SemanticMappers()) to manage a set of
semantic mappers that process sequence features once they have been
created.  Perhaps also add a boolean flag to the SeqIO class controlling
whether semantic mapping is used at all, since it will carry a serious
performance cost.  One can always process features after the sequence is
read in, so we gain flexibility without always paying the performance
cost.  By delegating this to a separate factory we can still reimplement
the sequence parsing later on without affecting this behavior.
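Here is a minimal sketch of the bookkeeping those SeqIO methods might do. It assumes a hypothetical MappingSeqIO class (not Bio::SeqIO itself), a use_semantic_mapping() switch that defaults to off, and a hypothetical apply_semantic_mappers() hook that chains mappers in the order they were added. KeepTagsMapper is a toy mapper for illustration only.

```perl
use strict;
use warnings;

# Hypothetical mapper registry modelled on the proposed Bio::SeqIO methods.
# The class name, use_semantic_mapping() and apply_semantic_mappers() are
# assumptions; only the three *SemanticMapper* names come from the proposal.
package MappingSeqIO;

sub new {
    my ($class, %args) = @_;
    return bless {
        _mappers     => [],
        _use_mapping => $args{-use_semantic_mapping} ? 1 : 0,
    }, $class;
}

sub add_SemanticMapper {
    my ($self, @mappers) = @_;
    push @{ $self->{_mappers} }, @mappers;
    return scalar @{ $self->{_mappers} };
}

sub each_SemanticMapper { return @{ $_[0]{_mappers} } }

# Removes all mappers and returns the ones that were registered.
sub remove_SemanticMappers {
    my ($self) = @_;
    my @old = @{ $self->{_mappers} };
    $self->{_mappers} = [];
    return @old;
}

# Getter/setter for the on/off switch; off by default.
sub use_semantic_mapping {
    my ($self, $value) = @_;
    $self->{_use_mapping} = $value ? 1 : 0 if defined $value;
    return $self->{_use_mapping};
}

# Run parsed features through each mapper in turn; each mapper sees the
# previous mapper's output. Returns features untouched when switched off.
sub apply_semantic_mappers {
    my ($self, @features) = @_;
    return @features unless $self->{_use_mapping};
    for my $mapper ($self->each_SemanticMapper) {
        my @mapped = $mapper->map_from_generic_features(-features => [@features]);
        @features = @mapped;
    }
    return @features;
}

# Toy mapper (hypothetical): keeps features whose primary_tag is listed.
package KeepTagsMapper;

sub new {
    my ($class, @tags) = @_;
    my %keep = map { $_ => 1 } @tags;
    return bless { keep => \%keep }, $class;
}

sub map_from_generic_features {
    my ($self, %args) = @_;
    return grep { $self->{keep}{ $_->{primary_tag} } } @{ $args{-features} };
}

package main;
```

Defaulting the switch to off matches the performance concern above: callers who never ask for mapping pay nothing beyond a flag check.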


Comments, ideas, & volunteers welcomed.

-jason
-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu