[Bioperl-l] Proposal: SemanticMapping and call for info on Gene Objects

Lincoln Stein lstein@cshl.org
Tue, 14 May 2002 12:54:59 -0400


I love the idea but am not happy about the "SemanticMapper" name.  In
the Bio::DB::GFF module these things are called Aggregators, and it
would be very nice if we could maintain consistency between the
names.  How about FeatureAggregator?  This reflects that the job of
the class is to aggregate individual Bio::SeqFeatureI objects into
larger objects.  Or maybe FeatureBuilder would be better...
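
To make the "aggregate" idea concrete, here is a minimal sketch in plain Perl.
It is not Bio::DB::GFF code: the features are plain hashes standing in for
Bio::SeqFeatureI objects, and the 'gene' grouping key and aggregate_by_gene()
are names I made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical stand-ins for Bio::SeqFeatureI objects: plain hashes with a
# primary tag, coordinates, and a 'gene' key naming the parent gene.
my @features = (
    { primary => 'exon', start => 100, end => 200, gene => 'abc1' },
    { primary => 'exon', start => 300, end => 450, gene => 'abc1' },
    { primary => 'exon', start => 900, end => 950, gene => 'xyz2' },
);

# Group sub-features by gene name and build one larger feature per group
# that spans all of its parts.
sub aggregate_by_gene {
    my @parts = @_;
    my %by_gene;
    push @{ $by_gene{ $_->{gene} } }, $_ for @parts;

    my @genes;
    for my $name (sort keys %by_gene) {
        my @sorted = sort { $a->{start} <=> $b->{start} } @{ $by_gene{$name} };
        push @genes, {
            primary => 'gene',
            name    => $name,
            start   => $sorted[0]{start},
            end     => max(map { $_->{end} } @sorted),
            parts   => \@sorted,
        };
    }
    return @genes;
}

my @genes = aggregate_by_gene(@features);
for my $g (@genes) {
    printf "%s %d..%d (%d parts)\n",
        $g->{name}, $g->{start}, $g->{end}, scalar @{ $g->{parts} };
}
```

A real aggregator would of course decide on the grouping from tags or group
fields on the features rather than from a hard-coded key.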

Lincoln

Jason Stajich writes:
 > I'm starting to try and build the semantic mapper for building
 > Bio::SeqFeature::Gene objects from a list of Bio::SeqFeatureI objects.
 > Dave/Hilmar, any chance you guys can walk us through the ideas behind the
 > Gene objects and the assumptions that have been made?
 > I am wondering if we have a rich enough set of objects for truly
 > representing all the information one might have for a gene.
 > 
 > I think we probably need a CDS object or a little richer exon object to
 > note where translation starts.  I'm not sure what is appropriate - to
 > build objects towards the way data is organized in a genbank/embl file, or
 > build them a little more generically and have to do some acrobatics to go
 > in seqfile -> GeneStructure -> out seqfile format.
 > 
 > Anyone who has opinions or ideas here, I would encourage you to look over
 > the existing objects and help propose some directions.  I'd perhaps like
 > to adopt what we can from the Gadfly & Ensembl models as well - any
 > guidance and lessons learned would be great, Ewan/Michele/Chris M.
 > 
 > As for the actual semantic mapping part - here is a simple interface
 > I've started.
 > 
 > Bio::SeqFeature::SemanticMapperI
 > (or should it be a Bio::Factory::SemanticMapperI ???)
 > (happy to hear better suggestions for names)
 > 
 > =head2 map_from_generic_features
 > 
 >  Title   : map_from_generic_features
 >  Usage   : my @features = $mapper->map_from_generic_features(-features => \@generic);
 >  Function: Will build new Bio::SeqFeatureI object(s) from a set of
 >            Bio::SeqFeatureI objects, based on the implemented logic.
 >  Returns : List of Bio::SeqFeatureI objects
 >  Args    : -features => \@generic  # Feature list
 > 
 > =head2 map_to_generic_features
 > 
 >  Title   : map_to_generic_features
 >  Usage   : my @features = $mapper->map_to_generic_features(-features => \@specialized);
 >  Function: Will build generic Bio::SeqFeature::Generic objects from
 >            specialized Bio::SeqFeature:: objects useful for outputting
 >            GenBank/EMBL Feature Tables.
 >  Returns : List of Bio::SeqFeatureI
 >  Args    : -features => \@specialized  # array ref of features to map
 >                                        # to generic objects
 > 
 > =cut
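
For what it's worth, here is a rough sketch of what a class implementing these
two methods could look like.  Everything in it is an assumption made for
illustration: My::ExonMapper is a made-up name, and the features are plain
hashes rather than real Bio::SeqFeatureI objects.  Only the two method names
and the -features argument come from the POD above.

```perl
use strict;
use warnings;
use List::Util qw(max);

package My::ExonMapper;    # hypothetical implementing class

sub new { return bless {}, shift }

# Build one specialized 'transcript' feature from the generic exon features.
sub map_from_generic_features {
    my ($self, %args) = @_;
    my @exons = sort { $a->{start} <=> $b->{start} }
                grep { $_->{primary} eq 'exon' } @{ $args{-features} || [] };
    return unless @exons;
    my $transcript = {
        primary => 'transcript',
        start   => $exons[0]{start},
        end     => List::Util::max(map { $_->{end} } @exons),
        exons   => \@exons,
    };
    return ($transcript);
}

# Flatten specialized features back into their generic parts; a feature
# with no parts is passed through unchanged.
sub map_to_generic_features {
    my ($self, %args) = @_;
    my @generic;
    for my $feature (@{ $args{-features} || [] }) {
        if ($feature->{exons}) { push @generic, @{ $feature->{exons} } }
        else                   { push @generic, $feature }
    }
    return @generic;
}

package main;

my @generic = (
    { primary => 'exon', start => 50, end => 90 },
    { primary => 'exon', start => 10, end => 30 },
);
my $mapper      = My::ExonMapper->new;
my @specialized = $mapper->map_from_generic_features(-features => \@generic);
my @flat        = $mapper->map_to_generic_features(-features => \@specialized);
```

The point of the round trip is that @flat holds the same exon hashes that
went in, which is what you would want before writing a feature table back out.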
 > 
 > The first implementing class would be Bio::SeqFeature::GeneSemanticMapper,
 > which would work to build Bio::SeqFeature::Gene::GeneStructure objects or
 > at least Exon/Intron objects depending on the depth of the annotated
 > data.
 > 
 > A second implementing class would be
 > Bio::SeqFeature::AnalysisSemanticMapper. (name up for debate!) This would
 > allow us to expand/collapse SeqFeature::Computational/FeaturePair/HSP etc
 > objects to/from a set of SeqFeatureI(s).
 > 
 > This class would also provide a means for simplifying objects from
 > high-level bioperl SeqFeature classes down to the Generic object level
 > suitable for outputting.
 > 
 > I would then propose adding methods to Bio::SeqIO - add_SemanticMapper(),
 > each_SemanticMapper(), remove_SemanticMappers() - to deal with having a set
 > of semantic mappers to process sequence features once they have been created.
 > Perhaps add a boolean state to the SeqIO class as to whether or not to use
 > SemanticMapping as there is going to be a serious performance cost.  One
 > can always process features after the sequence is read in so we gain
 > flexibility without always paying the performance cost.  By delegating
 > this to a separate factory we can still reimplement the sequence parsing
 > later on without affecting this behavior.
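
A sketch of how the add/each/remove methods and the on/off switch might fit
together.  This is not Bio::SeqIO: My::Reader, My::UppercaseMapper,
process_features() and the -use_mapping flag are all invented for the
example, and the "features" are just strings to keep it short.

```perl
use strict;
use warnings;

package My::Reader;    # hypothetical stand-in, not Bio::SeqIO

sub new {
    my ($class, %args) = @_;
    my $self = { mappers => [], use_mapping => $args{-use_mapping} ? 1 : 0 };
    return bless $self, $class;
}

sub add_SemanticMapper {
    my ($self, @mappers) = @_;
    push @{ $self->{mappers} }, @mappers;
    return;
}

sub each_SemanticMapper { my ($self) = @_; return @{ $self->{mappers} } }

sub remove_SemanticMappers {
    my ($self) = @_;
    my @old = @{ $self->{mappers} };
    $self->{mappers} = [];
    return @old;
}

# Run the features through each mapper in turn, but only when mapping is
# switched on; otherwise hand them back untouched at no extra cost.
sub process_features {
    my ($self, @features) = @_;
    return @features unless $self->{use_mapping};
    for my $mapper ($self->each_SemanticMapper) {
        @features = $mapper->map_from_generic_features(-features => \@features);
    }
    return @features;
}

package My::UppercaseMapper;    # trivial hypothetical mapper for the demo

sub new { return bless {}, shift }

sub map_from_generic_features {
    my ($self, %args) = @_;
    return map { uc $_ } @{ $args{-features} };
}

package main;

my $fast = My::Reader->new;                        # mapping off (default)
my $rich = My::Reader->new(-use_mapping => 1);     # mapping on
$_->add_SemanticMapper(My::UppercaseMapper->new) for ($fast, $rich);

my @raw    = ('exon', 'cds');
my @plain  = $fast->process_features(@raw);
my @mapped = $rich->process_features(@raw);
```

Since the mappers are separate objects, nothing stops a caller from leaving
the switch off and calling a mapper directly on the features afterwards.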
 > 
 > 
 > Comments, ideas, & volunteers welcomed.
 > 
 > -jason
 > -- 
 > Jason Stajich
 > Duke University
 > jason at cgt.mc.duke.edu

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
Positions available at my lab: see http://stein.cshl.org/#hire
========================================================================