[Bioperl-l] Proposal: SemanticMapping and call for info on GeneObjects
Hilmar Lapp
hlapp@gnf.org
Mon, 13 May 2002 13:56:14 -0700
> -----Original Message-----
> From: Chris Mungall [mailto:cjm@fruitfly.org]
> Sent: Monday, May 13, 2002 8:20 AM
> To: Jason Stajich
> Cc: dblock@gnf.org; Hilmar Lapp; Bioperl
> Subject: Re: [Bioperl-l] Proposal: SemanticMapping and call for info on GeneObjects
>
>
>
> Sounds sensible; you know my opinions on biospecific classes, but if
> folks want them this seems a good way to do it.
>
> I would venture that map_to_generic_features() isn't really necessary,
> as I strongly feel that the Gene/Transcript/Exon/etc. classes should be
> lightweight wrappers on top of the generic seqfeatures, with
> class-specific attribute accessors mapped onto the seqfeature
> tag/value system.
>
> eg in gadfly, calling $gene->transcript_list([@trs]) actually maps to
>
> $seqfeature->set_subfeatures_by_type("transcript", [@trs])
>
> This keeps everything working for applications that just want to use
> the objects at the generic seqfeature level.
Right. The downside of this approach is that it somewhat defeats the purpose of exposed vs. private (hence protected) methods and logic. That is, anyone can completely mess up your intricate feature encoding using perfectly 'legal' methods.
I'd rather vote for specific contracts in the base classes or interfaces, which derived classes are required to override where they need to in order to keep their own state consistent. More coding work, but safer in the end. (On the other hand, this may end up duplicating large amounts of code, say, for properly producing GFF format. But then, I would think you can minimize that through proper modularization, theoretically at least :) In fact, I've taken the fast path myself for some of the SeqFeature classes: some of their methods just map to SeqFeature::Generic methods :)
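To make the trade-off concrete, here is a minimal sketch of the lightweight-wrapper approach. My::SeqFeature and My::Gene are hypothetical stand-ins, not Bioperl's or gadfly's actual classes, and only the method names transcript_list() and set_subfeatures_by_type() are taken from the quoted gadfly example. The gene-level accessor just delegates to the generic sub-feature store; the last lines show generic code overwriting the transcripts without ever going through the gene-level contract.

```perl
use strict;
use warnings;

# Hypothetical generic feature class (not Bioperl's API): stores
# sub-features in a hash keyed by feature type.
package My::SeqFeature;

sub new {
    my ($class, %args) = @_;
    return bless { type => $args{-type} || 'seqfeature', subs => {} }, $class;
}

sub set_subfeatures_by_type {
    my ($self, $type, $feats) = @_;
    $self->{subs}{$type} = [@$feats];
    return;
}

# In list context returns the sub-features; in scalar context their count.
sub get_subfeatures_by_type {
    my ($self, $type) = @_;
    return @{ $self->{subs}{$type} || [] };
}

# Hypothetical gene class: a thin wrapper whose biological accessor
# only delegates to the generic sub-feature store.
package My::Gene;
our @ISA = ('My::SeqFeature');

sub transcript_list {
    my ($self, $trs) = @_;
    $self->set_subfeatures_by_type('transcript', $trs) if defined $trs;
    return $self->get_subfeatures_by_type('transcript');
}

package main;

my $gene = My::Gene->new(-type => 'gene');
my @trs  = map { My::SeqFeature->new(-type => 'transcript') } 1 .. 2;
$gene->transcript_list([@trs]);
print scalar($gene->transcript_list), "\n";    # prints 2

# Any caller can use the perfectly legal generic method and bypass
# whatever invariants My::Gene might want to enforce.
$gene->set_subfeatures_by_type('transcript', []);
print scalar($gene->transcript_list), "\n";    # prints 0
```

The contract-based alternative would instead have My::Gene override set_subfeatures_by_type() (or refuse 'transcript' there) so that its own state stays consistent, at the cost of more code per class.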
>
> Not sure about recording translation starts to the exons: what about
> doubly encoded genes in retroviral genomes? Also, my faves:
> dicistronic genes.
>
> I'm happy to provide a tricky test-set of genbank files to test this.
>
> This part is a bit less fleshed out... but it would be really nice if
> the biology encoded in the object model were both as flexible as
> possible and open to introspection.
>
> E.g., let's take a small part of SO and turn it into a lispy Perl
> data structure:
>
> [schema=>
>   [gene=>[[isa=>"seqfeature"],
>           [coding=>1],
>           [class=>"Bio::GeneI"]]],
>   ['noncoding-gene'=>[[isa=>"gene"],
>                       [coding=>0],
>                       [class=>"Bio::NcGeneI"]]],
>   [transcript=>[[isa=>"seqfeature"],
>                 [partof=>"gene"],
>                 [class=>"Bio::TranscriptI"]]],
> ]
>
> Ewan will hate this... but it would be nice to have as much of the
> implementation as possible specified dynamically by a "language" such
> as the above, or at least to have it as an implementation option. If
> not, let's at least try to keep the OM to SO mapping clean.
>
> Here's SO:
> ftp://ftp.geneontology.org/pub/go/gobo/sequence.ontology/
>
I have to admit my take on this has moved over time towards Ewan's: at the end of the day you need to define, in an agreed-upon way, what a Gene object is about. This can be a generally agreed-upon ontology that everyone then codes against; but if that's the standard, why not just put the standard into an interface whose methods are named accordingly (i.e., biologically, as the ontology terms are)? I just don't see why a change to the ontology would break applications built on it any less than a change to the corresponding interface would. I may be wrong, though.
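For comparison, introspecting a schema like the one above takes only a few lines. This is a minimal sketch, assuming the schema is kept as nested array refs exactly as quoted (with 'noncoding-gene' quoted so it compiles under strict) and that isa is single inheritance; index_schema() and is_a() are hypothetical helpers, not part of Bioperl.

```perl
use strict;
use warnings;

# The schema as quoted, as nested array refs.
my $schema =
  [schema =>
    [gene             => [[isa => "seqfeature"], [coding => 1], [class => "Bio::GeneI"]]],
    ['noncoding-gene' => [[isa => "gene"],       [coding => 0], [class => "Bio::NcGeneI"]]],
    [transcript       => [[isa => "seqfeature"], [partof => "gene"], [class => "Bio::TranscriptI"]]],
  ];

# Flatten into { term => { property => value } } for direct lookup.
sub index_schema {
    my ($s) = @_;
    my ($tag, @terms) = @$s;
    die "not a schema\n" unless $tag eq 'schema';
    my %idx;
    for my $term (@terms) {
        my ($name, $props) = @$term;
        $idx{$name} = { map { ($_->[0], $_->[1]) } @$props };
    }
    return \%idx;
}

# Walk the isa chain: does $term ultimately derive from $ancestor?
sub is_a {
    my ($idx, $term, $ancestor) = @_;
    while (defined $term) {
        return 1 if $term eq $ancestor;
        $term = $idx->{$term} ? $idx->{$term}{isa} : undef;
    }
    return 0;
}

my $idx           = index_schema($schema);
my $nc_is_feature = is_a($idx, 'noncoding-gene', 'seqfeature');
my $tr_is_gene    = is_a($idx, 'transcript', 'gene');
print "$idx->{transcript}{class}\n";                        # prints Bio::TranscriptI
print "noncoding-gene isa seqfeature: $nc_is_feature\n";   # prints 1
print "transcript isa gene: $tr_is_gene\n";                # prints 0 (partof, not isa)
```

The same answers could come from an interface hierarchy via UNIVERSAL::isa; the difference is only whether the relationships live in data or in code.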
>
> Would the semantic mapper do stuff like create intron objects from
> exons, etc.?
>
> It seems the mapping must be in two parts. The first would manipulate
> the seqfeature/subseqfeature hierarchy, e.g., to fix GenBank
> split-location mRNA features into 3-level
> gene/transcript/exon/translation/cds objects. The second part would go
> through and "bless" the objects appropriately. It would be nice to
> separate those.
>
That's a Perl implementation, right? :)
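Taking it literally, the two passes might look like this. It's a minimal sketch, assuming plus-strand coordinates and a plain hash as input (a gene name plus the segments of a join(...) location); build_hierarchy(), bless_tree() and the My::* package names are hypothetical, and CDS/translation handling is left out. Pass 1 only reshapes the tree (and derives introns from the exon gaps); pass 2 only assigns classes.

```perl
use strict;
use warnings;

# Pass 1 (structure): turn a flat mRNA record with a split location into a
# gene/transcript/exon tree of plain hashes. Introns are derived from the
# gaps between consecutive exons. Assumes plus-strand coordinates.
sub build_hierarchy {
    my ($mrna) = @_;
    my @segs  = sort { $a->[0] <=> $b->[0] } @{ $mrna->{segments} };
    my @exons = map { +{ type => 'exon', start => $_->[0], end => $_->[1], subs => [] } } @segs;
    my @introns = map {
        +{ type => 'intron', start => $segs[$_ - 1][1] + 1, end => $segs[$_][0] - 1, subs => [] }
    } 1 .. $#segs;
    my $transcript = { type => 'transcript', subs => [@exons, @introns] };
    return { type => 'gene', name => $mrna->{gene}, subs => [$transcript] };
}

# Pass 2 (semantics): bless each node into a class chosen by its type.
# The package names are hypothetical placeholders.
my %CLASS_FOR = (
    gene       => 'My::Gene',
    transcript => 'My::Transcript',
    exon       => 'My::Exon',
    intron     => 'My::Intron',
);

sub bless_tree {
    my ($node) = @_;
    bless $node, $CLASS_FOR{ $node->{type} } || 'My::SeqFeature';
    bless_tree($_) for @{ $node->{subs} };
    return $node;
}

my $gene = bless_tree(build_hierarchy({
    gene     => 'abc',                                  # hypothetical gene name
    segments => [[500, 600], [100, 200], [300, 400]],
}));
my ($transcript) = @{ $gene->{subs} };
my @introns = grep { ref($_) eq 'My::Intron' } @{ $transcript->{subs} };
printf "%s %s %d\n", ref($gene), ref($transcript), scalar(@introns);
# prints: My::Gene My::Transcript 2
print "$introns[0]{start}-$introns[0]{end}\n";          # prints: 201-299
```

Keeping the passes separate means the structural fix-ups can be reused by code that never wants the biological classes, and the type-to-class table can be swapped without touching pass 1.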
BTW, has anyone checked where BioJava stands in this regard? And how does Apollo actually deal with this, i.e., does Apollo have a class hierarchy representing genes and their constituents? I think Dave is checking into that. Dave?
What about building upon the DB::GFF aggregators that Lincoln wrote?
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------