[Bioperl-l] Gene Interface discussion

Ewan Birney birney@ebi.ac.uk
Mon, 5 Feb 2001 11:03:07 +0000 (GMT)


On Sun, 4 Feb 2001, Hilmar Lapp wrote:

> Apart from BioCorba 0.2, this is the last big issue on the task
> pile for 0.7. To refresh your background, Ewan and I discussed
> this in a phone call in December last year, and after that Ewan
> summarized the results, and added some things he dreamt up :-)
> 
> You can find Ewan's full proposal at
> http://bioperl.org/pipermail/bioperl-l/2000-December/001893.html.
> 
> I cross-posted this again to ensembl-dev basically to make the
> folks there aware that the issue is being taken on again.
> Responses from bioperl folks should probably NOT be cross-posted
> (but decide yourself) ...
> 
> Ewan Birney wrote:
> > 
> >   All interfaces in the Bio::SeqFeature:: namespace
> > 
> 
> There are 3 of them -- together with implementations
> Bio::SeqFeature may become a bit crowded. What do you think about
> Bio::SeqFeature::Gene, or directly Bio::Gene?

I am happy with any of the above. Marginal preference for
Bio::SeqFeature::Gene, as we are inheriting from Bio::SeqFeature.

> 
> I don't have a strong opinion here, though.
> 
> >   GeneStructureI - inherits from SeqFeatureI
> > 
> >   (inherited methods: start,end,strand,seq,entire_seq,seqname,primary_tag,source_tag
> >    is_single_sequence, sub_SeqFeatures);
> > 
> >   Notes: sub_SeqFeatures must delegate to ->transcripts.
> >        : primary_tag must be 'genestructure'
> > 
> 
> And what about promotors() and poly_adenylation_sites()? These are
> subfeatures, too. So, sub_SeqFeatures() should rather merge them
> all together, shouldn't it?
> 

I would be ok with this.

> > # GeneStructureI must implement this, even if it returns an empty list
> >
> > @promotors = $gs->promotors(); # could be empty
> > @polya     = $gs->polya(); # could be empty
> 
> So how do these differ from the respective methods of
> TranscriptI? Do they delegate to the first element of the array
> returned by $gs->transcripts()?
> 

These would return the union, over all transcripts, of the promotors
(and likewise of the poly-A sites).
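
A minimal sketch of that union, assuming made-up My::GeneStructure and
My::Transcript classes (illustrative names only, not part of the proposal)
and plain strings standing in for promotor features:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a TranscriptI implementation.
package My::Transcript;

sub new {
    my ($class, %args) = @_;
    return bless { promotors => $args{promotors} || [] }, $class;
}

sub promotors { return @{ $_[0]->{promotors} } }

# Hypothetical stand-in for a GeneStructureI implementation.
package My::GeneStructure;

sub new {
    my ($class, @transcripts) = @_;
    return bless { transcripts => [@transcripts] }, $class;
}

sub transcripts { return @{ $_[0]->{transcripts} } }

# Union of the promotors of all transcripts; each one is reported once,
# in order of first appearance. Returns an empty list if there are none.
sub promotors {
    my ($self) = @_;
    my (%seen, @union);
    for my $transcript ($self->transcripts) {
        for my $promotor ($transcript->promotors) {
            push @union, $promotor unless $seen{$promotor}++;
        }
    }
    return @union;
}

package main;

my $gs = My::GeneStructure->new(
    My::Transcript->new(promotors => ['P1', 'P2']),
    My::Transcript->new(promotors => ['P2', 'P3']),
);
print join(',', $gs->promotors), "\n";    # prints "P1,P2,P3"
```

With real feature objects the %seen key would be the stringified reference,
so a promotor shared by two transcripts is still reported only once.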


> >   TranscriptI - inherits from SeqFeatureI
> > 
> >   (inherited methods: start,end,strand,seq,entire_seq,seqname,primary_tag,source_tag
> >    is_single_sequence, sub_SeqFeatures);
> > 
> >   Transcript must have the following two methods
> > 
> >   $transcript->cdna();    # returns a Bio::PrimarySeqI of the cDNA
> >   $transcript->protein(); # returns a Bio::PrimarySeqI of the protein
> > 
> 
> protein() I think is trivial unless it's a predicted transcript
> (and TranscriptI is not specific to predicted transcripts). What
> is the particular reason to require it in the interface?
> 

Very handy, and not so trivial. Different implementations might want to
handle the making of the protein differently (think weird selenocysteine
cases or heavy RNA editing).


> >    ExonI - inherits from SeqFeatureI, cannot be composite,
> >            primary_tag must return one of 'exon' or 'cds' or 'utr'
> > 
> 
> How do you mean 'cannot be composite'? The interface cannot forbid
> it. Should the implementation refuse subfeatures not lying on the
> same sequence and refuse a SplitLocation spread across more than
> one sequence (or SplitLocations in general)?
> 

I mean ExonI should be defined to have a Simple Location, no splits.
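
As a sketch of how an implementation could enforce this, assuming a
hypothetical My::Exon class in which a split location would show up as array
references for start and end (that encoding is invented for the example):

```perl
use strict;
use warnings;

package My::Exon;

# The three primary tags allowed for exons in the proposal.
my %allowed_tag = map { $_ => 1 } qw(exon cds utr);

sub new {
    my ($class, %args) = @_;
    my $tag = defined $args{primary_tag} ? $args{primary_tag} : 'exon';
    die "primary_tag must be one of exon, cds, utr\n"
        unless $allowed_tag{$tag};
    # A simple location is exactly one start/end pair, so refuse lists.
    die "an exon must have a simple location, no splits\n"
        if ref $args{start} || ref $args{end};
    die "start and end are required, with start <= end\n"
        unless defined $args{start}
            && defined $args{end}
            && $args{start} <= $args{end};
    return bless {
        start       => $args{start},
        end         => $args{end},
        primary_tag => $tag,
    }, $class;
}

sub start       { return $_[0]->{start} }
sub end         { return $_[0]->{end} }
sub primary_tag { return $_[0]->{primary_tag} }

package main;

my $exon = My::Exon->new(start => 100, end => 250, primary_tag => 'cds');
print join(' ', $exon->primary_tag, $exon->start, $exon->end), "\n";

# A split location is rejected at construction time.
my $split = eval { My::Exon->new(start => [100, 300], end => [200, 400]) };
print "rejected: $@" unless $split;
```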


> > 
> > To Do list:
> > 
> >    (a) discuss this proposal. Sane? Any more issues to be worked out?
> > 
> >    I am not 100% on the exons('argument') style call.
> > 
> 
> I think that's fine. Otherwise you end up with methods for each
> type of exon (initial, terminal, internal, ...).
> 

I am ok with this.
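
For the record, a minimal sketch of the exons('argument') style, assuming a
hypothetical My::Transcript class and exons represented as plain hashes with
a 'type' key (both invented for this example):

```perl
use strict;
use warnings;

package My::Transcript;

sub new {
    my ($class, @exons) = @_;
    return bless { exons => [@exons] }, $class;
}

# exons()          returns all exons
# exons('initial') returns only the exons of the given type
sub exons {
    my ($self, $type) = @_;
    my @exons = @{ $self->{exons} };
    return @exons unless defined $type;
    $type = lc $type;
    return grep { $_->{type} eq $type } @exons;
}

package main;

my $transcript = My::Transcript->new(
    { type => 'initial',  start => 100, end => 200 },
    { type => 'internal', start => 300, end => 400 },
    { type => 'internal', start => 500, end => 550 },
    { type => 'terminal', start => 700, end => 900 },
);

my @all      = $transcript->exons;
my @internal = $transcript->exons('internal');
print scalar(@all), "\n";         # prints "4"
print scalar(@internal), "\n";    # prints "2"
```

One method with an optional argument keeps the interface small, at the cost
of the type names being strings that each implementation has to agree on.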

> >    The exon primary_tag is actually a hard thing to provide. Should
> >    the primary_tag change depending on the argument? That would be
> >    very nasty for the implementation objects.
> > 
> >    (b) figure out how to get these things in and out of
> >        EMBL/GenBank format without loss of information
> > 
> 
> In general this would be a *very* good thing to have. But it also
> means venturing into the semantics of GenBank features. If this
> is to make it into 0.7, we'll have to extend the deadline.
> 

I'd punt on this for 0.7 and flag it as a 0.8 possibility.

> How do people see the chances of success, and in which time frame?
> 
> Any takers?
> 
> >    (c) Ditto with GAME
> > 
> 
> Brad? 
> 
> > Implementations:
> > 
> >     Hilmar/Ewan to do bioperl implementations
> > 
> >     Hilmar to do bioperl parsing modules
> > 
> >     Ewan/Hilmar to do the interfaces files
> 
> Interface & implementation is okay, and I'll take care of the gene
> prediction parsers. The GenBank/EMBL gene feature needs a
> braveheart who either has enough time or already enough code, or -
> probably the best - both.
> 
> 	Hilmar
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122                  phone: +1 858 812 1757
> -----------------------------------------------------------------
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------