[Bioperl-l] Gene Interface discussion
Ewan Birney
birney@ebi.ac.uk
Mon, 5 Feb 2001 11:03:07 +0000 (GMT)
On Sun, 4 Feb 2001, Hilmar Lapp wrote:
> Apart from BioCorba 0.2, this is the last big issue on the task
> pile for 0.7. To refresh your background, Ewan and I discussed
> this in a phone call in December last year, and after that Ewan
> summarized the results, and added some things he dreamt up :-)
>
> You can find Ewan's full proposal at
> http://bioperl.org/pipermail/bioperl-l/2000-December/001893.html.
>
> I cross-posted this again to ensembl-dev basically to make the
> folks there aware that the issue is being taken on again.
> Responses from bioperl folks should probably NOT be cross-posted
> (but decide yourself) ...
>
> Ewan Birney wrote:
> >
> > All interfaces in the Bio::SeqFeature:: namespace
> >
>
> There are 3 of them -- together with implementations
> Bio::SeqFeature may become a bit crowded. What do you think about
> Bio::SeqFeature::Gene, or directly Bio::Gene?
I am happy with any of the above. Marginal preference for
Bio::SeqFeature::Gene, as we are inheriting from Bio::SeqFeature.
>
> I don't have a strong opinion here, though.
>
> > GeneStructureI - inherits from SeqFeatureI
> >
> > (inherited methods: start, end, strand, seq, entire_seq, seqname, primary_tag,
> > source_tag, is_single_sequence, sub_SeqFeatures);
> >
> > Notes: sub_SeqFeatures must delegate to ->transcripts.
> > : primary_tag must be 'genestructure'
> >
>
> And what about promotors() and poly_adenylation_sites()? These are
> subfeatures, too. So, sub_SeqFeature() should rather merge them
> all together, shouldn't it?
>
I would be ok with this.
> > # GeneStructureI must implement this, even if it returns an empty list
> >
> > @promotors = $gs->promotors(); # could be empty
> > @polya = $gs->polya(); # could be empty
>
> So what is the difference to the respective methods of
> TranscriptI? Delegates to the first element on the array returned
> by $gs->transcripts()?
>
These would give back the union, over all transcripts, of the promoters etc.
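
A minimal sketch of that union, together with the merged sub_SeqFeatures()
discussed further up, in plain Perl with no BioPerl dependency. The package
names My::Transcript and My::GeneStructure are hypothetical, and features are
represented as plain strings purely for illustration; none of this is settled
interface. Method names follow the spelling used in the proposal.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a TranscriptI implementation.
package My::Transcript;

sub new {
    my ($class, %args) = @_;
    return bless {
        promotors => $args{promotors} || [],
        polya     => $args{polya}     || [],
    }, $class;
}
sub promotors { return @{ $_[0]{promotors} } }
sub polya     { return @{ $_[0]{polya} } }

# Hypothetical stand-in for a GeneStructureI implementation.
package My::GeneStructure;

sub new {
    my ($class, @transcripts) = @_;
    return bless { transcripts => [@transcripts] }, $class;
}
sub transcripts { return @{ $_[0]{transcripts} } }

# Union over all transcripts: duplicates dropped, first-seen order kept.
sub _union {
    my ($self, $method) = @_;
    my %seen;
    return grep { !$seen{$_}++ } map { $_->$method() } $self->transcripts;
}
sub promotors { return $_[0]->_union('promotors') }   # could be empty
sub polya     { return $_[0]->_union('polya') }       # could be empty

# Merged view: transcripts plus promoters plus poly-A sites.
sub sub_SeqFeatures {
    my ($self) = @_;
    return ($self->transcripts, $self->promotors, $self->polya);
}

package main;

my $t1 = My::Transcript->new(promotors => ['p1'], polya => ['a1']);
my $t2 = My::Transcript->new(promotors => ['p1', 'p2']);
my $gs = My::GeneStructure->new($t1, $t2);

my @promotors = $gs->promotors();        # shared 'p1' appears only once
my @polya     = $gs->polya();
my @features  = $gs->sub_SeqFeatures();
```

With real feature objects the duplicate check would have to compare by
location rather than by string, which is left open here.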
> > TranscriptI - inherits from SeqFeatureI
> >
> > (inherited methods: start, end, strand, seq, entire_seq, seqname, primary_tag,
> > source_tag, is_single_sequence, sub_SeqFeatures);
> >
> > Transcript must have the following two methods
> >
> > $transcript->cdna(); # returns a Bio::PrimarySeqI of the cDNA
> > $transcript->protein(); # returns a Bio::PrimarySeqI of the protein
> >
>
> protein() I think is trivial unless it's a predicted transcript
> (and TranscriptI is not specific to predicted transcripts). What
> is the particular reason to require it in the interface?
>
Very handy, and not so trivial. Different implementations might want to
handle the making of protein differently (think weird selenocysteine cases
or heavy RNA editing).
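
To illustrate why protein() belongs in the interface, here is a rough sketch
in which a subclass changes how one codon is read. Everything in it is an
assumption for illustration: the package names, the translate_codon() hook,
and returning a plain string instead of a Bio::PrimarySeqI. The rule "an
in-frame TGA that is not the last codon reads as U" is a deliberate
oversimplification of real selenocysteine decoding.

```perl
use strict;
use warnings;

# Hypothetical default transcript: translates with the standard genetic code.
package My::Transcript;

# Standard genetic code, codons enumerated in T, C, A, G order.
my %CODON;
{
    my @bases = qw(T C A G);
    my @aa = split //,
        'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    for my $x (@bases) {
        for my $y (@bases) {
            for my $z (@bases) {
                $CODON{"$x$y$z"} = shift @aa;
            }
        }
    }
}

sub new {
    my ($class, $cdna) = @_;
    return bless { cdna => uc $cdna }, $class;
}
sub cdna { return $_[0]{cdna} }

# Hook that subclasses may override; $pos is the 0-based codon offset.
sub translate_codon {
    my ($self, $codon, $pos) = @_;
    return $CODON{$codon} // 'X';
}

# Translate codon by codon, stopping at the first stop codon.
sub protein {
    my ($self) = @_;
    my $cdna    = $self->cdna;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($cdna); $i += 3) {
        my $aa = $self->translate_codon(substr($cdna, $i, 3), $i);
        last if $aa eq '*';
        $protein .= $aa;
    }
    return $protein;
}

# Hypothetical variant: an internal TGA is read as selenocysteine (U).
package My::SelenoTranscript;
our @ISA = ('My::Transcript');

sub translate_codon {
    my ($self, $codon, $pos) = @_;
    return 'U' if $codon eq 'TGA' && $pos + 3 < length($self->cdna);
    return $self->SUPER::translate_codon($codon, $pos);
}

package main;

my $cdna   = 'ATGTGAGGCTAA';    # ATG TGA GGC TAA
my $plain  = My::Transcript->new($cdna)->protein;
my $seleno = My::SelenoTranscript->new($cdna)->protein;
```

The caller only ever asks for protein(); how it gets made stays the business
of the implementation.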
> > ExonI - inherits from SeqFeatureI, cannot be composite,
> > primary_tag must return one of 'exon' or 'cds' or 'utr'
> >
>
> How do you mean 'cannot be composite'? The interface cannot forbid
> it. Should the implementation refuse subfeatures not lying on the
> same sequence and refuse a SplitLocation spread across more than
> one sequence (or SplitLocations in general)?
>
I mean ExonI should be defined to have a Simple Location, no splits.
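
As a rough sketch of what an implementation could enforce, assuming a
hypothetical My::Exon class whose constructor takes a list of [start, end]
segments (this is not an existing BioPerl API): it accepts exactly one
segment and only the three primary tags named in the proposal.

```perl
use strict;
use warnings;

# Hypothetical exon implementation that refuses split locations.
package My::Exon;
use Carp qw(croak);

my %ALLOWED_TAG = map { $_ => 1 } qw(exon cds utr);

sub new {
    my ($class, %args) = @_;

    my $tag = $args{primary_tag} // 'exon';
    croak "primary_tag must be one of exon, cds, utr"
        unless $ALLOWED_TAG{$tag};

    # A simple location is exactly one contiguous [start, end] segment.
    my @segments = @{ $args{segments} || [] };
    croak "an exon needs exactly one segment, got " . scalar(@segments)
        unless @segments == 1;

    my ($start, $end) = @{ $segments[0] };
    croak "start must not exceed end" if $start > $end;

    return bless {
        primary_tag => $tag,
        start       => $start,
        end         => $end,
    }, $class;
}
sub primary_tag { return $_[0]{primary_tag} }
sub start       { return $_[0]{start} }
sub end         { return $_[0]{end} }

package main;

# A single segment is accepted.
my $exon = My::Exon->new(primary_tag => 'cds', segments => [[100, 250]]);

# Two segments would be a split location, so construction fails.
my $split_ok = eval {
    My::Exon->new(segments => [[100, 150], [200, 250]]);
    1;
};
```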
> >
> > To Do list:
> >
> > (a) discuss this proposal. Sane? Any more issues to be worked out?
> >
> > I am not 100% on the exons('argument') style call.
> >
>
> I think that's fine. Otherwise you end up with methods for each
> type of exon (initial, terminal, internal, ...).
>
I am ok with this.
> > The exon primary_tag is actually a hard thing to provide. Should
> > the primary_tag change depending on the argument? That would be
> > very nasty for the implementation objects.
> >
> > (b) figure out how to get these things in and out of
> > EMBL/GenBank format without loss of information
> >
>
> In general this would be a *very* good thing to have. But it also
> means venturing into the semantics of GenBank features. If this
> is to make it into 0.7, we'll have to extend the deadline.
>
I'd punt on this for 0.7 and flag it as a 0.8 possibility.
> How do people see the chances of success, and in which time frame?
>
> Any takers?
>
> > (c) Ditto with GAME
> >
>
> Brad?
>
> > Implementations:
> >
> > Hilmar/Ewan to do bioperl implementations
> >
> > Hilmar to do bioperl parsing modules
> >
> > Ewan/Hilmar to do the interfaces files
>
> Interface & implementation is okay, and I'll take care of the gene
> prediction parsers. The GenBank/EMBL gene feature needs a
> braveheart who either has enough time or already enough code, or -
> probably the best - both.
>
> Hilmar
> --
> -----------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
> -----------------------------------------------------------------
>
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------