[Bioperl-l] proposed additions to SeqFeatureI, RangeI and FeatureHolderI
Chris Mungall
cjm at fruitfly.org
Wed Dec 3 14:46:29 EST 2003
On Thu, 20 Nov 2003, Lincoln Stein wrote:
> On Wednesday 19 November 2003 09:47 pm, Chris Mungall wrote:
> > I have some proposed changes I would like to commit to bioperl, mostly
> > for using GFF3.
> >
> > In both SeqFeatureI and SeqFeature::Generic I would like to add some
> > accessor methods. They would all map to tag-values.
> >
> > ID - synonym for (get_tag_values('ID'))[0]
> > ParentIDs - synonym for get_tag_values('Parent')
>
> I like this.
>
> > add_ParentID
> > remove_ParentID
> > remove_ParentIDs
> >
> > Question - should the method be Parent or ParentID? In GFF3, the tag
> > is "Parent". But an accessor method called "Parents()" feels like it
> > should return objects, so I think ParentIDs() is better.
>
> Do the methods return IDs or objects? If they're returning IDs, then the
> ParentID() name sounds right.
Ok, let's go for ParentID
> > Also, I realise it's contrary to bioperl convention to have method
> > names in caps, but it's nice to be consistent with the GFF3 tags.
>
> If you want to be completely consistent with convention, how about get_ID()
> and get_ParentIDs()? I have a private convention that initial capitalized
> methods are autoloaded/autogenerated, but this is just me.
I had imagined these to be 'first-class' accessors, like primary_tag(),
seq(), etc. (although they would be synonyms for get_tag_values('ID'),
set_tag_values('ID'), ...).
There seem to be three different kinds of accessors:
  foo()                   foo($foo)
  get_foo()               set_foo($foo)
  get_tag_values('foo')   set_tag_values('foo', [$foo])
I'm not sure what the rules are for deciding which attributes get which
kind of accessor.
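The layering Chris describes, with first-class accessors that only delegate to the generic tag/value layer, can be sketched as follows. This is a minimal sketch in Python rather than Perl. It assumes that a feature stores each tag as a list of values. The `Feature` class and its `_tags` dictionary are hypothetical. `set_tag_values` is named after this email and is not taken from a confirmed BioPerl signature.

```python
class Feature:
    """Hypothetical stand-in for a feature whose attributes are stored as tags."""

    def __init__(self):
        self._tags = {}  # tag name -> list of values

    # Generic tag/value accessors (the third style above).
    def get_tag_values(self, tag):
        return list(self._tags.get(tag, []))

    def set_tag_values(self, tag, values):
        self._tags[tag] = list(values)

    # 'First-class' accessors that only delegate to the tag/value layer.
    def ID(self, new_id=None):
        """Get the ID; with an argument, set it first (foo()/foo($foo) style)."""
        if new_id is not None:
            self.set_tag_values("ID", [new_id])
        values = self.get_tag_values("ID")
        return values[0] if values else None

    def ParentIDs(self):
        return self.get_tag_values("Parent")

    def add_ParentID(self, parent_id):
        self.set_tag_values("Parent", self.ParentIDs() + [parent_id])

    def remove_ParentID(self, parent_id):
        self.set_tag_values("Parent", [p for p in self.ParentIDs() if p != parent_id])

    def remove_ParentIDs(self):
        self.set_tag_values("Parent", [])


f = Feature()
f.ID("mRNA1")
f.add_ParentID("gene1")
f.add_ParentID("gene2")
f.remove_ParentID("gene1")
print(f.ID(), f.ParentIDs())  # mRNA1 ['gene2']
```

Because every first-class accessor goes through the tag layer, a feature written out as GFF3 sees the same ID and Parent values whichever accessor style set them.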
> > I also notice that in SeqFeatureI we have an accessor definition and
> > implementation for "primary_id", but no definition of what it means.
> >
> > I propose either eliminating this, or making it a synonym of ID()
>
> Good with me.
Ok
> > I think we need clearly defined semantics for these fields. I think
> > the semantics should be such that the ID should uniquely identify the
> > feature. This is problematic, as most sources don't issue a unique
> > accession or identifier for features. For example, GenBank files
> > provide a /gene qualifier for many features, but it isn't necessarily
> > unique, e.g. for multicopy genes. In cases where the data source does
> > not provide a unique ID, we may want a way to generate one. So I think
> > there should also be a method:
> >
> > generateID()
> >
> > which sets the ID field to something that's guaranteed unique. I'm not
> > sure how. Perhaps a combination of the timestamp and the object memory
> > reference?
>
> I think there was a proposal for globally_unique_ID() at some point. Perhaps
> time to resurrect that thread?
This is a tricky one...
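The timestamp-plus-memory-reference idea could look like the sketch below. It is written in Python, and `generateID`, `Feature` and the ID format are all hypothetical. An object's address is unique only among objects that are alive at the same time, and a one-second timestamp can repeat, so the sketch also appends a per-process counter. Even with the counter, the result is guaranteed unique only within one run. A truly global scheme, along the lines of the globally_unique_ID() proposal, would more likely use a random UUID (for example Python's `uuid.uuid4()`).

```python
import itertools
import time

_serial = itertools.count(1)  # per-process counter, never reused within a run


class Feature:
    """Hypothetical feature holding tags as lists of values."""

    def __init__(self, tags=None):
        self.tags = dict(tags or {})

    def ID(self):
        values = self.tags.get("ID")
        return values[0] if values else None


def generateID(feature, prefix="feature"):
    """Give the feature an ID unless it already has one; return the ID."""
    if feature.ID() is None:
        stamp = int(time.time())
        # id() is the object's identity (its address in CPython); it is only
        # unique among live objects, hence the serial number on the end.
        feature.tags["ID"] = [f"{prefix}:{stamp}:{id(feature):x}:{next(_serial)}"]
    return feature.ID()
```

Existing IDs are left untouched, so IDs supplied by the data source always win over generated ones.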
> > Because I'm lazy I'd rather do all this in SeqFeatureI - it all
> > delegates to existing methods. But I am unsure as to bioperl
> > conventions regarding when an 'interface' has implementation code.
>
> Happy to see it.
Ok
> >
> > ----
> >
> > I also want to add some code to FeatureHolderI, for dealing with the
> > "nesting hierarchy" in bioperl, i.e. features that contain other
> > features.
> >
> > The methods are:
> >
> > nest_features()
> >
> > creates a feature nesting hierarchy based on the "ID" and "Parent"
> > tags. This is useful when parsing GFF3.
>
> Yes, I like this.
>
> >
> > Also:
> >
> > flatten_features()
> >
> > for flattening the nesting hierarchy (so top_SeqFeatures and
> > get_SeqFeatures return the same thing)
>
> I like this too.
>
> >
> > Also:
> >
> > set_ParentIDs_from_hierarchy()
> >
> > This will go through the FeatureHolder hierarchy; any time it sees a
> > feature with subfeatures, it will set the children's "Parent" tag
> > according to the "ID" tag of the parent. If the parent does not have
> > an ID, one will be generated.
>
> This sounds like an internal method that nobody should ever see in the API!
Ok
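The three FeatureHolderI operations can be sketched together in Python. The sketch makes these assumptions:

- Each feature keeps its tags as lists of values plus a plain list of sub-features.
- Features whose Parent cannot be found in the set stay at the top level.
- A feature with several Parent values (allowed in GFF3, e.g. an exon shared by two transcripts) is attached under each of them.
- The `auto<N>` ID scheme is purely hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(eq=False)  # compare features by identity
class Feature:
    tags: dict = field(default_factory=dict)          # tag -> list of values
    sub_features: list = field(default_factory=list)  # nested child features

    def ID(self):
        values = self.tags.get("ID")
        return values[0] if values else None

    def ParentIDs(self):
        return list(self.tags.get("Parent", []))


def nest_features(features):
    """Attach each feature under the features named in its Parent tag.

    Returns the top-level features (those with no resolvable parent)."""
    by_id = {f.ID(): f for f in features if f.ID() is not None}
    top = []
    for f in features:
        parents = [by_id[p] for p in f.ParentIDs() if p in by_id]
        if not parents:
            top.append(f)
        for parent in parents:
            parent.sub_features.append(f)
    return top


def flatten_features(top):
    """Undo the nesting: return every feature once, depth-first, with no children."""
    flat, seen, stack = [], set(), list(reversed(top))
    while stack:
        f = stack.pop()
        if id(f) in seen:  # shared children are visited only once
            continue
        seen.add(id(f))
        flat.append(f)
        stack.extend(reversed(f.sub_features))
        f.sub_features = []
    return flat


_auto_ids = count(1)


def set_ParentIDs_from_hierarchy(features):
    """Write each child's Parent tag from its container's ID, generating IDs as needed."""
    for f in features:
        if not f.sub_features:
            continue
        if f.ID() is None:
            f.tags["ID"] = [f"auto{next(_auto_ids)}"]  # hypothetical ID scheme
        for child in f.sub_features:
            parents = child.tags.setdefault("Parent", [])
            if f.ID() not in parents:
                parents.append(f.ID())
        set_ParentIDs_from_hierarchy(f.sub_features)
```

On GFF3 input whose Parent references all resolve, running set_ParentIDs_from_hierarchy after nest_features changes nothing. That makes the pair easy to check against each other.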
> > And nothing to do with the above code, I would like to add methods to
> > RangeI for interbase coordinates. Love 'em or hate 'em, these methods
> > will make some people's code easier at no cost to bioperl.
> >
> > First the interbase equivalent of start/end:
> >
> > istart
> > iend
> >
> > Of course, iend is just a synonym for end, but it's nice for
> > completeness.
> >
> > These are the equivalents of Chado's fmin/fmax.
> >
> > I would also like:
> >
> > ifrom
> > ito
> >
> > These are interbase directional coordinates: equivalent to
> > istart,iend on the + strand, and iend,istart on the - strand.
>
> I have no objection to these guys going into the Interface as the appropriate
> implemented methods. That way they'd be available everywhere.
Ok
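The relationship between the 1-based start/end and the proposed interbase methods can be sketched as follows. The sketch is in Python, with a hypothetical `Range` class standing in for RangeI. It assumes BioPerl's usual 1-based inclusive start/end. Under that assumption istart is start - 1, iend equals end, and the length is simply iend - istart. Treating strand 0 like the + strand is a further assumption, because the email only covers + and -.

```python
from dataclasses import dataclass


@dataclass
class Range:
    start: int       # 1-based, inclusive, as in BioPerl's start()
    end: int         # 1-based, inclusive, as in BioPerl's end()
    strand: int = 1  # +1 or -1; 0 (unknown) is treated like +1 here

    def istart(self):
        """Interbase position just before the first base (Chado's fmin)."""
        return self.start - 1

    def iend(self):
        """Interbase position just after the last base (Chado's fmax); same as end."""
        return self.end

    def ifrom(self):
        """Directional interbase start: istart on +, iend on -."""
        return self.iend() if self.strand < 0 else self.istart()

    def ito(self):
        """Directional interbase end: iend on +, istart on -."""
        return self.istart() if self.strand < 0 else self.iend()


seq = "ACGTACGTAC"
r = Range(2, 4)
print(r.istart(), r.iend(), seq[r.istart():r.iend()])  # 1 4 CGT
rev = Range(2, 4, strand=-1)
print(rev.ifrom(), rev.ito())  # 4 1
```

The example also shows a side benefit: interbase coordinates work directly as 0-based, half-open slice bounds, so no +1/-1 adjustments are needed when extracting the sequence.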
> Lincoln
Chris