[Bioperl-l] proposed additions to SeqFeatureI, RangeI and FeatureHolderI
Lincoln Stein
lstein at cshl.edu
Thu Nov 20 20:24:36 EST 2003
On Wednesday 19 November 2003 09:47 pm, Chris Mungall wrote:
> I have some proposed changes I would like to commit to bioperl, mostly
> for using GFF3.
>
> In both SeqFeatureI and SeqFeature::Generic I would like to add some
> accessor methods. They would all map to tag-values.
>
> ID - synonym for tag_value('ID')[0]
> ParentIDs - synonym for tag_value('Parent')
I like this.
> add_ParentID
> remove_ParentID
> remove_ParentIDs
>
> Question - should the method be Parent or ParentID? In GFF3, the tag
> is "Parent". But an accessor method called "Parents()" feels like it
> should return objects, so I think ParentIDs() is better.
Do the methods return IDs or objects? If they're returning IDs, then the
ParentID() name sounds right.
> Also, I realise it's contrary to bioperl convention to have method
> names in caps, but it's nice to be consistent with the GFF3 tags.
If you want to be completely consistent with convention, how about get_ID()
and get_ParentIDs()? I have a private convention that initial capitalized
methods are autoloaded/autogenerated, but this is just me.
> I also notice that in SeqFeatureI we have an accessor definition and
> implementation for "primary_id", but nothing defines what it should contain.
>
> I propose either eliminating this, or making it a synonym of ID()
Good with me.
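Here is a minimal sketch of how the proposed accessors could delegate to tag values. ToyFeature is a hypothetical stand-in that stores tags as tag => [values]. It is not Bio::SeqFeature::Generic, and the method names just follow the proposal above rather than any released Bioperl API. primary_id is shown as a plain synonym for ID, which is one of the two options raised.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature class; tags are stored as
# tag => [values]. This is NOT Bio::SeqFeature::Generic.
package ToyFeature;

sub new {
    my ($class, %tags) = @_;
    my %copy = map { ($_ => [ @{ $tags{$_} } ]) } keys %tags;
    return bless { tags => \%copy }, $class;
}

sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{tags}{$tag} || [] };
}

# ID: first value of the 'ID' tag; sets it when given an argument
sub ID {
    my ($self, $id) = @_;
    $self->{tags}{ID} = [$id] if defined $id;
    my ($first) = $self->get_tag_values('ID');
    return $first;
}

# primary_id as a plain synonym for ID
sub primary_id { my $self = shift; return $self->ID(@_) }

# ParentIDs: every value of the GFF3 'Parent' tag
sub ParentIDs { my ($self) = @_; return $self->get_tag_values('Parent') }

sub add_ParentID {
    my ($self, $pid) = @_;
    push @{ $self->{tags}{Parent} }, $pid;
}

sub remove_ParentID {
    my ($self, $pid) = @_;
    $self->{tags}{Parent} = [ grep { $_ ne $pid } $self->ParentIDs ];
}

sub remove_ParentIDs { my ($self) = @_; delete $self->{tags}{Parent} }

package main;

my $exon = ToyFeature->new(ID => ['exon1'], Parent => ['mRNA1']);
$exon->add_ParentID('mRNA2');
print join(',', $exon->ParentIDs), "\n";    # mRNA1,mRNA2
$exon->remove_ParentID('mRNA1');
print $exon->primary_id, ' <- ', join(',', $exon->ParentIDs), "\n";  # exon1 <- mRNA2
```

Because everything goes through get_tag_values, these methods would need no storage of their own. That fits the idea of putting the implementation directly in the interface.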
> I think we need clearly defined semantics for these fields. I think
> the semantics should be such that the ID should uniquely identify the
> feature. This is problematic, as most sources don't issue a unique
> accession or identifier for features. For example, genbank files
> provide a /gene for a lot of features, but this isn't even unique
> e.g. with multicopy genes. In cases where the data source does not
> provide a unique ID, we may want a way to generate them. So I think
> there should also be a method:
>
> generateID()
>
> which sets the ID field to something that's guaranteed unique. I'm not
> sure how. Perhaps a combination of the timestamp and the object memory
> reference?
I think there was a proposal for globally_unique_ID() at some point. Perhaps
time to resurrect that thread?
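Here is a minimal sketch of the timestamp-plus-memory-reference idea, written as a hypothetical generate_id() function. Timestamp and address alone can repeat: time() only has one-second resolution, and Perl can reuse the address of a freed object. This sketch therefore also mixes in the process ID and a per-process counter. That makes IDs unique within one process, not globally. A real globally_unique_ID() would need something stronger, such as a UUID.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical generate_id(): timestamp + PID + memory address +
# per-process counter. The counter is what guarantees uniqueness
# within one process, since time() only has one-second resolution
# and Perl may reuse the address of a freed object.
my $id_counter = 0;

sub generate_id {
    my ($obj) = @_;
    $id_counter++;
    return sprintf 'feature:%d:%d:%x:%d', time(), $$, refaddr($obj), $id_counter;
}

my %seen;
for (1 .. 1000) {
    my $f = {};                 # short-lived object; its address may be reused
    $seen{ generate_id($f) }++;
}
print scalar(keys %seen), " distinct IDs\n";   # 1000 distinct IDs
```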
> Because I'm lazy I'd rather do all this in SeqFeatureI - it all
> delegates to existing methods. But I am unsure as to bioperl
> conventions regarding when an 'interface' has implementation code.
Happy to see it.
>
> ----
>
> I also want to add some code to FeatureHolderI, for dealing with the
> "nesting hierarchy" in bioperl, i.e. features that contain other
> features.
>
> The methods are:
>
> nest_features()
>
> creates a feature nesting hierarchy based on the "ID" and "Parent"
> tags. This is useful when parsing GFF3.
Yes, I like this.
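Here is a minimal sketch of nest_features(). It assumes features are plain hashes with GFF3-style ID and Parent values plus a hypothetical subfeatures list. The real method would work on SeqFeatureI objects. GFF3 allows several Parent values, for example an exon shared by two transcripts, so a child is attached under each parent it names.

```perl
use strict;
use warnings;

# Hypothetical sketch: features are plain hashes with GFF3-style 'ID'
# and 'Parent' values. Each feature is pushed onto the 'subfeatures'
# list of every parent it names; features without a Parent are
# returned as the top level.
sub nest_features {
    my @features = @_;
    my %by_id = map { ($_->{ID} => $_) } grep { defined $_->{ID} } @features;
    my @top;
    for my $f (@features) {
        my @parents = @{ $f->{Parent} || [] };
        unless (@parents) { push @top, $f; next }
        for my $pid (@parents) {
            my $parent = $by_id{$pid}
                or die "No feature with ID '$pid'\n";
            push @{ $parent->{subfeatures} }, $f;
        }
    }
    return @top;
}

my @top = nest_features(
    { ID => 'gene1' },
    { ID => 'mRNA1', Parent => ['gene1'] },
    { ID => 'exon1', Parent => ['mRNA1'] },
    { ID => 'exon2', Parent => ['mRNA1'] },
);
my ($mrna) = @{ $top[0]{subfeatures} };
printf "%d top-level; %s has %d exons\n",
    scalar @top, $mrna->{ID}, scalar @{ $mrna->{subfeatures} };
# 1 top-level; mRNA1 has 2 exons
```

The ID index is built before any nesting happens, so a child may appear before its parent in the file.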
>
> Also:
>
> flatten_features()
>
> for flattening the nesting hierarchy (so top_SeqFeatures and
> get_SeqFeatures return the same thing)
I like this too.
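Here is a minimal sketch of flatten_features() over the same kind of plain-hash features, using a hypothetical subfeatures key. It walks the hierarchy depth-first, detaches each feature's subfeatures, and returns everything as one list. A feature reachable through two parents is returned only once. Flattening discards the nesting, so the relationships survive only if the children's Parent tags were recorded beforehand.

```perl
use strict;
use warnings;

# Hypothetical sketch: walk the nesting hierarchy depth-first, strip
# each feature's 'subfeatures' list and return every feature in one
# flat list. A feature reachable through two parents is returned once.
sub flatten_features {
    my @stack = reverse @_;
    my (@flat, %seen);
    while (my $f = pop @stack) {
        next if $seen{$f}++;
        push @flat, $f;
        my $subs = delete($f->{subfeatures}) || [];
        push @stack, reverse @$subs;
    }
    return @flat;
}

my $gene = { ID => 'gene1', subfeatures => [
    { ID => 'mRNA1', subfeatures => [ { ID => 'exon1' }, { ID => 'exon2' } ] },
] };
my @flat = flatten_features($gene);
print join(',', map { $_->{ID} } @flat), "\n";   # gene1,mRNA1,exon1,exon2
```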
>
> Also:
>
> set_ParentIDs_from_hierarchy()
>
> This will go through the FeatureHolder hierarchy; any time it sees a
> feature with subfeatures, it will set the children's "Parent" tag
> according to the "ID" tag of the parent. If the parent does not have
> an ID, one will be generated.
This sounds like an internal method that nobody should ever see in the API!
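Following the suggestion that this stay internal, here is a sketch written as a private helper, hypothetically named _set_parent_ids(). It works over plain-hash features with a subfeatures list. A simple counter stands in for the proposed generateID(). Each parent gets an ID before its children are visited, so every child can record it.

```perl
use strict;
use warnings;

my $next_id = 0;   # hypothetical stand-in for generateID()

# Hypothetical sketch: walk the hierarchy and, for every feature with
# subfeatures, record the parent's ID in each child's 'Parent' list,
# generating an ID for the parent first if it has none.
sub _set_parent_ids {
    my ($f) = @_;
    my @subs = @{ $f->{subfeatures} || [] };
    return unless @subs;
    $f->{ID} = 'autoID' . ++$next_id unless defined $f->{ID};
    for my $child (@subs) {
        push @{ $child->{Parent} }, $f->{ID}
            unless grep { $_ eq $f->{ID} } @{ $child->{Parent} || [] };
        _set_parent_ids($child);
    }
}

my $gene = { ID => 'gene1', subfeatures => [
    { subfeatures => [ { ID => 'exon1' } ] },   # an mRNA with no ID
] };
_set_parent_ids($gene);
my $mrna = $gene->{subfeatures}[0];
print "$mrna->{ID} <- @{ $mrna->{Parent} }\n";              # autoID1 <- gene1
print "exon1 <- @{ $mrna->{subfeatures}[0]{Parent} }\n";    # exon1 <- autoID1
```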
> And nothing to do with the above code, I would like to add methods to
> RangeI for interbase coordinates. Love em or hate em, these methods
> will make some people's code easier at no cost to bioperl.
>
> First the interbase equivalent of start/end:
>
> istart
> iend
>
> Of course, iend is just a synonym for end, but it's nice for
> completeness.
>
> This is the equivalent of chado fmin/fmax.
>
> I would also like:
>
> ifrom
> ito
>
> For interbase directional coordinates. This is equivalent to
> istart,iend in the + strand, and the reverse of this in the - strand.
I have no objection to these guys going into the Interface as the appropriate
implemented methods. That way they'd be available everywhere.
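Here is a minimal sketch of the four interbase methods on a hypothetical ToyRange class. It is not Bio::RangeI itself. The class assumes start/end are 1-based and inclusive. Interbase coordinates count the gaps between bases from 0, so a base range start..end becomes start-1..end. ifrom/ito swap on the minus strand. Treating unstranded ranges like plus is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical range class: start/end are 1-based and inclusive.
# Interbase coordinates count the gaps between bases from 0, so a
# base range start..end becomes start-1..end (like Chado's fmin/fmax).
package ToyRange;

sub new    { my ($class, %a) = @_; return bless { %a }, $class }
sub start  { $_[0]{start} }
sub end    { $_[0]{end} }
sub strand { $_[0]{strand} || 0 }

sub istart { $_[0]->start - 1 }   # one less than the base start
sub iend   { $_[0]->end }         # same number as the base end

# Directional interbase coordinates: swapped on the minus strand.
# Unstranded (0) ranges are treated like plus here (an assumption).
sub ifrom { $_[0]->strand < 0 ? $_[0]->iend   : $_[0]->istart }
sub ito   { $_[0]->strand < 0 ? $_[0]->istart : $_[0]->iend }

package main;

for my $strand (1, -1) {
    my $r = ToyRange->new(start => 10, end => 20, strand => $strand);
    printf "strand %+d: istart=%d iend=%d ifrom=%d ito=%d\n",
        $strand, $r->istart, $r->iend, $r->ifrom, $r->ito;
}
# strand +1: istart=9 iend=20 ifrom=9 ito=20
# strand -1: istart=9 iend=20 ifrom=20 ito=9
```

In interbase coordinates, the length is iend - istart with no +1. Here that is 20 - 9 = 11 bases, the same as end - start + 1.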
Lincoln
>
> Let me know if there are any objections; otherwise I'll commit sometime
> next week.
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)