[Bioperl-l] proposed additions to SeqFeatureI, RangeI and FeatureHolderI

Lincoln Stein lstein at cshl.edu
Thu Nov 20 20:24:36 EST 2003


On Wednesday 19 November 2003 09:47 pm, Chris Mungall wrote:
> I have some proposed changes I would like to commit to bioperl, mostly
> for using GFF3.
>
> In both SeqFeatureI and SeqFeature::Generic I would like to add some
> accessor methods. They would all map to tag-values.
>
>   ID         - synonym for tag_value('ID')[0]
>   ParentIDs  - synonym for tag_value('Parent')

I like this.

>   add_ParentID
>   remove_ParentID
>   remove_ParentIDs
>
> Question - should the method be Parent or ParentID? In GFF3, the tag
> is "Parent". But an accessor method called "Parents()" feels like it
> should return objects, so I think ParentIDs() is better.

Do the methods return IDs or objects?  If they're returning IDs, then the 
ParentIDs() name sounds right.

> Also, I realise it's contrary to bioperl convention to have method
> names in caps, but it's nice to be consistent with the GFF3 tags.

If you want to be completely consistent with convention, how about get_ID() 
and get_ParentIDs()?  I have a private convention that initial capitalized 
methods are autoloaded/autogenerated, but this is just me.
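
To make the accessor idea concrete, here is a rough sketch in Python rather than Perl. It is not BioPerl code: the Feature class and its add_tag_value/get_tag_values helpers are hypothetical stand-ins for a tag-value store, and only the accessor names come from this thread.

```python
class Feature:
    """Toy stand-in for a feature whose attributes live in tag/value lists."""

    def __init__(self):
        self._tags = {}  # tag name -> list of values

    def add_tag_value(self, tag, value):
        self._tags.setdefault(tag, []).append(value)

    def get_tag_values(self, tag):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._tags.get(tag, []))

    def get_ID(self):
        # Synonym for the first value of the 'ID' tag, or None if unset.
        values = self.get_tag_values("ID")
        return values[0] if values else None

    def get_ParentIDs(self):
        # Synonym for all values of the 'Parent' tag (IDs, not objects).
        return self.get_tag_values("Parent")

    def add_ParentID(self, parent_id):
        self.add_tag_value("Parent", parent_id)

    def remove_ParentID(self, parent_id):
        kept = [v for v in self.get_tag_values("Parent") if v != parent_id]
        self._tags["Parent"] = kept

    def remove_ParentIDs(self):
        self._tags.pop("Parent", None)
```

Every accessor delegates to the tag store, which is why the whole set could live in the interface without new state.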


> I also notice that in SeqFeatureI we have an accessor definition and
> implementation for "primary_id". There is no definition for this.
>
> I propose either eliminating this, or making it a synonym of ID()

Good with me.

> I think we need clearly defined semantics for these fields. I think
> the semantics should be such that the ID should uniquely identify the
> feature. This is problematic, as most sources don't issue a unique
> accession or identifier for features. For example, genbank files
> provide a /gene for a lot of features, but this isn't even unique
> e.g. with multicopy genes. In cases where the data source does not
> provide a unique ID, we may want a way to generate them. So I think
> there should also be a method:
>
>   generateID()
>
> which sets the ID field to something that's guaranteed unique. I'm not
> sure how. Perhaps a combination of the timestamp and the object memory
> reference?

I think there was a proposal for globally_unique_ID() at some point.  Perhaps 
time to resurrect that thread?
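
One possible shape for ID generation, sketched in Python rather than Perl. It swaps in a random UUID in place of the timestamp-plus-memory-reference idea, so uniqueness is overwhelmingly likely rather than strictly guaranteed. The generate_id name, the prefix parameter, and the dict-of-lists tag store are assumptions for illustration, not BioPerl API.

```python
import uuid


def generate_id(feature_tags, prefix="feature"):
    """Ensure the tag store has an ID and return it.

    feature_tags is a hypothetical dict mapping tag names to lists of
    values. An existing ID is left untouched; otherwise a random
    UUID-based ID is assigned.
    """
    if not feature_tags.get("ID"):
        feature_tags["ID"] = [f"{prefix}:{uuid.uuid4()}"]
    return feature_tags["ID"][0]
```

Calling it twice on the same feature returns the same ID, so it is safe to call before every write.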

> Because I'm lazy I'd rather do all this in SeqFeatureI - it all
> delegates to existing methods. But I am unsure as to bioperl
> conventions regarding when an 'interface' has implementation code.

Happy to see it.

>
> ----
>
> I also want to add some code to FeatureHolderI, for dealing with the
> "nesting hierarchy" in bioperl, i.e. features that contain other
> features.
>
> The methods are:
>
>   nest_features()
>
> creates a feature nesting hierarchy based on the "ID" and "Parent"
> tags. This is useful when parsing GFF3.

Yes, I like this.
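
A minimal sketch of what nesting by "ID" and "Parent" could look like, in Python rather than Perl. Features are plain dicts with hypothetical 'id', 'parents' and 'children' keys. Treating a feature whose parent cannot be found as top-level is an assumption, not something settled in this thread.

```python
def nest_features(features):
    """Link a flat list of features into a hierarchy; return the top level.

    Each feature is a dict with an 'id' (or None) and a 'parents' list
    of parent IDs. A 'children' list is filled in on every feature.
    """
    by_id = {f["id"]: f for f in features if f.get("id") is not None}
    for f in features:
        f["children"] = []

    top_level = []
    for f in features:
        # Resolve parent IDs to feature objects, skipping unknown IDs.
        parents = [by_id[p] for p in f.get("parents", []) if p in by_id]
        if parents:
            for parent in parents:
                parent["children"].append(f)
        else:
            top_level.append(f)
    return top_level
```

A feature that lists two parents, which GFF3 allows, ends up in both children lists.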

>
> Also:
>
>   flatten_features()
>
> for flattening the nesting hierarchy (so top_SeqFeatures and
> get_SeqFeatures return the same thing)

I like this too.
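
Flattening could be sketched as a depth-first walk that collects every feature once and empties its children list. This is Python rather than Perl, and the dict layout with a 'children' key is a hypothetical stand-in for a feature holder.

```python
def flatten_features(top_level):
    """Return all features in pre-order and remove the nesting.

    After the call no feature has children, so the "top level" view and
    the "all features" view contain the same items.
    """
    flat = []
    seen = set()  # object ids, so a shared child is listed only once
    stack = list(reversed(top_level))
    while stack:
        feature = stack.pop()
        if id(feature) in seen:
            continue
        seen.add(id(feature))
        # Queue children before clearing them; reversed keeps their order.
        stack.extend(reversed(feature.get("children", [])))
        feature["children"] = []
        flat.append(feature)
    return flat
```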

>
> Also:
>
>   set_ParentIDs_from_hierarchy()
>
> This will go through the FeatureHolder hierarchy; any time it sees a
> feature with subfeatures, it will set the children's "Parent" tag
> according to the "ID" tag of the parent. If the parent does not have
> an ID, one will be generated.

This sounds like an internal method that nobody should ever see in the API!


> And nothing to do with the above code, I would like to add methods to
> RangeI for interbase coordinates. Love em or hate em, these methods
> will make some people's code easier at no cost to bioperl.
>
> First the interbase equivalent of start/end:
>
>   istart
>   iend
>
> Of course, iend is just a synonym for end, but it's nice for
> completeness.
>
> This is the equivalent of chado fmin/fmax.
>
> I would also like:
>
>   ifrom
>   ito
>
> For interbase directional coordinates. This is equivalent to
> istart,iend in the + strand, and the reverse of this in the - strand.

I have no objection to these going into the interface as implemented 
methods.  That way they'd be available everywhere.
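
The arithmetic is small enough to spell out. The sketch below is Python rather than Perl and assumes the usual convention of 1-based inclusive start/end with strand given as +1 or -1. The function name and the returned dict are illustrative only.

```python
def interbase(start, end, strand=1):
    """Convert 1-based inclusive coordinates to interbase coordinates.

    istart/iend are strand-independent (like chado fmin/fmax);
    ifrom/ito follow the direction of the feature.
    """
    istart = start - 1  # the boundary just before the first base
    iend = end          # the boundary just after the last base
    if strand < 0:
        ifrom, ito = iend, istart
    else:
        ifrom, ito = istart, iend
    return {"istart": istart, "iend": iend, "ifrom": ifrom, "ito": ito}
```

With interbase coordinates the length is simply iend - istart, with no "+ 1" needed.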

Lincoln

>
> Let me know if there are any objections, otherwise I'll commit sometime
> next week.

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

