[Bioperl-l] new GFF3 support methods added

Chris Mungall cjm at fruitfly.org
Mon Mar 8 16:07:16 EST 2004

On Fri, 5 Mar 2004, Hilmar Lapp wrote:

> Nice work Chris. I don't have very many comments.
> On Friday, March 5, 2004, at 05:52  PM, Chris Mungall wrote:
> >   FeatureHolderI->set_ParentIDs_from_hierarchy
> >
> >     sets both ID and ParentID from FeatureHolder hierarchy
> >
> >   FeatureHolderI->create_hierarchy_from_ParentIDs
> >
> >     the inverse of set_ParentIDs_from_hierarchy
> >
> Hmm - is this unique to feature graphs or to GFF3?

I think it's useful any time you want to preserve the hierarchy held in the
bioperl FeatureHolder tree in a format that does not represent that
hierarchy explicitly.

You could use this with any flavour of GFF; the difference with GFF3 is
that the ID and Parent tags are part of the format specification.

If you wanted to persist your bioperl data through GenBank files, this
would also be useful, as features could carry /ID and /Parent qualifiers
(although strictly speaking this is an extension of the GenBank format).

It is useful in a variety of contexts, but it does 'clutter up'
FeatureHolder a little.
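To make the round trip concrete, here is a minimal sketch in plain Perl. It does not use the BioPerl API: features are assumed to be plain hashes with hypothetical `id`, `type` and `sub` keys, and `flatten_with_parent_ids` / `build_hierarchy_from_parent_ids` are illustrative names standing in for set_ParentIDs_from_hierarchy and create_hierarchy_from_ParentIDs. It also assumes every feature already has an ID and at most one parent (GFF3 itself allows several).

```perl
use strict;
use warnings;

# Hypothetical plain-hash features, NOT BioPerl objects:
#   { id => 'gene1', type => 'gene', sub => [ ...child features... ] }

# Flatten: walk the tree depth-first, emitting one flat record per feature
# and storing the enclosing feature's ID in a GFF3-style Parent tag.
sub flatten_with_parent_ids {
    my ($feat, $parent_id, $out) = @_;
    $out ||= [];
    my %rec = (ID => $feat->{id}, type => $feat->{type});
    $rec{Parent} = $parent_id if defined $parent_id;
    push @$out, \%rec;
    flatten_with_parent_ids($_, $feat->{id}, $out) for @{ $feat->{sub} || [] };
    return $out;
}

# Inverse: index the flat records by ID, then attach each record to its
# Parent. Records without a Parent are returned as roots.
sub build_hierarchy_from_parent_ids {
    my @recs = @_;
    my %node;
    for my $r (@recs) {
        $node{ $r->{ID} } = { id => $r->{ID}, type => $r->{type}, sub => [] };
    }
    my @roots;
    for my $r (@recs) {
        my $pid = $r->{Parent};
        if (!defined $pid) {
            push @roots, $node{ $r->{ID} };
        } elsif (exists $node{$pid}) {
            push @{ $node{$pid}{sub} }, $node{ $r->{ID} };
        } else {
            die "Parent '$pid' of '$r->{ID}' not found\n";
        }
    }
    return @roots;
}

# Example: gene -> mRNA -> two exons.
my $gene = {
    id => 'gene1', type => 'gene',
    sub => [ { id => 'mRNA1', type => 'mRNA',
               sub => [ { id => 'exon1', type => 'exon' },
                        { id => 'exon2', type => 'exon' } ] } ],
};
my $flat = flatten_with_parent_ids($gene);
printf "%s\tID=%s\tParent=%s\n", $_->{type}, $_->{ID}, $_->{Parent} // '.' for @$flat;
my ($root) = build_hierarchy_from_parent_ids(@$flat);
```

The flat records carry everything needed to rebuild the tree, which is the property that lets the hierarchy survive a trip through a flat format.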

> >   SeqFeatureI->generate_unique_persistent_id
> >
> >     this is required by the above method
> >
> >     Lincoln wanted this to be private, but I think it has
> >     to be called from outside
> I wouldn't want it to be private, but I'd rather make this a property
> representing a generator (with a single required method
> $generator->generate_id($feature)), or alternatively - and also simpler
> - a closure, than a method. The reason is that if you wanted to change
> the way the ID is generated you'd have to subclass an entire
> SeqFeatureI implementation instead of just setting the property to some
> anonymous method you whip up.

Much as I like closures, I think there should be a standard, sanctioned way
of generating unique/persistent IDs.
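One way to reconcile the two positions is a generator property whose default is the sanctioned method, so most users never touch it but anyone can swap in a closure. The sketch below assumes a hypothetical `My::Feature` class with hypothetical `id_generator` and `generate_id` methods; none of this is existing BioPerl API, and the ':'-joined key format is an assumption.

```perl
use strict;
use warnings;

package My::Feature;    # hypothetical stand-in, not a BioPerl class

# Hypothetical "sanctioned" default: join the unique-key fields.
my $DEFAULT_GENERATOR = sub {
    my ($f) = @_;
    return join ':', map { $f->{$_} } qw(seq_id source_tag primary_tag start end);
};

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Get/set the ID generator: any code ref that takes a feature and returns
# a string will do. Falls back to the default when none has been set.
sub id_generator {
    my ($self, $gen) = @_;
    $self->{id_generator} = $gen if defined $gen;
    return $self->{id_generator} || $DEFAULT_GENERATOR;
}

sub generate_id {
    my ($self) = @_;
    return $self->id_generator->($self);
}

package main;

my $f = My::Feature->new(
    seq_id => 'contig1', source_tag => 'genscan',
    primary_tag => 'exon', start => 100, end => 250,
);
print $f->generate_id, "\n";    # contig1:genscan:exon:100:250

# Swap in a closure without subclassing anything.
$f->id_generator(sub { my ($feat) = @_; return "$feat->{primary_tag}-$feat->{start}" });
print $f->generate_id, "\n";    # exon-100
```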

> > Unique IDs in bioperl:
> >
> > In the discussion that preceded this, it seemed that people liked the
> > idea of persistent unique IDs, but there were no suggestions as to how
> > to go about it. This is inherently difficult with objects, but I
> > borrowed a solution from relational modeling.
> >
> > A persistent unique ID is generated using
> >
> >   seq_id
> >   primary_tag
> >   start
> >   end
> >
> > It is assumed that these are all set and comprise a "unique key" over
> > features.
> Hmm. Wouldn't you need to include source_tag()? (Source_tag is part of
> the unique key in biosql.) Without the source_tag being part of this,
> wouldn't that mean you cannot have the exact (start+end) same segment
> predicted as exon by different methods and have those different
> predictions co-exist as separate features in the graph? (Presumably
> those would only differ in source_tag)

Good point! I've added source_tag to the key.
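A small illustration of Hilmar's point, using made-up feature values and a hypothetical `key_id` helper (plain hashes, not BioPerl objects): two exon predictions over the same coordinates that differ only in source_tag get the same ID from the old four-field key, but distinct IDs once source_tag is included.

```perl
use strict;
use warnings;

# Hypothetical helper: join the named key fields into one ID string,
# refusing to build an ID if any field is unset.
sub key_id {
    my ($feat, @fields) = @_;
    for my $field (@fields) {
        die "key field '$field' is not set\n" unless defined $feat->{$field};
    }
    return join '|', map { $feat->{$_} } @fields;
}

# Made-up example: the same segment predicted as an exon by two methods.
my @exons = (
    { seq_id => 'contig1', source_tag => 'genscan', primary_tag => 'exon', start => 1200, end => 1450 },
    { seq_id => 'contig1', source_tag => 'fgenesh', primary_tag => 'exon', start => 1200, end => 1450 },
);

my @old_key = qw(seq_id primary_tag start end);
my @new_key = qw(seq_id source_tag primary_tag start end);

# Count how many distinct IDs each key produces.
my (%ids_old, %ids_new);
for my $e (@exons) {
    $ids_old{ key_id($e, @old_key) }++;
    $ids_new{ key_id($e, @new_key) }++;
}
printf "distinct IDs without source_tag: %d\n", scalar keys %ids_old;    # 1
printf "distinct IDs with source_tag:    %d\n", scalar keys %ids_new;    # 2
```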

OK, so are you saying that some of these methods don't really belong in
the interface classes they are currently in?

Perhaps it would be better to have a Bio::SeqFeature::Tools::IDHandler
class? It would contain methods such as:

  generate_unique_persistent_id($feat) # uses $feat->seq_id
  generate_unique_persistent_id($feat, $seq_id)
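A rough sketch of how such a class might look, assuming the two calling forms listed above. `My::IDHandler` and `Stub::Feature` are hypothetical names for illustration, not the proposed BioPerl module; the stub only exposes SeqFeatureI-style accessors (seq_id, source_tag, primary_tag, start, end), and the ':'-joined key format is an assumption.

```perl
use strict;
use warnings;

package My::IDHandler;    # hypothetical, for illustration only

sub new { my ($class) = @_; return bless {}, $class }

# generate_unique_persistent_id($feat)           uses $feat->seq_id
# generate_unique_persistent_id($feat, $seq_id)  uses the given seq_id instead
sub generate_unique_persistent_id {
    my ($self, $feat, $seq_id) = @_;
    $seq_id = $feat->seq_id unless defined $seq_id;
    die "no seq_id available\n" unless defined $seq_id;
    return join ':', $seq_id, $feat->source_tag, $feat->primary_tag,
                     $feat->start, $feat->end;
}

package Stub::Feature;    # hypothetical stand-in with SeqFeatureI-style accessors

sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub seq_id      { $_[0]{seq_id} }
sub source_tag  { $_[0]{source_tag} }
sub primary_tag { $_[0]{primary_tag} }
sub start       { $_[0]{start} }
sub end         { $_[0]{end} }

package main;

my $h = My::IDHandler->new;
my $f = Stub::Feature->new(
    seq_id => 'contig1', source_tag => 'genscan',
    primary_tag => 'exon', start => 10, end => 90,
);
print $h->generate_unique_persistent_id($f), "\n";                # contig1:genscan:exon:10:90
print $h->generate_unique_persistent_id($f, 'contig1.2'), "\n";   # contig1.2:genscan:exon:10:90
```

Keeping the logic in a separate handler leaves FeatureHolderI and SeqFeatureI uncluttered, and the optional second argument lets a caller pass a versioned identifier when $feat->seq_id only holds a versionless accession.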


Any preference? I'm leaving them as they are for now, if that's OK, but I
have no objection to moving everything to a separate class if that's
preferred.

> >
> > Another assumption is that seq_id is unique and persistent.
> >
> I don't think that's going to be a very safe assumption.

Fair enough, but I think this is the responsibility of the user.

This is also partly the fault of the bioperl model: $feat->seq_id()
generally refers to the versionless accession number, and there isn't
really a way to get at the versioned seq_id from the feature. I think
adding one would cause too much churn at this late stage...

> >
> > * A GeneModel factory
> >
> > This would take the output of the unflattener (a set of feature graphs
> > typed to SO) and make SeqFeature::Gene objects
> >
> Yeah, that'd be cool ...


> 	-hilmar
