[Bioperl-l] new GFF3 support methods added
lstein at cshl.edu
Mon Mar 8 13:17:26 EST 2004
Nice job. My only comment is that there's been a great deal of
consternation over the role of whitespace in GFF3 recently and I am
thinking of changing the column delimiter back to strict tabs and
allowing spaces (but no tabs or other unescaped whitespace) in the
fields. I don't think this will affect your methods at all, but just
a heads up.
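To illustrate what strict tabs would mean for a parser, here is a minimal sketch in Python (not BioPerl code). It assumes the usual nine GFF columns, and split_gff3_line is a hypothetical helper name, not an existing API:

```python
def split_gff3_line(line):
    """Split one GFF3 feature line into columns on tabs only.

    Spaces inside a field are kept as part of that field.
    Assumes the usual nine GFF columns.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(
            f"expected 9 tab-separated columns, got {len(columns)}"
        )
    return columns


# A line whose source and attribute fields contain spaces.
line = "chr1\tmy source\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=my gene"
fields = split_gff3_line(line)
```

With tabs as the only delimiter, "my source" stays a single field, and a line delimited by spaces alone is rejected rather than silently mis-split.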
On Friday 05 March 2004 08:52 pm, Chris Mungall wrote:
> I have committed some new stuff to bioperl-live:
> the script seq/unflatten_seq will now generate GFF3 - the
> unflattener module is used to build the 'feature graph' connecting
> genes, transcripts, exons and CDSs together. This means we can have
> GFF3 for anything in genbank!
> As far as I'm aware, the only other sensible output formats to use
> here (ie formats that support feature graphs/containment
> hierarchies) are: chado, chaos, and the write-only asciitree.
> This feature graph is written out in the GFF3 using the ID and
> Parent tags. To do this there is an extra intermediate step - the
> bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags
> are generated.
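The intermediate step described above can be sketched roughly as follows. This is plain Python rather than BioPerl: the Feature class and hierarchy_to_gff3 are hypothetical names, and the counter-based IDs are an assumption made for brevity, not the ID scheme used in the commit:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical stand-in for a feature that can hold sub-features."""
    type: str
    start: int
    end: int
    strand: str = "+"
    children: List["Feature"] = field(default_factory=list)


def hierarchy_to_gff3(root: Feature, seq_id: str) -> List[str]:
    """Walk the containment hierarchy depth-first and emit GFF3 lines.

    Every feature gets an ID tag; every non-root feature also gets a
    Parent tag that points at the ID of the feature containing it.
    """
    lines: List[str] = []
    counts = {}

    def visit(feat: Feature, parent_id: Optional[str]) -> None:
        # Assumed ID scheme: feature type plus a running counter.
        counts[feat.type] = counts.get(feat.type, 0) + 1
        feat_id = f"{feat.type}{counts[feat.type]}"
        attrs = f"ID={feat_id}"
        if parent_id is not None:
            attrs += f";Parent={parent_id}"
        columns = [seq_id, ".", feat.type, str(feat.start),
                   str(feat.end), ".", feat.strand, ".", attrs]
        lines.append("\t".join(columns))
        for child in feat.children:
            visit(child, feat_id)

    visit(root, None)
    return lines


gene = Feature("gene", 1, 1000, children=[
    Feature("mRNA", 1, 1000, children=[
        Feature("exon", 1, 200),
        Feature("exon", 800, 1000),
    ]),
])
gff3_lines = hierarchy_to_gff3(gene, "seq1")
```

Here the gene line carries only an ID, the mRNA line carries Parent=gene1, and both exon lines carry Parent=mRNA1.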
> Here is a description of the changes I have made:
> [unless you're a bioperl hacker you don't really need to read the
> rest of this]
> You can get the context of what I'm on about from this thread:
> Two new public methods:
> sets both ID and ParentID from FeatureHolder hierarchy
> this is required by the above method
> Lincoln wanted this to be private, but I think it has
> to be called from outside
> the inverse of set_ParentIDs_from_hierarchy
> (note that I have put the implementation in the interface - in the
> absence of proper abstract classes, this was deemed the best thing
> to do in the previous discussion on this)
> This now maps to the tag_value 'ID' (ie the tag that GFF3 uses to
> uniquely identify a feature).
> Minor modification
> Bio::Tools::GFF now allows the -noparse=>1 option
> this is simply to stop the module waiting on input from STDIN
> when used in write-mode (maybe there's a better way of doing this
> but I didn't want to mess with this module)
> This unflattens a GenBank sequence and round-trips it to chadoxml
> via GFF3
> This doesn't work yet - if you dump a splitfeature as GFF3 and
> re-import it, it becomes two features. Any volunteers to help
> fix this?
> Unique IDs in bioperl:
> In the discussion that preceded this, it seemed that people liked
> the idea of persistent unique IDs, but there were no suggestions as
> to how to go about it. This is inherently difficult with objects,
> but I borrowed a solution from relational modeling.
> A persistent unique ID is generated using
> It is assumed that these are all set and comprise a "unique key"
> over features. Of course, there's no way to enforce this with
> objects. The generated ID is simply these values concatenated with
> : delimiters. You can think of this as a Skolem function if you're
> that way inclined.
> Another assumption is that seq_id is unique and persistent.
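As a rough illustration of the concatenation idea, here is a Python sketch. The exact set of key fields is not shown in this message, so the tuple used here (seq_id, feature type, start, end, strand) is purely an assumption, and persistent_id is a hypothetical name:

```python
def persistent_id(seq_id, feature_type, start, end, strand):
    """Build an ID by joining the assumed key fields with ':'.

    Two features with the same values in all key fields get the
    same ID, so the fields must form a unique key over features.
    """
    key = (seq_id, feature_type, start, end, strand)
    return ":".join(str(value) for value in key)


original = persistent_id("AE003644", "gene", 100, 900, 1)
moved = persistent_id("AE003644", "gene", 120, 900, 1)
```

Because the coordinates are part of the key, the second call yields a different ID from the first.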
> Of course, if you're dealing with data that changes with time, then
> changing the coordinates of a feature will change its ID. This is
> fine. If you want to use your own IDs rather than the generated
> ones, you can simply set the primary_id() field - or if you are
> using genbank files, add something like this
> to the feature.
> Stuff still to do:
> * fix GFF3 to deal with roundtripping splitlocations
> * A nest_features() method, as discussed in the previous email.
> This is the opposite of set_ParentIDs_from_hierarchy(), for reading
> in GFF3 (and then writing to a feature-graph compatible format, or
> to a database such as biosql or chado).
> * Bio::SeqIO::GFF3
> I know a lot of us would like this a lot - are there any plans to
> implement this yet?
> * A GeneModel factory
> This would take the output of the unflattener (a set of feature
> graphs typed to SO) and make SeqFeature::Gene objects
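The nest_features() item in the list above could look roughly like the following Python sketch. It is not the BioPerl method, which does not exist yet: the record layout (dicts with 'id' and an optional 'parent' key) is an assumption, and each feature is assumed to have at most one parent that is present in the input:

```python
def nest_features(records):
    """Rebuild a containment hierarchy from flat ID/Parent records.

    Each record is a dict with an 'id' key and an optional 'parent'
    key. Returns the list of root features; every returned node has
    a 'children' list holding its nested sub-features.
    """
    by_id = {}
    for record in records:
        node = dict(record, children=[])
        by_id[node["id"]] = node

    roots = []
    for node in by_id.values():
        parent_id = node.get("parent")
        if parent_id is None:
            roots.append(node)
        else:
            # Raises KeyError if the parent is missing from the input.
            by_id[parent_id]["children"].append(node)
    return roots


flat = [
    {"id": "gene1", "type": "gene"},
    {"id": "mRNA1", "type": "mRNA", "parent": "gene1"},
    {"id": "exon1", "type": "exon", "parent": "mRNA1"},
    {"id": "exon2", "type": "exon", "parent": "mRNA1"},
]
roots = nest_features(flat)
```

Indexing all records by ID before linking them means a child may appear before its parent in the input.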
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724