[Bioperl-l] Re: [Bioperl-announce-l] an extension to Bio::SeqIO
Lincoln Stein
lstein at cshl.edu
Tue Jun 17 16:42:34 EDT 2003
Hi,
Once I hear that there's a more-or-less consensus on the GFF3 spec that I sent
out, I'll make the appropriate changes to Bio::SeqIO to support it. The
important features are:
- ontology-based types & relationships
- parentage relationships are independent of position relationships
- multiple parents
- split locations
- separation of group from ID
- arbitrary attributes (but they can be constrained by a CV if need be)
I think this will go a long way toward making Jason's Transmogrifiers easier
to write, and help with chado roundtripping.
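To make the list above concrete, here is a minimal sketch in Python (not Bioperl code). It assumes the tab-separated, nine-column layout and the ID/Parent attribute tags of GFF3 as it was later published; the draft under discussion may differ in detail. The function names (parse_attributes, parse_line, build_children) and the sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Split the attributes column into a dict of tag -> list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # Split on commas first, then undo URL-escaping in each value.
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


def parse_line(line):
    """Parse one feature line into a plain dict (assumes nine columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attributes = cols
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": parse_attributes(attributes),
    }


def build_children(features):
    """Map each parent ID to the IDs of its children.

    Parentage comes only from the Parent attribute, never from coordinates,
    and a feature may list more than one parent.
    """
    children = defaultdict(list)
    for feature in features:
        feature_id = feature["attributes"].get("ID", [None])[0]
        for parent in feature["attributes"].get("Parent", []):
            children[parent].append(feature_id)
    return dict(children)


# Hypothetical sample: one gene, two transcripts, one exon shared by both.
SAMPLE = "\n".join([
    "chr1\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene1;Name=demo",
    "chr1\t.\tmRNA\t1000\t9000\t.\t+\t.\tID=mRNA1;Parent=gene1",
    "chr1\t.\tmRNA\t1000\t9000\t.\t+\t.\tID=mRNA2;Parent=gene1",
    "chr1\t.\texon\t1000\t1500\t.\t+\t.\tID=exon1;Parent=mRNA1,mRNA2",
])

features = [parse_line(line) for line in SAMPLE.splitlines()]
children = build_children(features)
```

The exon appears under both transcripts in the resulting map, which is the "multiple parents" case; nothing in the sketch looks at start or end to decide parentage.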
Lincoln
On Tuesday 17 June 2003 12:33 pm, Jason Stajich wrote:
> There is a bit of a chicken-and-egg problem in that most of the data sets
> Bioperl has tried to interface with are not as rich as chado, so the
> genbank->gene->chado route is not going to work for all genbank records
> (which I personally can live with). I would like to see if we can define
> the least common denominator so people understand what is needed to get a
> chado db populated.
>
>
> As we've been discussing in different venues I think we'd like to see a
> general purpose system which can take a collection of sequence features,
> relate them in a graph based on an identifiable grouping (the /gene field
> or perhaps mapped into a general slot like 'group' a la Lincoln's
> Bio::DB::GFF system), and then, using SO, map these into objects. For genes
> I'd like to see these be Bio::SeqFeature::Gene::GeneStructure objects
> (the object model of which might need some work) because there are
> additional methods already built in, like intron inference and the ability
> to loop through the transcripts, etc.
>
> So my request is that we make the chado writer dumb: it should not try to
> group anything, but should just obey however the objects are built. An
> intermediate set of factories can take lists of features and assign
> 'group' fields to them; a second factory could relate them into a graph
> based on SO and the group fields. This graph can now be written out to
> chadoxml. Another factory (which I was calling
> Bio::SeqFeature::Transmogrifier for the Calvin and Hobbes fans) could
> build the appropriate composite
> objects from the graph (Genes, HSPs, where appropriate) and deal with
> multiple coordinate systems (in the case of features attached to the
> annotated protein product). The 'Transmogrifier' could also turn these
> composite objects back into simple feature graphs so that they can be
> written to chado simply and (finally) fully written out to a genbank
> record with a controlled vocab of /tag=value fields.
>
> These are my ideas anyways, perhaps too much? I know other people (Shawn
> Hoon, Chris Mungall) have volunteered ideas and coding to this as well so
> we'd like to see if we can perhaps work together on it.
>
> For examples of some minimal gene objects, the easiest way to get them
> right now is from any of the gene prediction parsers (
> Bio::Tools::Genewise, Bio::Tools::Genomewise, Bio::Tools::Genscan,
> Bio::Tools::Glimmer).
>
> -jason
>
> On Tue, 17 Jun 2003, Peili Zhang wrote:
> > Hi,
> >
> > here at FlyBase, we use the chado database schema to store sequence,
> > annotation, genetic, controlled vocabulary, publication and other types
> > of data (for detailed information about the chado schema, please visit
> > http://www.gmod.org and read the schema documentation and scripts in
> > its CVS). we have developed tools to dump FlyBase data into chadoxml
> > and load data in chadoxml format into FlyBase (for chadoxml dtd, please
> > see
> > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gmod/schema/chado/dat/chado.dtd),
> > to facilitate data communication among the different sites of
> > FlyBase and between FlyBase and the rest of the world. the need has
> > arisen for a tool to convert external data in other formats into
> > chadoxml. I'm coding
> > a perl module chadoxml.pm to write out a Bio::Seq object into chadoxml.
> > we'd like to get your feedback on whether it's useful to add this module
> > into bioperl as an extension to the Bio::SeqIO package. if you already
> > have working code for the same purpose, maybe we can discuss how to merge
> > our code to produce a better version.
> >
> > thanks for your input.
> >
> > regards,
> > Peili Zhang
> > FlyBase-Harvard
> >
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein at cshl.org Cold Spring Harbor, NY
========================================================================