[Open-bio-l] Re: [GMOD-devel] Schema for genes & features & mappings to assemblies

Chris Mungall cjm@bdgp.lbl.gov
Wed, 24 Apr 2002 10:13:06 -0700 (PDT)


On Tue, 23 Apr 2002, Lincoln Stein wrote:

> > 	- biosql is for sequences and features, not mappings to assemblies (is
> > that intended to be added, too, or is it beyond its scope? )
>
> > - GGB is
> > running off the schema in Bio::DB::GFF, which is not biosql compatible
> > (Lincoln? If so, do you have any plans to change that?)
>
> Bio::DB::GFF came about six months before biosql.  GGB runs on top of both
> Bio::DB::GFF and Gadfly, which magically enough have similar APIs even though
> we didn't plan it that way.  Someday soon I'm going to try to adapt BioSQL to
> support GGB, but to some extent its entry-based view of the world is at odds
> with the GGB view of the world.  GGB sees the genome as a series of landmarks
> that occupy (potentially split, potentially multiple) regions of the genome.

I think the main difference is that in GGB (and GadFly), the seqfeature is
defined by its location, whereas in biosql a seqfeature can have 0:n
locations. In GGB, the group can be regarded as equivalent to
biosql.seqfeature, with ggb.seqfeature equivalent to
biosql.seqfeature_location.

Here is one possible mapping between GGB and biosql:

CREATE VIEW fdata
 AS SELECT f.seqfeature_id   AS fid,
           e.accession       AS fref,
           fl.seq_start      AS fstart,
           fl.seq_end        AS fstop,
           f.seqfeature_key_id   AS ftypeid,
           NULL              AS fscore,
           fl.seq_strand     AS fstrand,
           NULL              AS fphase,
           f.seqfeature_id   AS gid,
           NULL              AS ftarget_start,
           NULL              AS ftarget_stop
    FROM seqfeature f,
         seqfeature_location fl,
         bioentry e
    WHERE
          fl.seqfeature_id = f.seqfeature_id        AND
          f.bioentry_id    = e.bioentry_id;

CREATE VIEW fgroup
 AS SELECT seqfeature_id     AS gid,
           NULL              AS gclass,
           seqfeature_id     AS gname
 FROM seqfeature;

CREATE VIEW ftype
 AS SELECT ontology_term_id  AS ftypeid,
           term_name         AS fmethod,
           term_name         AS fsource
 FROM ontology_term;

I haven't tested this yet, but I think it's right. The fdna view would
depend on which relation we decide on for assemblies.
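
To make the untested mapping easier to try out, here is a minimal sketch
that loads the fdata and ftype views into an in-memory SQLite database
using Python's sqlite3 module. The table definitions are assumptions:
they are trimmed to the columns the views read, not the full biosql
schema. The rows (accession 'DEMO0001', the 'exon' term) are invented,
and the NULL placeholder columns are left out for brevity. The
seqfeature_id used for fid is qualified with f. because both joined
tables carry a column of that name.

```python
import sqlite3

# Trimmed-down stand-ins for the biosql tables (assumption): only the
# columns the views read are declared, and all rows are invented.
SCHEMA = """
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE ontology_term (ontology_term_id INTEGER PRIMARY KEY,
                            term_name TEXT);
CREATE TABLE seqfeature (
    seqfeature_id     INTEGER PRIMARY KEY,
    bioentry_id       INTEGER,
    seqfeature_key_id INTEGER
);
CREATE TABLE seqfeature_location (
    seqfeature_id INTEGER,
    seq_start     INTEGER,
    seq_end       INTEGER,
    seq_strand    INTEGER
);
"""

VIEWS = """
CREATE VIEW fdata AS
SELECT f.seqfeature_id     AS fid,   -- qualified: f and fl both have it
       e.accession         AS fref,
       fl.seq_start        AS fstart,
       fl.seq_end          AS fstop,
       f.seqfeature_key_id AS ftypeid,
       fl.seq_strand       AS fstrand,
       f.seqfeature_id     AS gid
FROM seqfeature f, seqfeature_location fl, bioentry e
WHERE fl.seqfeature_id = f.seqfeature_id
  AND f.bioentry_id    = e.bioentry_id;

CREATE VIEW ftype AS
SELECT ontology_term_id AS ftypeid,
       term_name        AS fmethod,
       term_name        AS fsource
FROM ontology_term;
"""


def build_demo():
    """Create the toy tables and views, then load one split feature."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript(VIEWS)
    conn.execute("INSERT INTO bioentry VALUES (1, 'DEMO0001')")
    conn.execute("INSERT INTO ontology_term VALUES (10, 'exon')")
    conn.execute("INSERT INTO seqfeature VALUES (100, 1, 10)")
    # One biosql seqfeature with two locations.
    conn.executemany(
        "INSERT INTO seqfeature_location VALUES (?, ?, ?, ?)",
        [(100, 5, 20, 1), (100, 40, 60, 1)],
    )
    return conn


def fdata_rows(conn):
    """Return the GGB-style rows, joined to their type, ordered by start."""
    return conn.execute(
        "SELECT fref, fstart, fstop, fstrand, gid, fmethod, fsource "
        "FROM fdata JOIN ftype USING (ftypeid) ORDER BY fstart"
    ).fetchall()


if __name__ == "__main__":
    for row in fdata_rows(build_demo()):
        print(row)
```

In this toy data the single seqfeature with two locations comes back as
two fdata rows sharing gid 100, which is the group/feature correspondence
described above.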

This would also need to be extended for feature pairs, which are not
currently in biosql.

Another major difference, apparent from the above mapping, is that in GGB
all groups have a unique name. biosql, originally developed for genbank
roundtripping and with its heritage in the genbank feature tables, does
not have one. The views above work around this by using seqfeature_id as
gname, but it should really be a meaningful display label. I don't think
there's any good general way of doing this; it is data dependent rather
than schema dependent.
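
As an illustration of what "data dependent" could look like in practice,
the sketch below builds an fgroup variant that takes gname from a label
when the data supplies one and falls back to seqfeature_id otherwise. The
demo_qualifier table, its columns, and the 'gene' qualifier name are all
hypothetical, made up for this example. They are not part of biosql or of
the mapping above.

```python
import sqlite3


def build_fgroup_demo():
    """Toy database with an fgroup view that prefers a label over the id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY);

    -- Hypothetical label source (assumption): one row per labelled feature.
    CREATE TABLE demo_qualifier (
        seqfeature_id   INTEGER,
        qualifier_name  TEXT,
        qualifier_value TEXT
    );

    INSERT INTO seqfeature VALUES (100);
    INSERT INTO seqfeature VALUES (101);
    INSERT INTO demo_qualifier VALUES (100, 'gene', 'demoA');

    CREATE VIEW fgroup AS
    SELECT f.seqfeature_id AS gid,
           NULL            AS gclass,
           -- Use the label if present, otherwise fall back to the id.
           COALESCE(q.qualifier_value,
                    CAST(f.seqfeature_id AS TEXT)) AS gname
    FROM seqfeature f
    LEFT JOIN demo_qualifier q
           ON q.seqfeature_id  = f.seqfeature_id
          AND q.qualifier_name = 'gene';
    """)
    return conn


def group_names(conn):
    """Return (gid, gname) pairs ordered by gid."""
    return conn.execute(
        "SELECT gid, gname FROM fgroup ORDER BY gid"
    ).fetchall()


if __name__ == "__main__":
    print(group_names(build_fgroup_demo()))
```

Nothing here makes gname unique: two features carrying the same label
would collide, and which qualifier is a sensible label varies from one
data set to the next.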

I never really understood GFF method/source. In biosql, a feature has one
(controlled) feature type. You can see how I hacked this above: the ftype
view maps term_name to both fmethod and fsource.