[Open-bio-l] Re: [GMOD-devel] Schema for genes & features &
mappings to assemblies
Chris Mungall
cjm@bdgp.lbl.gov
Wed, 24 Apr 2002 10:13:06 -0700 (PDT)
On Tue, 23 Apr 2002, Lincoln Stein wrote:
> > - biosql is for sequences and features, not mappings to assemblies (is
> > that intended to be added, too, or is it beyond its scope? )
>
> > - GGB is
> > running off the schema in Bio::DB::GFF, which is not biosql compatible
> > (Lincoln? If so, do you have any plans to change that?)
>
> Bio::DB::GFF came about six months before biosql. GGB runs on top of both
> Bio::DB::GFF and Gadfly, which magically enough have similar APIs even though
> we didn't plan it that way. Someday soon I'm going to try to adapt BioSQL to
> support GGB, but to some extent its entry-based view of the world is at odds
> with the GGB view of the world. GGB sees the genome as a series of landmarks
> that occupy (potentially split, potentially multiple) regions of the genome.
I think the main difference is that in GGB (and GadFly), the seqfeature is
defined by its location, whereas in biosql a seqfeature can have 0:n
locations. In GGB the group can be regarded as equivalent to
biosql.seqfeature, with ggb.seqfeature equivalent to
biosql.seqfeature_location.
Here is one possible mapping between GGB and biosql:
CREATE VIEW fdata
AS SELECT fl.seqfeature_id    AS fid,   -- qualified: f and fl both carry seqfeature_id
          e.accession         AS fref,
          fl.seq_start        AS fstart,
          fl.seq_end          AS fstop,
          f.seqfeature_key_id AS ftypeid,
          NULL                AS fscore,
          fl.seq_strand       AS fstrand,
          NULL                AS fphase,
          f.seqfeature_id     AS gid,
          NULL                AS ftarget_start,
          NULL                AS ftarget_stop
FROM seqfeature f,
     seqfeature_location fl,
     bioentry e
WHERE
     fl.seqfeature_id = f.seqfeature_id AND
     f.bioentry_id = e.bioentry_id;
CREATE VIEW fgroup
AS SELECT seqfeature_id AS gid,
          NULL          AS gclass,
          seqfeature_id AS gname
FROM seqfeature;
CREATE VIEW ftype
AS SELECT ontology_term_id AS ftypeid,
          term_name        AS fmethod,
          term_name        AS fsource
FROM ontology_term;
I haven't tested this yet, but I think it's right. The fdna view would
depend on what relation we decide on for assemblies.
This would also need to be extended for feature pairs, which are not
currently in biosql.
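One quick way to try the fdata and fgroup views is against a toy in-memory
SQLite database. What follows is only a sketch under loud assumptions: the
table definitions are cut down to the columns the views reference (they are
not the real biosql DDL), the sample rows are invented, and fid is written as
fl.seqfeature_id because SQLite rejects the unqualified column as ambiguous.

```python
import sqlite3

# Cut-down stand-ins for the biosql tables: only the columns the views touch.
# These are NOT the real biosql DDL; column types and sample rows are invented.
SCHEMA = """
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE seqfeature (
    seqfeature_id     INTEGER PRIMARY KEY,
    bioentry_id       INTEGER,
    seqfeature_key_id INTEGER
);
CREATE TABLE seqfeature_location (
    seqfeature_id INTEGER,
    seq_start     INTEGER,
    seq_end       INTEGER,
    seq_strand    INTEGER
);

CREATE VIEW fdata AS
SELECT fl.seqfeature_id    AS fid,
       e.accession         AS fref,
       fl.seq_start        AS fstart,
       fl.seq_end          AS fstop,
       f.seqfeature_key_id AS ftypeid,
       NULL                AS fscore,
       fl.seq_strand       AS fstrand,
       NULL                AS fphase,
       f.seqfeature_id     AS gid,
       NULL                AS ftarget_start,
       NULL                AS ftarget_stop
FROM seqfeature f, seqfeature_location fl, bioentry e
WHERE fl.seqfeature_id = f.seqfeature_id
  AND f.bioentry_id = e.bioentry_id;

CREATE VIEW fgroup AS
SELECT seqfeature_id AS gid, NULL AS gclass, seqfeature_id AS gname
FROM seqfeature;
"""


def build_demo():
    """Create the toy schema and load one feature with two locations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO bioentry VALUES (1, 'ACC0001')")
    conn.execute("INSERT INTO seqfeature VALUES (10, 1, 5)")
    conn.executemany(
        "INSERT INTO seqfeature_location VALUES (?, ?, ?, ?)",
        [(10, 100, 200, 1), (10, 300, 400, 1)],
    )
    return conn


conn = build_demo()
# One biosql seqfeature with two locations becomes two fdata rows, one gid.
fdata_rows = conn.execute(
    "SELECT fref, fstart, fstop, fstrand, gid FROM fdata ORDER BY fstart"
).fetchall()
fgroup_rows = conn.execute("SELECT gid, gclass, gname FROM fgroup").fetchall()
```

In this toy data the single seqfeature shows up as two fdata rows that share
one gid, which is the group/feature correspondence described above.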
Another major difference, apparent from the above mapping, is that in GGB,
all groups have a unique name. biosql, originally developed for GenBank
roundtripping and having its heritage in the GenBank feature tables, does
not. This is hacked above by using seqfeature_id as gname, but it should
really be a meaningful display label. I don't think there's any good way of
doing this; it is data dependent rather than schema dependent.
I never really understood GFF method/source. In biosql, a feature has one
(controlled) feature type. You can see how I hacked this above: term_name
is used for both fmethod and fsource.