[BioSQL-l] Re: Postgres version support

Chris Mungall cjm@fruitfly.org
Tue, 22 Oct 2002 12:58:50 -0700 (PDT)


Hi Jason

There's two issues here - syntax and semantics:

Personally I prefer an SQL syntax for hand-generating, editing and
browsing relational schemas. It's a bit harder to parse (not too hard with
Parse::RecDescent), but that's taken care of with a standard CPAN module
[*]. The non-tagginess of SQL is handier when you want to start specifying
more complex stuff, like views, check constraints etc. Of course, XML
syntax is much handier for doing transformations on the schema, eg to
generate HTML. That's why I think it'd be great to be able to go from the
SQL we generate to MAGEstk-schema-xml, and from there to use your tools.

I think your can have the SQL syntax be fairly DB independent, so long as
you stick to SQL 92. There's always weird datatype issues like mysql
mediumtext but then switching to XML doesn't solve this.

As well as supplying an alternate syntax to SQL CREATE TABLE statements,
your DTD also seems to be offering a slightly different semantics to the
relational model. This makes perfect sense for the genex object-centric
approach. You want a more object-y kind of semantics, including
inheritance, and you are less concerned with a lot of the relational
semantics you lose out on.

However, I'm not sure I understand your point about cardinality - you can
specify any kind of cardinality with the relational model using foreign
key and uniqueness constraints. In fact with check constraints you can do
a lot more powerful stuff too, such as specifying "table 'car' has a 1:4
relationship with table 'wheel'".

Again, I personally prefer relational semantics, they are actually fairly
rich and expressive once you step out of the MySQL world [**]. Personally
I prefer bioSQL to take a more relation-centric approach, although there
are plenty people contributing more commits than me who prefer a more
object-centric approach (I think).

[*] However, I'd really like to add a generator for your MAGE-schema-xml
to the aforementioned CPAN parsing module (Actually I don't think the
module is on CPAN yet, but it will be soon - I'm cc-ing the developers
list). This will allow you to slurp any SQL schema out there into MAGE
land, including bioSQL. I think the results should be pretty cool.

[**] - Note I'm not advocating leaving MySQL out the loop (much as I'd
like to). A database is still (barely) usable without constraints
enforced.

As an afterword - If we want to move to an even more expressive semantics
than the relational model, then I think a description logic (DAML+OIL, or
OWL) is the only way to go. I think Matt and Thomas are quite keen on
this, I am too, but this is definitely much more research-y, as the tools
just aren't there yet.

--
Chris

On 22 Oct 2002, Jason E. Stewart wrote:

> "Chris Mungall" <cjm@fruitfly.org> writes:
>
> > personally I'm in favour of keeping the core schema in SQL (without
> > all the horrible mysqlisms), but we could easily convert to an XML
> > representation. In what sense would XML-schema definition allow for
> > a more powerful description language?
> >
> > I'm interested in the schema definition schema though - I'm rewriting the
> > SQL DDL parser along with Ken Clark @ cshl and would like to use this as
> > another format
>
> Hey Chris,
>
> A couple of thoughts in no particular order:
>
> - parsing XML is a lot easier than SQL (to me anyway)
> - The XML representation can be truly DB independent
> - It enables other useful features - the XML representation can be
>   converted by a simple XSLT transformation into a set of browsable
>   HTML files, it can be used to create a set of HTML forms and CGI
>   scripts that enable users to annotate their data, and it can be used
>   to auto-generate a low-level Perl object API for the DB
>
> However, the major reason was because Genex auto-creates it's object
> API from the DB schema, I discoverd I needed extra information about
> foreign key relationships that wasn't in the DDL SQL. I needed
> cardinality information: one-to-one, many-to-one, and
> many-to-many. That way, when I create a foreign-key accessor method in
> the API, I know how many objects I'm getting back from the DB, a
> scalar or an array ref.
>
> Also, it is important for the INSERT code - since the API code is
> auto-generated it has to be generic. If a foreign key accessor is a
> many-to-one relationship I need first myself (so I set a primary key)
> and then insert the list of objects into the other table (so they can
> reference my primary key). If it's one-to-one, I first insert the
> other object (so it gets a primary key) and then myself (so I can
> reference the primary key).
>
> Cheers,
> jas.
> --
>  "Naturally the common people don't want war: Neither in Russia, nor
>  in England, nor for that matter in Germany. That is understood. But,
>  after all, it is the leaders of the country who determine the policy
>  and it is always a simple matter to drag the people along, whether it
>  is a democracy, or a fascist dictatorship, or a parliament, or a
>  communist dictatorship. Voice or no voice, the people can always be
>  brought to the bidding of the leaders.  That is easy. All you have to
>  do is tell them they are being attacked, and denounce the peacemakers
>  for lack of patriotism and exposing the country to danger. It works
>  the same in any country." - Hermann Goering
>
>
>