[BioSQL-l] RE: Exploring data in database (Kris Boulez)
McCulloch, Alan
alan.mcculloch at agresearch.co.nz
Fri Mar 28 10:21:38 EST 2003
hi ,
regarding the question below about browsing a database, here are a few
additions to the
responses so far
1) we are using an Oracle database. Some of my suggestions below
are not applicable to mysql and probably others - e.g. developing an
API of stored functions. Therefore I agree with the previous
respondents on the need to
get a serious database.
2) something we have found useful in addition to views, is a
comprehensive API of stored
functions. This has the advantage that simple SQL queries using
stored functions can replace
horrendous or even impossible queries involving table joins. Quite
powerful queries can then
be built up on top of the stored function API pretty easily. You can
write something like....
select
dbutils.ourlink(sequencename) ourlink,
dbutils.dblink(dbutils.getTopDBHit(sequencename,'SWISSPROT','hitid'),'SW
ISSPROT') swisslink,
dbutils.getTopDBHit(sequencename,'SWISSPROT','description')
swissdescription,
dbutils.ourlink(dbutils.getContigName(sequencename))
contiglink,
dbutils.getBlastLink(sequencename) blastbutton,
dbutils.getExpressionClusterName(sequencename)
expressionCluster
from
ourseqdb
where
library = 'somelib'
- in the above there are examples of stored functions to make
hyperlinks to a resource page
for a seq ( from which can click through to tracefile and other
material), pull out the best hit to
some database (like Swissprot) and make a hyperlink out of it, pull
out the description of the
swissprot hit , pull out which contig the seq belongs to (and
hyperlink that) ,
pull out which expression cluster, return a hyperlink that will
take some action like blast the sequence
against some database, etc.
A single query like this probably involves a fairly large number of
tables in a normalised
database , and could probably not be achieved by using table joins
or views.
We have also found this approach gives very good performance.
Building queries on top
of views can run into performance problems once you start joining
views.
By developing such an API in the database server, as opposed to in
applications, it is available
to all clients - ACCESS, or whatever else. Also , because these run
inside the database,
they are alot faster than if something like this was done at the
application level.
It greatly helps developers who can build more complex reports on
top of the API , and
also makes the database a more accessible to (the very few !)
end-users who may be willing
to enter SQL code
3) We started off distributing ACCESS applications with a bunch of
linked tables
and canned queries as per your first option but wanted to move to a
web environment.
We therefore set up a (Java based) middleware server that stands
between web page forms
that contain (usually hidden) SQL code, and the database. Web forms
are submitted , the middleware
gets the details from the form, that include SQL to be executed and
a bunch of user-set variables,
excutes it and sends back results.
Over time you can build quite alot of useful smarts into a
middleware server like this , for
example
* result formatting - the middleware can format each row returned
from the
database before sending back to client. Then any query
submitted via the
middleware can take advantage of the formatting service. Our
server supports
Fixed width, CSV, FASTA, XML and GCG list file formats
currently. So you can
do quite nice things like execute a SQL query and have the
results returned
as a FASTA extract directly.
* SQL code binding/modification. For example you can do something
like , have a form
with some embedded code "select * from :tablename" , and then
have a drop
down list where a user selects a table. The middleware binds
the :tablename
reference from the field submitted with the form. (Maybe this
isn't a very good example
! but you get the idea). Can get some quite flexible web-form
reports very easily by
using this code-binding approach. Can bind clauses, variable
lists , predicates, or whatever
else.
Also , you can get the middleware to do some list-handling. For
example you can have
a text area on a form into which the user pastes a list of
sequencenames or whatever,
and the middleware can get that and bind it to an "in" clause
....."where seqid in
('aaaa','bbbb','cccc','dddd','eeee')"
(And also of course the usual variable-binding "select * from
sometable where seqid = :seqid")
* optimise throughput. Many queries are exploratory , and a user
may only want to see
the first 10 sequences that match, and not have to wait for the
complete
recordset to be returned. The middleware can keep a record
count and kill a query after a
certain number of rows have been returned - the default might
be say 10, but the
user can change this because the limit is set in a field on the
form submitted.
* implement a time-out protocol. A problem with queries via
browser is the browser time-out.
The middleware can set a watchdog that wakes up before the
browser would time out,
and send a response back to the server "get your results from
X" , and then redirect
the results to "X".
* authentication and security
This might sound a bit ambitious, but in fact it is very easy to
set up a simple middleware
server in Java , and then you can add funtionality as you go.
Hope this is useful
Cheers
Alan McCulloch
Bioinformatics Software Engineer
AgResearch NZ
Message: 1
Date: Wed, 26 Mar 2003 15:55:54 +0100
From: Kris Boulez <kris.boulez at algonomics.com>
Subject: [BioSQL-l] Exploring data in database
To: biosql-l at open-bio.org
Message-ID: <20030326145554.GI25372 at linen.algonomics.com>
Content-Type: text/plain; charset=us-ascii
[ I know this question is rather off topic, but I don't know of a more
appropriate mailing list. In the end it's about a database with
sequences in, so it's not completely OT :) ]
We (will) have a database that tracks the results of different
experiments in the lab (you could call it a very mini LIMS). It will
most probably be implemented in MySQL.
The users (scientists) want to look at the data in a very 'exploratory'
way (e.g. "given this gene, coming from experiment Y, which other genes
have a similar pattern in experiment X and ... "). They want to query
the data but have no predefined queries whatsoever.
To allow these people to do this, my standard answer is to learn SQL,
have a good look at the database schema and write their queries
themself. They seem however very reluctant to do this.
I can think of alternatives:
- use MS Access and access the database via ODBC (their seems to be a
relatively intuitive way of creating queries/reports in Access)
- dump the tables "tab seperated" and look at them in MS Excell.
- implement something like EnsMart
I would however prefer a (web) environment, which lets them explore the
data. I looked around a bit for this, but couldn't really find anything.
Suggestions, comments ?
Kris,
--
Kris Boulez Tel: +32-9-241.11.00
AlgoNomics NV Fax: +32-9-241.11.02
Technologiepark 4 email:
kris.boulez at algonomics.com
B 9052 Zwijnaarde http://www.algonomics.com/
------------------------------
_______________________________________________
BioSQL-l mailing list
BioSQL-l at open-bio.org http://open-bio.org/mailman/listinfo/biosql-l
End of BioSQL-l Digest, Vol 3, Issue 16
***************************************
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================
More information about the BioSQL-l
mailing list