[BioSQL-l] RE: Exploring data in database (Kris Boulez)

McCulloch, Alan alan.mcculloch at agresearch.co.nz
Fri Mar 28 10:21:38 EST 2003


hi ,

regarding the question below about browsing a database, here are a few
additions to the 
responses so far

1) we are using an Oracle database. Some of my suggestions below
    are not applicable to mysql and probably others - e.g. developing an

    API of  stored functions. Therefore I agree with the previous
respondents on the need to 
    get a serious database. 

2) something we have found useful in addition to views, is a
comprehensive API of stored 
    functions.  This has the advantage that simple SQL queries using
stored functions can replace 
    horrendous or even  impossible queries involving table joins. Quite
powerful queries can then 
    be built up on top of the stored function API pretty easily. You can
write something like....
   
       select 
           dbutils.ourlink(sequencename) ourlink,

 
dbutils.dblink(dbutils.getTopDBHit(sequencename,'SWISSPROT','hitid'),'SW
ISSPROT') swisslink,
           dbutils.getTopDBHit(sequencename,'SWISSPROT','description')
swissdescription,
           dbutils.ourlink(dbutils.getContigName(sequencename))
contiglink,
           dbutils.getBlastLink(sequencename) blastbutton,
           dbutils.getExpressionClusterName(sequencename)
expressionCluster
      from
           ourseqdb
      where
           library = 'somelib'


     - in the above there are examples of stored functions to make
hyperlinks to a resource page 
     for a seq ( from which can click through to tracefile and other
material), pull out the best hit to 
     some database (like Swissprot) and make a hyperlink out of it, pull
out the description of the 
     swissprot hit , pull out which contig the seq belongs to (and
hyperlink that) , 
     pull out which expression cluster, return a hyperlink that will
take some action like blast the sequence
     against some database, etc.

     A single query like this probably involves a fairly large number of
tables in a normalised
     database , and could probably not be achieved by using table joins
or views. 
     
     We have also found this approach gives very good performance.
Building queries on top
     of views can run into performance problems once you start joining
views.

     By developing such an API in the database server, as opposed to in
applications, it is available
     to all clients - ACCESS, or whatever else. Also , because these run
inside the database,
     they are alot faster than if something like this was done at the
application level.

     It greatly helps developers who can build more complex reports on
top of the API , and 
     also makes the database a more accessible to (the very few !)
end-users who may be willing
     to enter SQL code

3) We started off distributing ACCESS applications with a bunch of
linked tables
    and canned queries as per your first option but wanted to move to a
web environment.

    We therefore set up a (Java based) middleware server that stands
between web page forms
    that contain (usually hidden) SQL code, and the database. Web forms
are submitted , the middleware
    gets the details from the form, that include SQL to be executed and
a bunch of user-set variables, 
    excutes it and sends back results.

    Over time you can build quite alot of useful smarts into a
middleware server like this , for 
    example


       * result formatting - the middleware can format each row returned
from the 
         database before sending back to client. Then any query
submitted via the 
         middleware can take advantage of the formatting service. Our
server supports
         Fixed width, CSV, FASTA, XML and GCG list file formats
currently. So you can 
         do quite nice things like execute a SQL query and have the
results returned 
         as a FASTA extract directly.
         
       * SQL code binding/modification. For example you can do something
like , have a form
         with some embedded code "select * from :tablename" , and then
have a drop
         down list where a user selects a table. The middleware binds
the :tablename 
         reference from the field submitted with the form. (Maybe this
isn't a very good example
         ! but you get the idea). Can get some quite flexible web-form
reports very easily by 
         using this code-binding approach. Can bind clauses, variable
lists , predicates, or whatever
         else.

         Also , you can get the middleware to do some list-handling. For
example you can have
         a text area on a form into which the user pastes a list of
sequencenames or whatever, 
         and the middleware can get that and bind it to an "in" clause
....."where seqid in 
         ('aaaa','bbbb','cccc','dddd','eeee')" 

         (And also of course the usual variable-binding "select * from
sometable where seqid = :seqid")
        

       * optimise throughput. Many queries are exploratory , and a user
may only want to see
         the first 10 sequences that match, and not have to wait for the
complete 
         recordset to be returned. The middleware can keep a record
count and kill a query after a 
         certain number of rows have been returned - the default might
be say 10, but the 
         user can change this because the limit is set in a field on the
form submitted.

       * implement a time-out protocol. A problem with queries via
browser is the browser  time-out.
         The middleware can set a watchdog that wakes up before the
browser would time out,
         and send a response back to the server "get your results from
X" , and then redirect
         the results to "X".

       * authentication and security

     This might sound a bit ambitious, but in fact it is very easy to
set up a simple middleware
     server in Java , and then you can add funtionality as you go.



Hope this is useful

Cheers

Alan McCulloch
Bioinformatics Software Engineer
AgResearch NZ


Message: 1
Date: Wed, 26 Mar 2003 15:55:54 +0100
From: Kris Boulez <kris.boulez at algonomics.com>
Subject: [BioSQL-l] Exploring data in database
To: biosql-l at open-bio.org
Message-ID: <20030326145554.GI25372 at linen.algonomics.com>
Content-Type: text/plain; charset=us-ascii

[ I know this question is rather off topic, but I don't know of a more
appropriate mailing list. In the end it's about a database with
sequences in, so it's not completely OT :)    ]


We (will) have a database that tracks the results of different
experiments in the lab (you could call it a very mini LIMS). It will
most probably be implemented in MySQL.

The users (scientists) want to look at the data in a very 'exploratory'
way (e.g. "given this gene, coming from experiment Y, which other genes
have a similar pattern in experiment X and ... "). They want to query
the data but have no predefined queries whatsoever.

To allow these people to do this, my standard answer is to learn SQL,
have a good look at the database schema and write their queries
themself. They seem however very reluctant to do this.

I can think of alternatives:

- use MS Access and access the database via ODBC (their seems to be a
  relatively intuitive way of creating queries/reports in Access)

- dump the tables "tab seperated" and look at them in MS Excell.

- implement something like EnsMart

I would however prefer a (web) environment, which lets them explore the
data. I looked around a bit for this, but couldn't really find anything.


Suggestions, comments ?


Kris,
-- 
Kris Boulez 				Tel: +32-9-241.11.00
AlgoNomics NV 				Fax: +32-9-241.11.02
Technologiepark 4 			email:
kris.boulez at algonomics.com
B 9052 Zwijnaarde 			http://www.algonomics.com/

------------------------------

_______________________________________________
BioSQL-l mailing list
BioSQL-l at open-bio.org http://open-bio.org/mailman/listinfo/biosql-l


End of BioSQL-l Digest, Vol 3, Issue 16
***************************************


    


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================



More information about the BioSQL-l mailing list