[BioSQL-l] questions

Hilmar Lapp hlapp at gnf.org
Tue Feb 1 12:52:13 EST 2005


On Thursday, January 27, 2005, at 01:25  PM, Tamas Hegedus wrote:

> -----------------------------
> DOCUMENTATION; WHYs; packing
> -----------------------------
> I know that BioSQL is still under development, but it is not a 
> 'theoretical' project; it is intended to be used by users.
> More users mean more feedback, more development, and happier 
> programmers.
> But at the moment it is very difficult to recognize its advantages:
> => Why is BioSQL (an RDBMS) better than other solutions (e.g. flat 
> files)? Why should I use it for my project?

I summarized the most popular use cases at BOSC03. You may want to 
check out
http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf

> => What should I download, and from where? (In my opinion CVS is 
> definitely for programmers, not for biologists.)
> => What programs can I use to access the data? Only scripting? No! I 
> can use it e.g. with GBrowse, exactly what I need...

There is an adaptor that lets GBrowse use BioSQL; there has just been a 
thread about it on the GBrowse mailing list. There may still be some 
wrinkles to work out so that GBrowse finds the features it is supposed 
to find. Check the last week of the GBrowse mailing list archive.

> => Is there any convenient way to query the database? I do not really 
> want to learn SQL. How can I run queries and pass the returned entries 
> to 'conventional' analysis tools (like pattern search)?

Bioperl-db provides you with an interface (object-relational mapper) 
that lets you interact with the database through bioperl and query 
objects, not SQL. When you run a search ($adaptor->find_by_XXX) you'll 
get bioperl objects returned.
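
To make the adaptor idea concrete, here is a minimal sketch in Python 
of what an object-relational adaptor does. It is not Bioperl-db code 
and the table is a toy, not the BioSQL schema; the names Seq, 
SeqAdaptor, find_by_accession and demo_connection, as well as the 
sample record, are made up for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Seq:
    """Toy sequence object standing in for a real sequence object."""
    accession: str
    description: str
    residues: str


class SeqAdaptor:
    """Hypothetical adaptor: callers use find_by_* methods, never SQL."""

    def __init__(self, conn):
        self.conn = conn

    def find_by_accession(self, accession):
        # The SQL lives here, hidden from the caller.
        row = self.conn.execute(
            "SELECT accession, description, residues "
            "FROM seq WHERE accession = ?",
            (accession,),
        ).fetchone()
        return Seq(*row) if row else None


def demo_connection():
    """Build an in-memory toy database (NOT the BioSQL schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seq ("
        "accession TEXT PRIMARY KEY, description TEXT, residues TEXT)"
    )
    conn.execute(
        "INSERT INTO seq VALUES ('X00001', 'made-up record', 'MKTAYIAK')"
    )
    return conn


adaptor = SeqAdaptor(demo_connection())
seq = adaptor.find_by_accession("X00001")
print(seq.description, len(seq.residues))
```

The caller gets an object back and can hand it straight to other 
analysis code without ever writing a query.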

Note, though, that for ad hoc queries SQL is ultimately always going 
to be more powerful than the object interface.
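
As a rough illustration of that point, the sketch below answers a 
question in one SQL statement (number of sequences and mean sequence 
length per organism) that an object interface would typically answer 
by fetching every object and looping over them. It uses Python's 
sqlite3 module; the two tables and all rows are made up and are much 
simpler than the real BioSQL schema.

```python
import sqlite3

# Toy in-memory database; table and column names are assumptions
# chosen for this example, not the BioSQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE seq (
        accession TEXT PRIMARY KEY,
        taxon_id  INTEGER,
        residues  TEXT
    );
    INSERT INTO taxon VALUES (1, 'Homo sapiens');
    INSERT INTO taxon VALUES (2, 'Mus musculus');
    INSERT INTO seq VALUES ('X1', 1, 'MKTAYIAK');
    INSERT INTO seq VALUES ('X2', 1, 'MKTA');
    INSERT INTO seq VALUES ('X3', 2, 'MKTAYI');
""")

# One statement joins, groups and aggregates inside the database.
rows = conn.execute("""
    SELECT t.name, COUNT(*) AS n_seqs, AVG(LENGTH(s.residues)) AS mean_len
    FROM seq s
    JOIN taxon t ON t.taxon_id = s.taxon_id
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()

for name, n_seqs, mean_len in rows:
    print(name, n_seqs, mean_len)
```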

> ----
> For developers: if you have worked on BioSQL constantly (from the 
> beginning), you know what each column is for (like 'Rank') and what 
> its role is in a specific relation. Others can find these things out 
> only by populating the database and digging into it, which takes so 
> much energy that a developer will find an easier way to solve his 
> problem.
> ----

Have you checked out the doc directory in the repository? There is a 
schema overview and an ERD. Those two will still leave many questions 
open, like 'rank', but they could be a start nonetheless.

> I know that it is a huge amount of work to create a website (and 
> keep it up to date). Personally I really do not like (hate) creating 
> web pages. But I think a website for BioSQL would greatly accelerate 
> the BioSQL project.

Unfortunately, almost all developers have the same enthusiasm for 
creating web pages as you have. Volunteer web-page authors are what 
the OBF needs most desperately. I have no doubt that informative, 
well-organized, and most importantly regularly updated web pages would 
help the BioSQL project, but this is also the area where people could 
volunteer most easily.

BioSQL, like all other OBF (and open-source projects in general), is a 
project built by people who volunteer their time ...

>
> -----------------------------
> SCHEMA; RDBMSs
> -----------------------------
> I may be raising the next question only because I do not see the 
> depths of BioSQL; I know only PgSQL and MySQL.
>
> Why do you have different schemas for different database servers 
> (PgSQL, hsqldb, Oracle)? I can guess why for MySQL...
> Would it not be possible to maintain only one schema? It could free 
> up energy/time for other things.

This is how it worked initially. When the schema evolved it became a 
pain for two reasons. First, the schema translator couldn't handle the 
full SQL standard, and bringing it up to requirements was more work 
than maintaining multiple versions of the schema. Second, the MySQL 
version used to serve as the reference, which is bad because in MySQL 
you can express only a small subset of the DDL capabilities that you 
have in other RDBMSs.
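
Which DDL features were at issue is not spelled out above. As one 
assumed example of the kind of thing meant: MySQL versions of this 
era parse CHECK constraints but silently ignore them, so a rule 
written into the schema is never enforced. The sketch below uses 
Python's sqlite3 module (not MySQL, PostgreSQL or Oracle) and a 
made-up two-table schema simply to show what an enforced CHECK and 
foreign key look like in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical tables for illustration; not the BioSQL schema.
    CREATE TABLE entry (entry_id INTEGER PRIMARY KEY);
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        entry_id   INTEGER NOT NULL REFERENCES entry(entry_id),
        start_pos  INTEGER NOT NULL,
        end_pos    INTEGER NOT NULL,
        CHECK (start_pos <= end_pos)
    );
    INSERT INTO entry VALUES (1);
""")


def try_insert(values):
    """Insert one feature row and report whether the schema allowed it."""
    try:
        conn.execute("INSERT INTO feature VALUES (?, ?, ?, ?)", values)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


valid = try_insert((1, 1, 10, 20))       # satisfies both constraints
bad_range = try_insert((2, 1, 30, 20))   # violates the CHECK
bad_parent = try_insert((3, 99, 1, 2))   # no entry with entry_id 99
print(valid, bad_range, bad_parent, sep="\n")
```

A reference schema written in a dialect that cannot express (or does 
not enforce) such rules cannot carry them over to the other versions.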

Note that biosql is meant to be a very stable schema. There may be 
changes in the future, but not at a rapid pace. It's not been a big 
time sink to maintain 3 versions since the Singapore changes. This is 
not a programming library - it's a schema.

> [...]
>
> ------------------------------
> PYTHON
> ------------------------------
> I prefer Python over Perl (because of this I had extra struggles 
> installing BioSQL with SwissProt, for example).
>
> If I knew the object mapping, I think I could write a Python script 
> to load SwissProt into BioSQL (it should be easy and 
> straightforward).

You may want to ask the guys on the biopython list how they store 
sequences in biosql.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



