[Biojava-l] Elapsed time of feature filtering

Tue Jun 10 11:31:09 EDT 2003

Once upon a time, Y D Sun wrote:
> 
> I would like to clarify one important point. Is biosqldb-pg.sql (you
> sent to me) the ONLY schema required to install in a database? Other two
> schemas, i.e., biosqldb-assembly-pg.sql and biosql-accelerators-pg.sql,
> are not required to install.

Yes, biosqldb-pg.sql is the only schema that's required.  The
two other files are optional: biosqldb-assembly-pg.sql adds
support for Ensembl-style assembled sequences in the database
(not relevant to you), and biosql-accelerators-pg.sql contains
some stored procedures which are used to optimize certain write
operations.  If in doubt, start with just the core schema.
The accelerators are probably useful to you.  The assembly
support (which was only ever a proposal, and hasn't been
widely used) probably isn't, but it won't do any harm.

As David has pointed out, neither of these files is compatible
with the Singapore schema, as used by the CVS HEAD of BioJava.
However, you're using BioJava 1.3, and the Cape Town schema,
which should be fine.

> I would also like to know the PostgreSQL version and OS you are using. 

PostgreSQL 7.3.2 (I ought to upgrade to 7.3.3, but I can't
see this making a difference in this case).

Linux kernel 2.4.18 with some RedHat patches.

BioJava compiled from release-1_3-branch (should be identical
to 1.3pre4 -- certainly the BioSQL code is the same)

I've used BioSQL on a range of PostgreSQL versions (7.1-7.3) on
a range of different Linux machines without any problems.  I've
also used it with MySQL on alpha/Tru64 boxes.

> How long it takes to add a sequence to the database in your case? For
> me, it takes 10 hours to insert a BA000040 sequence to DB. 

Around 10 minutes for me (again, on my laptop -- the limitations
are memory and disk speed).  A modern desktop machine, or any
dedicated server built in the last 5 years, really ought to do
better than this.
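
To compare timings like-for-like, it may help to measure the insert
from inside the JVM and note how much heap is available.  The sketch
below is only an illustration: LoadTimer and its methods are made-up
names, not part of BioJava, and the timed task is a placeholder where
the real "parse the file and add the sequence to the database" step
would go.  It assumes a Java 8 or later JVM.

```java
class LoadTimer {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Upper limit on the heap the JVM will try to use, in MB (set with -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Heap currently in use, in MB.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long ms = timeMillis(() -> {
            // Placeholder (assumption): the real sequence insert would go here.
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ms:   " + ms);
        System.out.println("max heap MB:  " + maxHeapMb());
        System.out.println("used heap MB: " + usedHeapMb());
    }
}
```

If the used heap sits close to the maximum during a load, raising
-Xmx is a cheap thing to try before looking at the database side.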

What does `top' output look like while you're inserting sequences?
It's a fairly memory-intensive process, so if you're short of RAM
things might get nasty.  (Ideally, it should be possible to load
sequences into BioSQL using the event-based model of sequence
I/O, which would mean much less memory usage and probably better
performance too.  The current scheme, which requires loading the
whole sequence into memory first, hasn't been a problem so far,
though.)
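
As a rough illustration of why an event-based loader needs less
memory, here is a minimal sketch.  It is not the BioJava sequence I/O
API: ChunkListener, LengthCounter and StreamingFastaDemo are
hypothetical names, and the input is assumed to be simple FASTA text.
The parser hands each line of residues to a listener as it is read,
so the complete sequence never has to exist in memory at once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical callback interface, invented for this sketch.
interface ChunkListener {
    void startSequence(String name);
    void addResidues(String chunk);
    void endSequence();
}

// Example listener: counts residues without storing them.
// A database loader would write each chunk out here instead.
class LengthCounter implements ChunkListener {
    String name;
    long length;
    boolean finished;

    public void startSequence(String name) {
        this.name = name;
        this.length = 0;
        this.finished = false;
    }

    public void addResidues(String chunk) {
        length += chunk.length();
    }

    public void endSequence() {
        finished = true;
    }
}

class StreamingFastaDemo {
    // Reads FASTA text line by line and fires one event per line.
    static void parse(Reader in, ChunkListener listener) {
        try {
            BufferedReader br = new BufferedReader(in);
            boolean open = false;
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.startsWith(">")) {
                    if (open) {
                        listener.endSequence();
                    }
                    listener.startSequence(line.substring(1));
                    open = true;
                } else if (open) {
                    listener.addResidues(line);
                }
            }
            if (open) {
                listener.endSequence();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LengthCounter counter = new LengthCounter();
        parse(new StringReader(">demo\nACGT\nACG\n"), counter);
        System.out.println(counter.name + " " + counter.length); // prints "demo 7"
    }
}
```

Memory use here depends on the longest line rather than the length of
the sequence, which is the property that would matter for a
multi-megabase entry.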

    Thomas.