[Biojava-l] Elapsed time of feature filtering

Y D Sun Yudong.Sun at newcastle.ac.uk
Tue Jun 10 12:33:32 EDT 2003


Hi all,

The problem is solved. I installed only biosqldb-pg.sql into the database
and the performance improved tremendously. It now takes 5 minutes (10
hours before) to add a sequence like BA000040 and 15 seconds (40 seconds
before) to filter CDS features. The time is not strongly affected by the
number of sequences in the database. That suggests the other two schemas
were degrading performance. So the tutorial at
http://www.biojava.org/tutorials/biosql.html needs modification: it does
not currently suggest that users install biosqldb-pg.sql only.
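
For anyone who wants to repeat the comparison, here is a minimal timing
sketch using only the standard JDK (Java 8 or later). The class and method
names (ElapsedTimer, timeMillis, speedup) are made up for illustration and
are not part of BioJava; the workload is a placeholder to be replaced with
your own sequence insert or feature filter call.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of BioJava: measures wall-clock time of a task.
public class ElapsedTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Ratio of the old time to the new time; both must use the same unit. */
    public static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Placeholder workload: substitute the insert or filter call being measured.
        long ms = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        });
        System.out.println("placeholder task took " + ms + " ms");

        // Figures reported in this message, converted to seconds.
        System.out.println("insert speedup: " + speedup(10 * 3600, 5 * 60));
        System.out.println("filter speedup: " + speedup(40, 15));
    }
}
```

A single run is noisy, so it is worth repeating each measurement a few
times, and with different numbers of sequences in the database, before
drawing conclusions.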

Thanks all for your help.

George

> -----Original Message-----
> From: Thomas Down [mailto:thomas at derkholm.net] 
> Sent: 10 June 2003 10:31
> To: Y D Sun
> Cc: Thomas Down; biojava-l at biojava.org
> Subject: Re: [Biojava-l] Elapsed time of feature filtering
> 
> 
> Once upon a time, Y D Sun wrote:
> > 
> > I would like to clarify one important point. Is biosqldb-pg.sql
> > (which you sent to me) the ONLY schema that has to be installed in
> > a database? The other two schemas, i.e., biosqldb-assembly-pg.sql
> > and biosql-accelerators-pg.sql, do not need to be installed?
> 
> Yes, biosqldb-pg.sql is the only schema that's required.  The 
> two other files are optional: biosqldb-assembly-pg.sql adds 
> support for Ensembl-style assembled sequences in the database 
> (not relevant to you), and biosql-accelerators-pg.sql 
> contains some stored procedures which are used to optimize 
> certain write operations.  If in doubt, start with just the 
> core schema. The accelerators are probably useful to you, the 
> assembly support (which was only really a proposal, and 
> hasn't been widely used) probably isn't, but won't actually 
> do any harm.
> 
> As David has pointed out, neither of these files is 
> compatible with the Singapore schema, as used by the CVS HEAD 
> of BioJava. However, you're using BioJava 1.3, and the Cape 
> Town schema, which should be fine.
> 
> > I would also like to know the PostgreSQL version and OS you are using.
> 
> PostgreSQL 7.3.2 (I ought to upgrade to 7.3.3, but I can't
> see this making a difference in this case).
> 
> Linux kernel 2.4.18 with some RedHat patches.
> 
> BioJava compiled from release-1_3-branch (should be identical 
> to 1.3pre4 -- certainly the BioSQL code is the same)
> 
> I've used BioSQL without any trouble on a range of PostgreSQL 
> versions (7.1-7.3) on a range of different Linux machines. 
> I've also used it with MySQL on alpha/Tru64 boxes.
> 
> > How long does it take to add a sequence to the database in your
> > case? For me, it takes 10 hours to insert the BA000040 sequence
> > into the DB.
> 
> Around 10 minutes for me (again, on my laptop -- the 
> limitations are memory and disk speed).  A modern desktop 
> machine, or any dedicated server built in the last 5 years, 
> really ought to do better than this.
> 
> What does `top' output look like while you're inserting 
> sequences? It's a fairly memory-intensive process, so if 
> you're short of RAM things might get nasty. (Ideally, it 
> should be possible to load sequences into BioSQL using the 
> event-based model of sequence I/O, which would mean much less 
> memory usage and probably better performance too.  The 
> current scheme, which requires loading the whole sequence 
> into memory first, hasn't been a problem so far, though).
> 
>     Thomas. 
> 
> 


