[Biojava-l] Different implementation of Sequence?

Fri Jun 6 15:56:51 EDT 2003

Hi Thomas,

I downloaded the EMBL files from

http://www.ebi.ac.uk/cgi-bin/genomes/genomes.cgi?genomes=Bacteria

In that list, for example, entries No. 11 (BA000040) and No. 88 (BA000030)
are large files.

I found a problem with my PostgreSQL installation: I didn't build it
with the JDBC driver. Could this be the cause of the poor performance?
I will reinstall it. However, the BioSQL code does work without this
build, albeit slowly.
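
One quick way to check whether a PostgreSQL JDBC driver is visible to
the Java program at all is to try loading its class. The following is
only a minimal sketch: DriverCheck and isLoadable are hypothetical names
made up for illustration, and org.postgresql.Driver is the class name
of the standard PostgreSQL JDBC driver. It only tells you whether the
driver jar is on the classpath, nothing about query speed.

```java
public class DriverCheck {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name of the standard PostgreSQL JDBC driver.
        String driver = "org.postgresql.Driver";
        System.out.println(driver + " loadable: " + isLoadable(driver));
    }
}
```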

George

> -----Original Message-----
> From: Thomas Down [mailto:td2 at sanger.ac.uk] 
> Sent: 06 June 2003 14:33
> To: Y D Sun
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Different implementation of Sequence?
> 
> 
> On Fri, Jun 06, 2003 at 10:33:28AM +0100, Y D Sun wrote:
> > 
> > > -----Original Message-----
> > > From: Thomas Down [mailto:thomas at derkholm.net]
> > > Sent: 05 June 2003 22:58
> > > To: smh1008 at cus.cam.ac.uk
> > > Cc: Y D Sun; Thomas Down; biojava-l at biojava.org
> > > Subject: Re: [Biojava-l] Different implementation of Sequence?
> > > 
> > > 
> > > Once upon a time, David Huen wrote:
> > > > On Thursday 05 Jun 2003 6:07 pm, Y D Sun wrote:
> > > > > > Having created the indices as follows and restarted postmaster,
> > > > > > the performance of feature filtering is even worse. Maybe MySQL
> > > > > > is a better choice than PostgreSQL. Does anyone have similar
> > > > > > experience?
> > > > >
> > > > > Was the access code exactly as you depicted it? i.e. only filtering
> > > > > on "CDS". Also, what was the dataset you searched? Was it the same
> > > > > dataset in both the EMBL flat file and BioSQL? What is your version
> > > > > of PostgreSQL and what was the platform?
> > > > 
> > 
> > Yes, that is the exact code I use to filter CDS on one sequence. The
> > same code is used for the same sequence loaded from the EMBL file and
> > from the PostgreSQL database. The execution times (for one sequence
> > only) in the two cases are very different.
> > 
> > I installed PostgreSQL 7.3.2 on Linux 2.4.20.
> > 
> > The database contains 10 complete bacterial sequences with lengths
> > from 2M to 9M (EMBL file sizes 9M to 18M).
> 
> Hmmm...
> 
> That amount of data definitely ought to be handled without
> any big problems.  On the other hand, it's enough that if
> a lot of the database accesses weren't going through indices,
> it could plausibly take a minute or so.
> 
> I've attached a new schema file which is compatible with the 
> one on the website, but has some extra CREATE INDEX 
> statements. Could you try again with that?  If that doesn't 
> help, it might be worth trying MySQL to get a different datapoint.
> 
> Finally, could you point me to one of the EMBL files you use 
> (preferably the biggest one), and I'll do some testing at some point.
> 
>     Thomas.
>