[DAS] ldas server limits

Ewan Birney birney@ebi.ac.uk
Mon, 24 Jun 2002 21:13:32 +0100 (BST)


On Mon, 24 Jun 2002, Russell, Archie wrote:

> 
> On Fri, 21 Jun 2002, Tony Cox wrote:
> 
> +>On Fri, 21 Jun 2002, Lincoln Stein wrote:
> +>
> +>OK, let's try it. I've just kicked off a database load - we'll see what
> +>happens... :)
> +>
> 
> >Well, I finally got a bit bored of waiting for the DB to load so I killed it off
> >at 41,278,000 records. That's just under 50% of the total data.
> 
> Could you tell us how long this took to download?  Would this be a
> reasonable approach to periodically obtaining all of a database like Ensembl
> in a common format (i.e. DAS)?



Tony was talking about slurping tab-delimited text into a MySQL database
server-side (i.e. just the time it takes to spin the disks on the
server!). Sucking the brains out of Ensembl via DAS is notionally possible
but discouraged, and probably won't-scale-well-at-all. Ensembl is 20GB of
data inside MySQL, and likely to be much, much larger over the wire in DAS.



You can get Ensembl as standard EMBL/GenBank flat files, and they are now
structured as "slices down the genome" rather than "clones", which makes
genome-orientated parsing easier.


You will probably find that a new Ensembl "product" called "Mart", which
will be in the next release, is ideal for you. Mart is a query-optimised
database in "data warehouse" or, probably more correctly, "data mart"
mode. The schema has been designed for easy querying and easy
data-orientated exploration (honestly, "show tables" and "describe
<table>" are all you need). By deliberately distributing the data in SQL
form, we leave it to you to choose how to slice and dice it.




All the tests of Mart so far have had people drooling, so see if it works
for you ;). You will be able to access Mart via:

  (a) incredibly intuitive web access (all thanks to Will Spooner) giving
tab-delimited files or Excel spreadsheets (whole-genome downloads
probably forbidden, but whole-chromosome downloads definitely OK)

  (b) remote MySQL access to kaka.sanger.ac.uk

  (c) tab-delimited data download ---> your own MySQL server



This is due out with NCBI 29 sometime over the next 10 days or so (someone
on the Ensembl web team has a better sense of the schedule).





> 
> thanks,
> archie
> 
> archie russell
> 425-636-6312
> 
> _______________________________________________
> DAS mailing list
> DAS@biodas.org
> http://biodas.org/mailman/listinfo/das
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------