EMBL indexing strategy

Jack Leunissen j.leunissen at cmbi.kun.nl
Mon Jun 23 15:58:13 UTC 2003


I index the whole lot in one go. This is what the resulting index files look like:

-rw-r--r--    1 jackl    geninf   479077377 Jun 18 02:22 entrynam.idx
-rw-r--r--    1 jackl    geninf   403886332 Jun 18 02:33 acnum.trg
-rw-r--r--    1 jackl    geninf   101085912 Jun 18 02:33 acnum.hit

So about 1 GB in all. Not too bad, considering that the flatfiles are
120 GB and the SRS indices amount to 37 GB.
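The "about 1 GB" figure can be checked by summing the byte sizes of the three files in the listing above. The small Python sketch below does that and compares the total with the quoted flatfile and SRS sizes. It assumes the 120 GB and 37 GB figures are decimal gigabytes; the post does not say.

```python
# Byte sizes copied from the ls -l listing of the EMBOSS index files
index_sizes = {
    "entrynam.idx": 479077377,
    "acnum.trg": 403886332,
    "acnum.hit": 101085912,
}

total_bytes = sum(index_sizes.values())
total_gb = total_bytes / 1e9      # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes

print(f"total: {total_bytes} bytes = {total_gb:.2f} GB = {total_gib:.2f} GiB")

# Compare with the sizes quoted for the flatfiles and the SRS indices
# (assumption: both figures are decimal GB)
for label, size_gb in [("flatfiles", 120), ("SRS indices", 37)]:
    share = 100 * total_gb / size_gb
    print(f"{label}: index files are {share:.1f}% of {size_gb} GB")
```

Either way the index files come to well under 1% of the flatfile size and under 3% of the SRS index size.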

Cheers,
Jack


> -----Original Message-----
> From: owner-emboss at hgmp.mrc.ac.uk 
> [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of Aengus Stewart
> Sent: Monday, June 23, 2003 12:44 PM
> To: emboss at embnet.org
> Subject: EMBL indexing strategy
> 
> 
> 
> I am currently building the EMBL indexes and I was just wondering how
> other people organise them.
> 
> I have no idea as yet what size entrynam.idx will end up being but I
> imagine the word "humungous" will apply.
> 
> Do people index the sections (EST, GSS, HUM, etc.) separately and
> follow Simon Andrews' method of providing an EMBL that is a composite of
> these sections, or follow the let's-index-the-bloody-lot-at-one-go
> route?
> 
> Will there be any difference in the outcome, either in admin terms or
> for the user?
> 
> 
> Regards
> Aengus
> 
> 
> -- 
> ----------------------------------------------------------------------------
> Aengus Stewart
> Computational Genome Analysis Laboratory  Tel: +44 (0)20 7269 3679
> Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
> ----------------------------------------------------------------------------
> 
