EMBL indexing strategy
Jack Leunissen
j.leunissen at cmbi.kun.nl
Mon Jun 23 15:58:13 UTC 2003
I index the whole lot in one go. This is what the files look like:
-rw-r--r-- 1 jackl geninf 479077377 Jun 18 02:22 entrynam.idx
-rw-r--r-- 1 jackl geninf 403886332 Jun 18 02:33 acnum.trg
-rw-r--r-- 1 jackl geninf 101085912 Jun 18 02:33 acnum.hit
So just under 1 GB in all. Not too bad, considering that the flatfiles are
120 GB and the SRS indices amount to 37 GB.
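
For anyone who wants to check the arithmetic, here is a minimal sketch in Python that adds up the three sizes from the listing above and compares the total with the flatfile and SRS figures. The byte counts are copied from the listing. The names index_sizes, total_bytes and as_gb are only illustrative, and 1 GB is assumed to mean 10^9 bytes.

```python
# File sizes in bytes, copied from the directory listing above.
index_sizes = {
    "entrynam.idx": 479_077_377,
    "acnum.trg": 403_886_332,
    "acnum.hit": 101_085_912,
}


def total_bytes(sizes):
    """Return the combined size in bytes of all listed index files."""
    return sum(sizes.values())


def as_gb(n_bytes):
    """Convert bytes to GB, assuming 1 GB = 10**9 bytes."""
    return n_bytes / 1e9


total = total_bytes(index_sizes)
print(f"index total: {total} bytes = {as_gb(total):.2f} GB")

# Compare with the sizes quoted in the message (taken as round figures).
print(f"relative to 120 GB of flatfiles: {as_gb(total) / 120:.1%}")
print(f"relative to 37 GB of SRS indices: {as_gb(total) / 37:.1%}")
```

The total comes to 984,049,621 bytes, which is where the "about 1 GB" figure comes from.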
Cheers,
Jack
> -----Original Message-----
> From: owner-emboss at hgmp.mrc.ac.uk
> [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of Aengus Stewart
> Sent: Monday, June 23, 2003 12:44 PM
> To: emboss at embnet.org
> Subject: EMBL indexing strategy
>
>
>
> I am currently building the EMBL indexes and I was just wondering how
> other people organise them.
>
> I have no idea as yet what size entrynam.idx will end up being but I
> imagine the word "humungous" will apply.
>
> Do people index the sections (EST, GSS, HUM, etc.) separately and
> follow Simon Andrews' method of providing an EMBL that is a composite of
> these sections, or follow the "let's index the bloody lot in one go"
> route?
>
> Will there be any difference in the outcome, either in admin terms or
> for the user?
>
>
> Regards
> Aengus
>
>
> --
> --------------------------------------------------------------------------
> Aengus Stewart
> Computational Genome Analysis Laboratory Tel: +44 (0)20 7269 3679
> Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
> --------------------------------------------------------------------------
>