[EMBOSS] problems installing/using TrEMBL
Fernan Aguero
fernan at iib.unsam.edu.ar
Thu Oct 4 14:08:22 UTC 2007
| On 2 Oct 2007, at 18:54, Fernan Aguero wrote:
|
| > Hi,
| >
| > I've installed TrEMBL in EMBOSS and it seems like I'm having some
| > problems ...
| >
| > I've run dbiflat as follows:
| [snip]
| >
| > Now, when using seqret, it seems like I'm not getting the
| > records I expect, for example if I search for the first ID
| > in the example above (A0B532), I get A0BDZ0 instead:
|
| I suspect your problem is that your trembl file is >2 GB in size.
| Above this size dbiflat won't work properly and will give wacky
| results such as the ones you've shown. This won't be a problem with
| uniprot_sprot.dat as this is still only about 1.1 GB.
|
| Your choices are therefore:
|
| 1) You could split your trembl file into multiple files, each smaller
| than 2 GB. This ends up being a complete pain, and you probably don't
| want to do it this way.
|
| 2) Use the newer dbx* family of indexing programs which can cope with
| larger file sizes. In your case you'd use dbxflat instead of
| dbiflat. There are some configuration differences between the two so
| you should read 'tfm dbxflat' first, but they work pretty much the
| same as the old versions. We use the dbx programs for all of our
| databases and they work fine.
|
| Hope this helps
|
| Simon.
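Simon's explanation can be made concrete with a small Python sketch. It is not part of EMBOSS. It assumes the "2Gb" limit is the 2**31-byte boundary of a signed 32-bit file offset, and the functions and the partNNN.dat output names are hypothetical. The first function checks a file size against that assumed limit. The other two implement option 1, streaming the flat file into parts and cutting only after the `//` line that ends each entry, so no record is split in half.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Assumption: "2Gb" means 2**31 bytes, the point past which byte offsets
# no longer fit in a signed 32-bit integer (offsets 0 .. 2**31 - 1 do fit).
DBIFLAT_LIMIT = 2**31


def exceeds_dbiflat_limit(size_bytes: int) -> bool:
    """True if a flat file of this size is past the assumed dbiflat limit."""
    return size_bytes > DBIFLAT_LIMIT


def assign_parts(lines: Iterable[bytes], max_bytes: int) -> Iterator[Tuple[int, List[bytes]]]:
    """Yield (part_number, record_lines) for each entry of an EMBL/SwissProt-style
    flat file. An entry ends with a '//' line, so entries are never split; the
    part number goes up whenever the next entry would push the current part
    past max_bytes (an entry larger than max_bytes gets a part of its own)."""
    part, part_size = 1, 0
    record: List[bytes] = []
    record_size = 0

    def place() -> int:
        # Decide which part the just-completed record belongs to.
        nonlocal part, part_size
        if part_size > 0 and part_size + record_size > max_bytes:
            part, part_size = part + 1, 0
        part_size += record_size
        return part

    for line in lines:
        record.append(line)
        record_size += len(line)
        if line.rstrip(b"\r\n") == b"//":
            yield place(), record
            record, record_size = [], 0
    if record:  # trailing lines with no closing '//'
        yield place(), record


def split_flat_file(path: str, out_dir: str, max_bytes: int = DBIFLAT_LIMIT - 1) -> List[Path]:
    """Stream `path` into out_dir/part001.dat, part002.dat, ... (hypothetical
    names), each at most max_bytes long, holding only one entry in memory."""
    parts: List[Path] = []
    out = None
    try:
        with open(path, "rb") as src:
            for number, record in assign_parts(src, max_bytes):
                if number > len(parts):  # first entry of a new part
                    if out is not None:
                        out.close()
                    parts.append(Path(out_dir) / f"part{number:03d}.dat")
                    out = open(parts[-1], "wb")
                out.writelines(record)
    finally:
        if out is not None:
            out.close()
    return parts
```

With the default max_bytes every part stays below 2**31 bytes, so an 11G uniprot_trembl.dat like the one in this thread would come out as at least six parts, each of which would then need indexing on its own.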
Simon,
thanks for your suggestions. I waited for dbxflat to finish
before replying, hence the delay.
You mention that there are some configuration
differences between db(x|i)flat ... I guess I've run into those
now: even after reading tfm for dbxflat, I can't seem to
set it up right.
===> Configuration
DB trembl [
type: P
comment: "TrEMBL 37.0"
method: emblcd
format: embl
dbalias: trembl
dir: /share/bio/emboss/trembl/
file: uniprot_trembl.dat
indexdirectory: /share/bio/emboss/trembl
]
With this configuration, I get this error:
[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences
Warning: Cannot open division file '<null>' for database 'trembl'
Warning: seqCdQry failed
Error: Unable to read sequence 'trembl:A0B532'
Died: seqret terminated: Bad value for '-sequence' and no prompt
If I change the 'method' to 'method: emboss'
as per the example in the dbxflat docs, I get this error:
[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences
EMBOSS An error in ajindex.c at line 3028:
Cannot open param file /share/bio/emboss/trembl/trembl.pxid
This file does not exist (see result of indexing below):
===> Indexing
[root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL
-directory . -filenames uniprot_trembl.dat -release "37.0"
-date "24/07/07" -fields sv,acc,des,key,org
Database b+tree indexing for flat file databases
Resource name: embl
Processing file ./uniprot_trembl.dat
[root at alfa trembl]# du -hc *
4.0K dbxflat.command
4.0K trembl.ent
4.0K trembl.pxac
4.0K trembl.pxde
4.0K trembl.pxkw
4.0K trembl.pxsv
4.0K trembl.pxtx
572M trembl.xac
4.2G trembl.xde
381M trembl.xkw
4.0K trembl.xsv
3.0G trembl.xtx
11G uniprot_trembl.dat
19G total
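One thing the listing makes visible: seqret fails on trembl.pxid, yet the directory holds .px/.x file pairs only for ac, de, kw, sv and tx, which line up with the sv,acc,des,key,org passed to -fields, and there is no id pair. A hypothetical Python helper for this kind of consistency check is sketched here. The field-to-suffix table is inferred from this listing and from the error message, not taken from the dbxflat documentation, so treat it as an assumption.

```python
from pathlib import Path
from typing import Iterable, List, Set

# Inferred from the listing above, not from EMBOSS documentation: each field
# named in -fields appears to produce a <db>.px?? parameter file and a <db>.x?? index.
FIELD_SUFFIX = {"id": "id", "acc": "ac", "sv": "sv", "des": "de", "key": "kw", "org": "tx"}


def expected_index_files(dbname: str, fields: Iterable[str]) -> List[str]:
    """File names expected for each indexed field, e.g. trembl.pxac, trembl.xac."""
    names: List[str] = []
    for field in fields:
        suffix = FIELD_SUFFIX[field]
        names += [f"{dbname}.px{suffix}", f"{dbname}.x{suffix}"]
    return names


def missing_index_files(present: Set[str], dbname: str, fields: Iterable[str]) -> List[str]:
    """Expected index files that are not among the names in `present`."""
    return [n for n in expected_index_files(dbname, fields) if n not in present]


def missing_in_dir(index_dir: str, dbname: str, fields: Iterable[str]) -> List[str]:
    """Same check against the files actually present in an index directory."""
    present = {p.name for p in Path(index_dir).iterdir()}
    return missing_index_files(present, dbname, fields)
```

Given the file names from the du output and the five fields used for indexing, nothing is reported missing; adding "id" to the field list reports exactly trembl.pxid and trembl.xid, the first of which is the file seqret cannot open.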
I've also tried other combinations of 'method' (emboss,
emblcd) and 'format' (swiss, embl) without success ...
Am I indexing the db with the right incantation for dbxflat?
If so, what am I missing in my configuration?
Thanks again for any pointers,
Fernan
PS: this is on emboss-4.0.0 running on a Rocks Cluster (4.2,
CentOS)