[EMBOSS] problems installing/using TrEMBL

Fernan Aguero fernan at iib.unsam.edu.ar
Thu Oct 4 14:08:22 UTC 2007


 
| On 2 Oct 2007, at 18:54, Fernan Aguero wrote:
| 
| > Hi,
| >
| > I've installed TrEMBL in EMBOSS and it seems like I'm having some
| > problems ...
| >
| > I've run dbiflat as follows:
| [snip]
| >
| > Now, when using seqret, it seems like I'm not getting the
| > records I expect, for example if I search for the first ID
| > in the example above (A0B532), I get A0BDZ0 instead:
| 
| I suspect your problem is that your trembl file is >2Gb in size.   
| Above this size dbiflat won't work properly and will give wacky  
| results such as the ones you've shown.  This won't be a problem with  
| uniprot_sprot.dat as this is still only about 1.1Gb.
| 
| Your choices are therefore:
| 
| 1) You could split your trembl file into multiple files, each smaller  
| than 2Gb.  This ends up being a complete pain, and you probably don't  
| want to do it this way.
| 
| 2) Use the newer dbx* family of indexing programs which can cope with  
| larger file sizes.  In your case you'd use dbxflat instead of  
| dbiflat.  There are some configuration differences between the two so  
| you should read 'tfm dbxflat' first, but they work pretty much the  
| same as the old versions.  We use the dbx programs for all of our  
| databases and they work fine.
| 
| Hope this helps
| 
| Simon.
 
Simon,

thanks for your suggestions. I've been waiting for dbxflat
to finish before replying ... thus the delay.
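
For the archives, option 1 above (splitting the file) could be
sketched roughly as follows. This is only an illustration, not
something from the EMBOSS distribution: the function name, the
output naming scheme and the exact limit of 2**31 bytes are my
assumptions. It relies on each flat-file record ending with a
line containing only '//'.

```python
# Assumption: the ">2Gb" limit corresponds to signed 32-bit file
# offsets, i.e. 2**31 bytes. Adjust if your build behaves differently.
LIMIT = 2**31


def split_flatfile(path, out_prefix, max_bytes=LIMIT - 1):
    """Split a flat file into parts of at most max_bytes each,
    cutting only after a record terminator line ('//').

    A single record larger than max_bytes is still written whole,
    so that part would exceed the limit.
    Returns the list of part file names (hypothetical naming scheme).
    """
    parts = []
    out = None
    written = 0
    record = []
    record_size = 0

    def flush():
        # Write the buffered record, opening a new part if needed.
        nonlocal out, written, record, record_size
        if not record:
            return
        if out is None or written + record_size > max_bytes:
            if out is not None:
                out.close()
            parts.append("%s.%03d.dat" % (out_prefix, len(parts) + 1))
            out = open(parts[-1], "wb")
            written = 0
        out.writelines(record)
        written += record_size
        record = []
        record_size = 0

    with open(path, "rb") as src:
        for line in src:
            record.append(line)
            record_size += len(line)
            if line.rstrip(b"\r\n") == b"//":
                flush()
    flush()  # any trailing lines that lack a terminator
    if out is not None:
        out.close()
    return parts
```

Each part would then have to be indexed and listed in the
database definition, which is why dbxflat looks like the less
painful route.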

You mention that there are some configuration
differences between db(x|i)flat ... I guess I've run into those
now ... even after reading tfm for dbxflat, it seems I just
can't set it up right.

===> Configuration
DB trembl [
        type: P
        comment: "TrEMBL 37.0"
        method: emblcd
        format: embl
        dbalias: trembl
        dir: /share/bio/emboss/trembl/
        file: uniprot_trembl.dat
        indexdirectory: /share/bio/emboss/trembl
]

With this configuration, I get this error:
[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences
Warning: Cannot open division file '<null>' for database 'trembl'
Warning: seqCdQry failed
Error: Unable to read sequence 'trembl:A0B532'
Died: seqret terminated: Bad value for '-sequence' and no prompt

If I change the 'method' to 'method: emboss'
as per the example in the dbxflat docs, I get this error:

[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences

   EMBOSS An error in ajindex.c at line 3028:
Cannot open param file /share/bio/emboss/trembl/trembl.pxid

This file does not exist (see result of indexing below):

===> Indexing
[root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL
-directory . -filenames uniprot_trembl.dat -release "37.0"
-date "24/07/07" -fields sv,acc,des,key,org
Database b+tree indexing for flat file databases
Resource name: embl
Processing file ./uniprot_trembl.dat
[root at alfa trembl]# du -hc *
4.0K    dbxflat.command
4.0K    trembl.ent
4.0K    trembl.pxac
4.0K    trembl.pxde
4.0K    trembl.pxkw
4.0K    trembl.pxsv
4.0K    trembl.pxtx
572M    trembl.xac
4.2G    trembl.xde
381M    trembl.xkw
4.0K    trembl.xsv
3.0G    trembl.xtx
11G     uniprot_trembl.dat
19G     total
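
The comparison between the listing above and what seqret asks
for can be done mechanically. A minimal sketch follows; the
field-to-extension mapping is inferred from the file names in
the listing (sv, acc, des, key, org against xsv, xac, xde, xkw,
xtx) and from the 'trembl.pxid' named in the error message, so
treat it as an assumption rather than documented behaviour. The
function name is made up for the example.

```python
import os

# Assumed mapping from dbxflat field names to index file
# extensions, inferred from the directory listing and the error.
FIELD_SUFFIX = {
    "id": "xid",
    "acc": "xac",
    "sv": "xsv",
    "des": "xde",
    "key": "xkw",
    "org": "xtx",
}


def missing_index_files(indexdir, dbname, fields):
    """Return the index and param files that appear to be missing
    for the ID index plus each requested field."""
    missing = []
    for field in ["id"] + list(fields):
        suffix = FIELD_SUFFIX[field]
        # Each index seems to come as a pair: <db>.x?? and <db>.px??
        for ext in (suffix, "p" + suffix):
            name = "%s.%s" % (dbname, ext)
            if not os.path.isfile(os.path.join(indexdir, name)):
                missing.append(name)
    return missing
```

For a directory with the contents listed above,
missing_index_files(".", "trembl", ["sv", "acc", "des", "key", "org"])
reports trembl.xid and trembl.pxid, which matches the seqret
error: the ID index itself is what is absent.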

I've also tried other combinations of 'method' (emboss,
emblcd) and 'format' (swiss, embl) without success ...

Am I indexing the db with the right incantation for dbxflat?
If so, what am I missing in my configuration?

Thanks again for any pointers,

Fernan

PS: this is on emboss-4.0.0 running on a Rocks Cluster (4.2,
CentOS)



