[EMBOSS] problems installing/using TrEMBL

George Magklaras georgios at biotek.uio.no
Thu Oct 4 14:53:38 UTC 2007


Maybe you are missing the resource record for the trembl databank in the 
emboss.default file, and you have passed the wrong arguments to dbxflat. 
You should choose the emboss method in the DB entry. The emboss.default 
file should then also contain a resource entry for trembl:

RES trembl [
    type: Index
    idlen:  15
    acclen: 15
    svlen:  20
    keylen: 30
    deslen: 25
    orglen: 25
]
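
For the DB entry itself, something along these lines might do. It is 
only a sketch based on the entry you posted, with the method changed to 
emboss. The format value (swiss) is my assumption, on the grounds that 
TrEMBL is distributed in the Swiss-Prot flat file layout, so please 
check it against 'tfm dbxflat' before relying on it:

```
# Sketch only: paths and comment are copied from the entry quoted below.
# Assumption: "format: swiss" is the right format name for TrEMBL.
DB trembl [
        type: P
        comment: "TrEMBL 37.0"
        method: emboss
        format: swiss
        dir: /share/bio/emboss/trembl/
        file: uniprot_trembl.dat
        indexdirectory: /share/bio/emboss/trembl
]
```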

From the dbxflat output you quote, I can see that the command points to 
the embl resource:

[root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL <--- Why EMBL?
-directory . -filenames uniprot_trembl.dat -release "37.0"
-date "24/07/07" -fields sv,acc,des,key,org
Database b+tree indexing for flat file databases
Resource name: embl  <--- That should say trembl. Why did you choose 
embl here?


When the dbxflat command asked you for a resource name, you really 
should have had a trembl RES entry to give it, and I am not sure that 
your idformat (EMBL) is correct.
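
A quick way to check this before re-running the (long) indexing job is 
sketched below. The helper name has_res_entry, the CONFIG variable and 
its default path are made up for illustration. The -dbresource qualifier 
and the SWISS value for -idformat are what I would expect dbxflat to 
accept, but I have not verified them against 4.0.0, so treat both as 
assumptions and check 'tfm dbxflat'. The script only prints the command 
rather than running it:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if config file $1 has a "RES <name> [" line
# for the resource name given in $2.
has_res_entry() {
    grep -Eq "^[[:space:]]*RES[[:space:]]+${2}[[:space:]]*\[" "$1" 2>/dev/null
}

# Assumed location; point CONFIG at wherever your emboss.default lives.
CONFIG="${CONFIG:-emboss.default}"

if has_res_entry "$CONFIG" trembl; then
    # Assumptions: -dbresource names the RES entry, and SWISS is the
    # appropriate -idformat for TrEMBL. The command is printed, not run.
    echo "dbxflat -dbname trembl -dbresource trembl -idformat SWISS" \
         "-directory . -filenames uniprot_trembl.dat" \
         "-release 37.0 -date 24/07/07 -fields sv,acc,des,key,org"
else
    echo "no RES trembl entry in $CONFIG; add one before indexing"
fi
```

Note that the pattern is anchored on the whole name, so an existing 
"RES embl [" entry will not be mistaken for a trembl one.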



GM


-- 
George Magklaras

Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo,
University of Oslo
http://www.biotek.uio.no/

EMBnet Norway:	http://www.no.embnet.org/



Fernan Aguero wrote:
>  
> | On 2 Oct 2007, at 18:54, Fernan Aguero wrote:
> | 
> | > Hi,
> | >
> | > I've installed TrEMBL in EMBOSS and it seems like I'm having some
> | > problems ...
> | >
> | > I've run dbiflat as follows:
> | [snip]
> | >
> | > Now, when using seqret, it seems like I'm not getting the
> | > records I expect, for example if I search for the first ID
> | > in the example above (A0B532), I get A0BDZ0 instead:
> | 
> | I suspect your problem is that your trembl file is >2Gb in size.   
> | Above this size dbiflat won't work properly and will give wacky  
> | results such as the ones you've shown.  This won't be a problem with  
> | uniprot_sprot.dat as this is still only about 1.1Gb.
> | 
> | Your choices are therefore:
> | 
> | 1) You could split your trembl file into multiple files, each smaller  
> | than 2Gb.  This ends up being a complete pain, and you probably don't  
> | want to do it this way.
> | 
> | 2) Use the newer dbx* family of indexing programs which can cope with  
> | larger file sizes.  In your case you'd use dbxflat instead of  
> | dbiflat.  There are some configuration differences between the two so  
> | you should read 'tfm dbxflat' first, but they work pretty much the  
> | same as the old versions.  We use the dbx programs for all of our  
> | databases and they work fine.
> | 
> | Hope this helps
> | 
> | Simon.
>  
> Simon,
> 
> thanks for your suggestions. I've been waiting for dbxflat
> to finish before replying ... thus the delay.
> 
> You mention that there are some configuration
> differences between db(x|i)flat  ... I guess I've got into those
> now ... even after reading tfm for dbxflat, it seems I can't
> just set it up right
> 
> ===> Configuration
> DB trembl [
>         type: P
>         comment: "TrEMBL 37.0"
>         method: emblcd
>         format: embl
>         dbalias: trembl
>         dir: /share/bio/emboss/trembl/
>         file: uniprot_trembl.dat
>         indexdirectory: /share/bio/emboss/trembl
> ]
> 
> With this configuration, I get this error:
> [fernan at alfa ~]$ seqret trembl:A0B532
> Reads and writes (returns) sequences
> Warning: Cannot open division file '<null>' for database 'trembl'
> Warning: seqCdQry failed
> Error: Unable to read sequence 'trembl:A0B532'
> Died: seqret terminated: Bad value for '-sequence' and no prompt
> 
> If I change the 'method' to 'method: emboss'
> as per the example in the dbxflat docs, I get this error:
> 
> [fernan at alfa ~]$ seqret trembl:A0B532
> Reads and writes (returns) sequences
> 
>    EMBOSS An error in ajindex.c at line 3028:
> Cannot open param file /share/bio/emboss/trembl/trembl.pxid
> 
> This file does not exist (see result of indexing below):
> 
> ===> Indexing
> [root at alfa trembl]# dbxflat -dbname trembl -idformat EMBL
> -directory . -filenames uniprot_trembl.dat -release "37.0"
> -date "24/07/07" -fields sv,acc,des,key,org
> Database b+tree indexing for flat file databases
> Resource name: embl
> Processing file ./uniprot_trembl.dat
> [root at alfa trembl]# du -hc *
> 4.0K    dbxflat.command
> 4.0K    trembl.ent
> 4.0K    trembl.pxac
> 4.0K    trembl.pxde
> 4.0K    trembl.pxkw
> 4.0K    trembl.pxsv
> 4.0K    trembl.pxtx
> 572M    trembl.xac
> 4.2G    trembl.xde
> 381M    trembl.xkw
> 4.0K    trembl.xsv
> 3.0G    trembl.xtx
> 11G     uniprot_trembl.dat
> 19G     total
> 
> I've also tried other combinations of 'method' (emboss,
> emblcd) and 'format' (swiss, embl) without success ...
> 
> Am I indexing the db with the right incantation for dbxflat?
> If so, what am I missing in my configuration?
> 
> Thanks again for any pointer,
> 
> Fernan
> 
> PS: this is on emboss-4.0.0 running on a Rocks Cluster (4.2,
> CentOS)
> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
> 






