[EMBOSS] problems installing/using TrEMBL

Fernan Aguero fernan at iib.unsam.edu.ar
Tue Oct 2 17:54:05 UTC 2007


Hi,

I've installed TrEMBL in EMBOSS, and I seem to be having some
problems...

I've run dbiflat as follows:

dbiflat -dbname trembl -idformat EMBL -directory . \
    -filenames uniprot_trembl.dat -release '37.0' -date '24/07/07' \
    -fields sv,acc,des,key,org

I've added an entry to my emboss.default configuration
file, and the db is listed by showdb.

The db also seems to work fine with, for example,
'textsearch':

[fernan at alfa ~]$ textsearch trembl:* 'cyclase'
Search sequence documentation. Slow, use SRS and Entrez!
Output file [a0b532_mettp.textsearch]: stdout
# Search for: cyclase
trembl-id:A0B532_METTP  A0B532_METTP  A0B532	RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A1RWP7_THEPD  A1RWP7_THEPD  A1RWP7	RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A2SR85_METLZ  A2SR85_METLZ  A2SR85    Cyclase family protein.
trembl-id:A3H5Q9_9CREN  A3H5Q9_9CREN  A3H5Q9	Magnesium-protoporphyrin IX monomethyl ester (Oxidative) cyclase (EC 1.14.13.81).
trembl-id:A3H7Y6_9CREN  A3H7Y6_9CREN  A3H7Y6	RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A6URB1_METVA  A6URB1_METVA  A6URB1    Cyclase family protein.
...

First, I got a number of warnings when running dbiflat.
Because all of them were about null IDs (''), I just
ignored them, but I mention it just in case:
Warning: Duplicate ID skipped: '' All hits will point to first ID found
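
One way to tell whether those empty IDs come from the data or from the
indexing step is to scan the flat file directly. The Python sketch below
assumes the standard UniProtKB flat-file layout (an ID line whose first
token is the entry name, each entry closed by '//'); entry_ids, report and
the script name check_ids.py are hypothetical, not part of EMBOSS. If it
finds no empty or duplicated IDs in uniprot_trembl.dat, the '' IDs
presumably come from how dbiflat read the ID lines with the options used
above, rather than from the file itself.

```python
import sys
from collections import Counter


def entry_ids(lines):
    """Yield one ID per '//'-terminated entry ('' if no usable ID line was seen).

    Assumes the UniProtKB flat-file layout: an 'ID' line whose first token is
    the entry name, and '//' closing each entry.
    """
    current = None
    for line in lines:
        if line.startswith("ID "):
            fields = line[2:].split()
            current = fields[0] if fields else ""
        elif line.startswith("//"):
            yield current if current is not None else ""
            current = None


def report(lines):
    """Return (number of entries, number of empty IDs, {duplicated ID: count})."""
    counts = Counter(entry_ids(lines))
    total = sum(counts.values())
    empty = counts.get("", 0)
    dups = {entry_id: n for entry_id, n in counts.items() if entry_id and n > 1}
    return total, empty, dups


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_ids.py uniprot_trembl.dat
    with open(sys.argv[1]) as handle:
        total, empty, dups = report(handle)
    print(f"entries: {total}")
    print(f"entries with an empty or missing ID: {empty}")
    print(f"non-empty IDs occurring more than once: {len(dups)}")
```

The scan reads the file line by line, so it does not need to hold the whole
TrEMBL release in memory, only one counter entry per distinct ID.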

Now, when using seqret, I don't seem to get the
records I expect. For example, if I search for the first accession
in the example above (A0B532), I get A0BDZ0 instead:

[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences
output sequence(s) [a0bdz0_parte.fasta]: stdout
>A0BDZ0_PARTE A0BDZ0 Chromosome undetermined scaffold_101, whole genome shotgun sequence.
MLNFPQNARDHFSCDCDPCEFAITHGEEIMPKRVPPQKPIQQIQDKDLGLLLRKLQAPNK
LTRSVRIRIPETCVCNEGEIKFIAYYDESEGFIKFIQKPTFQQTKQFLNERRPPDSLAVI
IKYIDNNMQVMTDMEFTILMMKRKIDPIWSQILYIQNFNSNKNYELQHYEFKHSFDSKYP
EFDLARIEILILNGEIARASSDFVPMVREEAYENSLSQDQYCRYMVYKMVHYADVFGGIQ
ITEGKFSFHKKTFISMEKMEYTDLDRKALFDSEILLRKKKMIDEDMFQFQKLIDQNVKKE
REYALKVYREILDMDNGLDQQSHLLKNKLSVIGYDLKKYSQSIQSNFQQVMVSKDPASTL
KELVIEQKVNEEKLTSILKPKKGEKTKKKM

But if I search for A0B532_METTP, I get nothing:
[fernan at alfa ~]$ seqret trembl:A0B532_METTP
Reads and writes (returns) sequences
Error: Unable to read sequence 'trembl:A0B532_METTP'
Died: seqret terminated: Bad value for '-sequence' and no prompt


Now, if I search for A0BDZ0, I get A0BL81 instead:

[fernan at alfa ~]$ seqret trembl:A0BDZ0
Reads and writes (returns) sequences
output sequence(s) [a0bl81_parte.fasta]: stdout
>A0BL81_PARTE A0BL81 Chromosome undetermined scaffold_113, whole genome shotgun sequence.
MKQISESAHILQKVYNPNRMNKLFMTTHYQLQNETDLIFDKYMLMPLFGLSVANGISSNC
IKPKYLCSEYKKQELYDCNLILILSAYSDQAVYRSKTMYEKRNGLEQIFKYLASPNYTYN
IHISLLSYFVPQRVFYKQVLQALNIFELIDQKQIEELTKSSSIINQSVGEDNLDSILFKN
QEFIDYQKWRRMLKNNTIINLKTLHQHQLSQQIFCQYFLRYHYYQGCEEEINKLNKFLVD
DFDMFKFRSRLEHNEKKMKFYFLRMLKYFKLNEKLEIFLKFSFKSYSLDWNKELLREMKN
SLNQYKKQ
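
To see which entry actually carries a given accession or entry name,
independent of the EMBOSS index, one can search the flat file directly and
compare the result with what seqret returns. A minimal Python sketch, again
assuming the UniProtKB flat-file layout (one ID line and one or more
'AC   ACC1; ACC2;' lines per entry); parse_entries, find_entry and the
script name find_entry.py are hypothetical:

```python
import sys


def parse_entries(lines):
    """Yield (entry_id, accessions) for each '//'-terminated entry.

    Assumes the UniProtKB flat-file layout: 'ID   NAME ...' plus one or more
    'AC   ACC1; ACC2;' lines per entry.
    """
    entry_id, accs = "", []
    for line in lines:
        if line.startswith("ID "):
            fields = line[2:].split()
            entry_id = fields[0] if fields else ""
        elif line.startswith("AC "):
            accs.extend(a.strip() for a in line[2:].split(";") if a.strip())
        elif line.startswith("//"):
            yield entry_id, accs
            entry_id, accs = "", []


def find_entry(lines, key):
    """Return (entry_id, accessions) for the first entry whose ID or any
    accession matches key (case-insensitively), or None if none matches."""
    key = key.upper()
    for entry_id, accs in parse_entries(lines):
        if entry_id.upper() == key or any(a.upper() == key for a in accs):
            return entry_id, accs
    return None


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python find_entry.py uniprot_trembl.dat A0B532
    with open(sys.argv[1]) as handle:
        print(find_entry(handle, sys.argv[2]))
```

A linear scan of the whole file is slow compared with an indexed lookup,
but because it never touches the dbiflat index, a mismatch between its
answer and seqret's points at the index rather than at the data.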

Any idea what is wrong? I also have swissprot
installed (in much the same way), and it works fine with
seqret using either ACs (Q4U9M9) or IDs (104K_THEAN).

This is on a Linux cluster (Rocks 4.2, with EMBOSS installed from the
Bio roll).

[fernan at alfa ~]$ embossversion 
Writes the current EMBOSS version number
4.0.0

Thanks in advance for any pointers,

Fernan



