[EMBOSS] problems installing/using TrEMBL
Fernan Aguero
fernan at iib.unsam.edu.ar
Tue Oct 2 17:54:05 UTC 2007
Hi,
I've installed TrEMBL in EMBOSS and it seems like I'm having some
problems ...
I've run dbiflat as follows:
    dbiflat -dbname trembl -idformat EMBL -directory . \
        -filenames uniprot_trembl.dat -release '37.0' -date '24/07/07' \
        -fields sv,acc,des,key,org
I've put an entry in my emboss.default configuration
file, and the db is listed by showdb.
The db also seems to work fine with, for example,
'textsearch':
[fernan at alfa ~]$ textsearch trembl:* 'cyclase'
Search sequence documentation. Slow, use SRS and Entrez!
Output file [a0b532_mettp.textsearch]: stdout
# Search for: cyclase
trembl-id:A0B532_METTP A0B532_METTP A0B532 RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A1RWP7_THEPD A1RWP7_THEPD A1RWP7 RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A2SR85_METLZ A2SR85_METLZ A2SR85 Cyclase family protein.
trembl-id:A3H5Q9_9CREN A3H5Q9_9CREN A3H5Q9 Magnesium-protoporphyrin IX monomethyl ester (Oxidative) cyclase (EC 1.14.13.81).
trembl-id:A3H7Y6_9CREN A3H7Y6_9CREN A3H7Y6 RNA-3'-phosphate cyclase (EC 6.5.1.4).
trembl-id:A6URB1_METVA A6URB1_METVA A6URB1 Cyclase family protein.
...
First, I got a number of warnings when running dbiflat.
Because all of them were about null IDs (''), I just
ignored them. I mention it just in case:
Warning: Duplicate ID skipped: '' All hits will point to first ID found
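
One way to check whether those null-ID warnings reflect something in the flat file itself is to scan its ID lines directly. The following is only a minimal sketch: it assumes the file uses the usual two-letter line codes (ID, AC, ...) with the entry name as the first token after 'ID', and the function name scan_ids is made up for illustration. It is not part of EMBOSS.

```python
from collections import Counter


def scan_ids(handle):
    """Scan an EMBL/Swiss-Prot style flat file for ID lines.

    Returns (total_id_lines, empty_id_lines, duplicated_ids).
    Assumption: entry names are the first whitespace-separated
    token after the 'ID' line code.
    """
    counts = Counter()
    total = 0
    empty = 0
    for line in handle:
        # Only lines whose line code is exactly 'ID' are of interest.
        if line[:2] != "ID" or (len(line) > 2 and not line[2].isspace()):
            continue
        total += 1
        fields = line[2:].split()
        if not fields:
            empty += 1  # an ID line with no entry name at all
        else:
            counts[fields[0].rstrip(";")] += 1
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return total, empty, duplicated


# Hypothetical usage against the file indexed above:
# with open("uniprot_trembl.dat") as handle:
#     total, empty, duplicated = scan_ids(handle)
#     print(total, empty, duplicated[:10])
```

If the scan reports no empty or duplicated names, the warnings more likely come from how the index is being built than from the data file.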
Now, when using seqret, I'm not getting the records I expect.
For example, if I search for the first entry in the example
above by its accession (A0B532), I get A0BDZ0 instead:
[fernan at alfa ~]$ seqret trembl:A0B532
Reads and writes (returns) sequences
output sequence(s) [a0bdz0_parte.fasta]: stdout
>A0BDZ0_PARTE A0BDZ0 Chromosome undetermined scaffold_101, whole genome shotgun sequence.
MLNFPQNARDHFSCDCDPCEFAITHGEEIMPKRVPPQKPIQQIQDKDLGLLLRKLQAPNK
LTRSVRIRIPETCVCNEGEIKFIAYYDESEGFIKFIQKPTFQQTKQFLNERRPPDSLAVI
IKYIDNNMQVMTDMEFTILMMKRKIDPIWSQILYIQNFNSNKNYELQHYEFKHSFDSKYP
EFDLARIEILILNGEIARASSDFVPMVREEAYENSLSQDQYCRYMVYKMVHYADVFGGIQ
ITEGKFSFHKKTFISMEKMEYTDLDRKALFDSEILLRKKKMIDEDMFQFQKLIDQNVKKE
REYALKVYREILDMDNGLDQQSHLLKNKLSVIGYDLKKYSQSIQSNFQQVMVSKDPASTL
KELVIEQKVNEEKLTSILKPKKGEKTKKKM
But if I search by that entry's ID (A0B532_METTP), I get an error:
[fernan at alfa ~]$ seqret trembl:A0B532_METTP
Reads and writes (returns) sequences
Error: Unable to read sequence 'trembl:A0B532_METTP'
Died: seqret terminated: Bad value for '-sequence' and no prompt
Now, if I search for A0BDZ0, I get A0BL81 instead:
[fernan at alfa ~]$ seqret trembl:A0BDZ0
Reads and writes (returns) sequences
output sequence(s) [a0bl81_parte.fasta]: stdout
>A0BL81_PARTE A0BL81 Chromosome undetermined scaffold_113, whole genome shotgun sequence.
MKQISESAHILQKVYNPNRMNKLFMTTHYQLQNETDLIFDKYMLMPLFGLSVANGISSNC
IKPKYLCSEYKKQELYDCNLILILSAYSDQAVYRSKTMYEKRNGLEQIFKYLASPNYTYN
IHISLLSYFVPQRVFYKQVLQALNIFELIDQKQIEELTKSSSIINQSVGEDNLDSILFKN
QEFIDYQKWRRMLKNNTIINLKTLHQHQLSQQIFCQYFLRYHYYQGCEEEINKLNKFLVD
DFDMFKFRSRLEHNEKKMKFYFLRMLKYFKLNEKLEIFLKFSFKSYSLDWNKELLREMKN
SLNQYKKQ
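
To see how widespread the mismatch is, the same comparison can be automated over many queries. This is a rough sketch, assuming the FASTA header layout shown in the output above (ID first, accession second); header_matches is a hypothetical helper, not an EMBOSS function.

```python
def header_matches(query, fasta_header):
    """Return True if the requested entry appears in the FASTA header.

    query        -- a USA such as 'trembl:A0B532'
    fasta_header -- the '>' line written by seqret
    Assumption: the header starts with the entry ID followed by the
    accession, as in the seqret output above.
    """
    # Drop the database prefix ('trembl:') and compare case-insensitively.
    wanted = query.split(":", 1)[-1].upper()
    tokens = fasta_header.lstrip(">").split()
    return wanted in (token.upper() for token in tokens[:2])


# The two retrievals shown above, checked against what was requested:
print(header_matches(
    "trembl:A0B532",
    ">A0BDZ0_PARTE A0BDZ0 Chromosome undetermined scaffold_101, "
    "whole genome shotgun sequence."))
print(header_matches(
    "trembl:A0BDZ0",
    ">A0BL81_PARTE A0BL81 Chromosome undetermined scaffold_113, "
    "whole genome shotgun sequence."))
```

Both calls print False, since each header names a different entry from the one requested.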
Any idea what is wrong? I also have swissprot
installed (in pretty much the same way), and it works OK with
seqret, using either ACs (Q4U9M9) or IDs (104K_THEAN).
This is on a Linux cluster (Rocks 4.2, with EMBOSS installed from the
Bio roll):
[fernan at alfa ~]$ embossversion
Writes the current EMBOSS version number
4.0.0
Thanks in advance for any pointers,
Fernan