[EMBOSS] seqret segfault (refseq protein sequence, indexed with dbxflat)
Jan T Kim
jttkim at googlemail.com
Thu Feb 28 16:14:00 UTC 2013
Dear All,
I've run into a weird problem with seqret after downloading the complete
protein refseq database and indexing it with dbxflat. The problem
seems to be triggered by a rare condition; so far I've only encountered
it with accession ZP_10312765:
% seqret -feature -outseq=stdout -osformat=swiss ptest:ZP_10312765
Monitoring the seqret process using top, I noticed that the process
grows to a size of 2g before segfaulting.
Trying the same with ZP_10312766, the next record in the file, causes
no problem. Also, -osformat=fasta and -osformat=genbank work with
ZP_10312765, so the problem seems to be with outputting the swiss format.
This happens on both an Ubuntu 12.10 system (Linux 3.5.0-22-generic x86_64,
emboss 6.4.0-4) and a Debian wheezy system (Linux 3.2.35-2 x86_64,
emboss 6.4.0-2).
I attach a file containing the problematic refseq entry ZP_10312765 and
the unproblematic next one, ZP_10312766. This should allow reproducing
the problem using the steps below. Please use them to check whether I'm
doing something stupid, or whether you can spot anything broken in the
ZP_10312765 record, but don't actually try to reproduce this unless you're
confident you can hack & repair your ~/.embossrc and you understand the
commands involved.
(1) Put this into your ~/.embossrc, replacing "yourusername" as appropriate:
SET ptestdir /home/yourusername/tmp/ptest
db ptest [
type: p
directory: $ptestdir
indexdirectory: $ptestdir/index
filename: *.gpff
format: refseqp
method: emboss
release: "test"
comment: "test protein database"
]
(2) Run the following commands:
mkdir -p ~/tmp/ptest/index
# save or copy the attached ptest.gpff in ~/tmp/ptest
cd ~/tmp/ptest/index
dbxflat -dbname ptest -dbresource dbxresource -idformat refseq -directory ../ -filename '*.gpff' -statistics -fields id,ac,sv,des,key,org -compressed -outfile ptest.dbxflat
cd ../..
seqret -feature -outseq=stdout -osformat=swiss ptest:ZP_10312765
The output I get in my terminal window is:
Reads and writes (returns) sequences
ID ZP_10312765 Reviewed; 498 AA.
AC ZP_10312765;
DT 31-DEC-1899, entry version 1.
DE hypothetical protein FraQA3DRAFT_6339 [Frankia sp. QA3].
OS Frankia sp. QA3.
RN [1]
RP 1-498
RN [2]
RP 1-498
KW .
FT REGION 1 498 Frankia sp. QA3. QA3. taxon:710111.
FT REGION 1 498 hypothetical protein. 53620.
Segmentation fault
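The comparison between output formats described above (fasta and genbank work, swiss crashes) can be scripted. The following is only a minimal sketch: the function name check_formats and the SEQRET variable are hypothetical helpers introduced here for illustration, and the seqret options are simply the ones from the command shown earlier. It assumes the ptest database has been set up and indexed as in steps (1) and (2).

```shell
#!/bin/sh
# Hypothetical helper, not part of EMBOSS: run seqret once per output
# format and print the exit status for each.
# SEQRET may be overridden (e.g. for a dry run); it defaults to seqret.
SEQRET=${SEQRET:-seqret}

check_formats() {
    usa=$1    # database:entry to fetch, e.g. ptest:ZP_10312765
    shift     # the remaining arguments are output format names
    for fmt in "$@"; do
        # Sequence output is discarded; only the exit status matters here.
        "$SEQRET" -feature -outseq=stdout -osformat="$fmt" "$usa" >/dev/null 2>&1
        status=$?
        # On Linux, shells usually report a segfault as 139 (128 + SIGSEGV).
        echo "$fmt: exit status $status"
    done
}
```

A possible invocation, using the formats mentioned in this report, would be: check_formats ptest:ZP_10312765 fasta genbank swiss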
I'd appreciate any help / comments.
Best regards, Jan
--
+- Jan T. Kim -------------------------------------------------------+
| email: jttkim at gmail.com |
| WWW: http://www.jtkim.dreamhosters.com/ |
*-----=< hierarchical systems are for files, not for humans >=-----*