[EMBOSS] seqret segfault (refseq protein sequence, indexed with dbxflat)

Jan T Kim jttkim at googlemail.com
Thu Feb 28 16:14:00 UTC 2013


Dear All,

I've run into a weird problem with seqret after downloading the complete
protein refseq database and indexing it with dbxflat. The problem
seems to be triggered by a rare condition; so far I've only encountered
it with accession ZP_10312765:

    % seqret -feature -outseq=stdout -osformat=swiss ptest:ZP_10312765

Monitoring the seqret process using top, I noticed that the process
grows to a size of 2 GB before segfaulting.
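
Since the process grows that large before dying, it may be safer to cap its
memory while testing. The following is only a sketch: run_capped is a
hypothetical helper name, the 1 GiB limit is an arbitrary choice, and with a
cap in place seqret may well fail with an allocation error instead of the
segfault.

```shell
# Hypothetical helper: run a command with a virtual-memory cap (in KiB).
run_capped() {
    limit_kib=$1
    shift
    # The subshell keeps the limit from affecting the calling shell.
    ( ulimit -v "$limit_kib" && "$@" )
}

# Only attempt the seqret call if EMBOSS is actually installed.
if command -v seqret >/dev/null 2>&1; then
    # 1048576 KiB = 1 GiB (arbitrary cap, assumed large enough for normal records)
    run_capped 1048576 seqret -feature -outseq=stdout -osformat=swiss ptest:ZP_10312765
    echo "exit status: $?"
fi
```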

Trying the same with ZP_10312766, the next record in the file, causes
no problem. Also, -osformat=fasta and -osformat=genbank work with
ZP_10312765, so the problem seems to be with outputting the swiss format.
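
The comparison between output formats can be repeated in one go with a small
loop like the one below. This is a sketch, not part of EMBOSS: classify_exit
is a hypothetical helper, and it relies on the usual shell convention that a
process killed by a signal yields exit status 128 plus the signal number
(so 139 for SIGSEGV, signal 11, on Linux).

```shell
# Hypothetical helper: turn a shell exit status into a short description.
classify_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # Shells report 128 + N when the process was killed by signal N.
        echo "signal $((status - 128))"
    else
        echo "exit $status"
    fi
}

# Only attempt the seqret calls if EMBOSS is actually installed.
if command -v seqret >/dev/null 2>&1; then
    for fmt in fasta genbank swiss; do
        seqret -feature -outseq=stdout -osformat="$fmt" ptest:ZP_10312765 >/dev/null 2>&1
        echo "$fmt: $(classify_exit $?)"
    done
fi
```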

This happens on both an Ubuntu 12.10 system (Linux 3.5.0-22-generic x86_64,
emboss 6.4.0-4) and a Debian wheezy system (Linux 3.2.35-2 x86_64,
emboss 6.4.0-2).

I attach a file containing the problematic refseq entry ZP_10312765 and
the unproblematic next one, ZP_10312766. This should allow reproducing
the problem as described below. Please use this to check whether I'm doing
something stupid, or whether you can spot anything broken in the ZP_10312765
record, but don't actually try to reproduce this unless you're confident you
can hack & repair your ~/.embossrc and you understand the commands involved.


(1) Put this into your ~/.embossrc, replacing "yourusername" as appropriate:

SET ptestdir /home/yourusername/tmp/ptest
db ptest [
        type: p
        directory: $ptestdir
        indexdirectory: $ptestdir/index
        filename: *.gpff
        format: refseqp
        method: emboss
        release: "test"
        comment: "test protein database"
]


(2) Run the following commands:

# -p also creates ~/tmp if it does not exist yet
mkdir -p ~/tmp/ptest/index
# save or copy the attached ptest.gpff in ~/tmp/ptest
cd ~/tmp/ptest/index
dbxflat -dbname ptest -dbresource dbxresource -idformat refseq -directory ../ -filename '*.gpff' -statistics -fields id,ac,sv,des,key,org -compressed -outfile ptest.dbxflat
cd ../..
seqret -feature -outseq=stdout -osformat=swiss ptest:ZP_10312765

The output I get on my terminal window is:

Reads and writes (returns) sequences
ID   ZP_10312765             Reviewed;         498 AA.
AC   ZP_10312765;
DT   31-DEC-1899, entry version 1.
DE   hypothetical protein FraQA3DRAFT_6339 [Frankia sp. QA3].
OS   Frankia sp. QA3.
RN   [1]
RP   1-498
RN   [2]
RP   1-498
KW   .
FT   REGION        1    498       Frankia sp. QA3. QA3. taxon:710111.
FT   REGION        1    498       hypothetical protein. 53620.
Segmentation fault

I'd appreciate any help / comments.

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jttkim at gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*


