Dear all,

I am experiencing a very strange problem using seqret and entret. I have
downloaded the database files for the Homo sapiens subsection of Ensembl
(latest version as of Nov 15th) and indexed them with dbiflat.
When I try to retrieve a single sequence with seqret, or the complete entry
with entret, using the accession number, I get the following error message:

claudia@pc-31-18-86-200:~> seqret
Reads and writes (returns) sequences
Error: Unable to read sequence
Died: seqret terminated: Bad value for '-sequence' and no prompt

I think the format Ensembl uses for IDs and accessions is causing the
problem. For example, here is the start of the first entry in the flat file:

claudia@pc-31-18-86-200:~> head
ID   1    standard; DNA; HTG; 970768 BP.
AC   chromosome:NCBI36:1:1000001:1970768:1
SV   chromosome:NCBI36:1:1000001:1970768:1
DT   5-OCT-2006
DE   Homo sapiens chromosome 1 NCBI36 partial sequence 1000001..1970768
DE   annotated by Ensembl
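The AC value appears to follow Ensembl's slice-name convention,
coord_system:version:seq_region:start:end:strand (an assumption based on the
example above). A minimal Python sketch that splits it this way shows the
coordinates agree with the length on the ID line. The helper name
parse_slice_name is hypothetical.

```python
# Hypothetical helper: split an Ensembl-style slice name such as the AC value above.
# Assumed field order: coord_system:version:seq_region:start:end:strand
def parse_slice_name(acc):
    coord_system, version, region, start, end, strand = acc.split(":")
    return {
        "coord_system": coord_system,
        "version": version,
        "region": region,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
    }

info = parse_slice_name("chromosome:NCBI36:1:1000001:1970768:1")
# Coordinates are inclusive, so length = end - start + 1
length = info["end"] - info["start"] + 1
print(length)  # 970768, matching "970768 BP" on the ID line
```

The six colon-separated fields are what make this accession unusual compared
with a conventional EMBL accession.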

I tried replacing the ":" characters on the AC line with "_" using sed, but
after re-indexing I get the same error message from both seqret and entret.
Is there a length limit for IDs or accessions in EMBOSS? Is there any
workaround for this problem?
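A minimal sketch of the substitution described above, assuming only lines
beginning with "AC   " are changed (similar in effect to
sed '/^AC /s/:/_/g'). This is not necessarily the exact command that was used,
and the function name underscore_ac_lines is hypothetical.

```python
# Hypothetical helper: replace ':' with '_' on AC lines only,
# leaving every other line of the EMBL flat file untouched.
def underscore_ac_lines(lines):
    out = []
    for line in lines:
        if line.startswith("AC   "):
            line = line.replace(":", "_")
        out.append(line)
    return out

entry = [
    "ID   1    standard; DNA; HTG; 970768 BP.",
    "AC   chromosome:NCBI36:1:1000001:1970768:1",
    "SV   chromosome:NCBI36:1:1000001:1970768:1",
]
for line in underscore_ac_lines(entry):
    print(line)
```

Note that this leaves the SV line unchanged, so any index field built from SV
would still contain colons.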


EMBOSS 4.0.0
SUSE 9.3

# Name         Type  ID  Qry All Comment
# ============ ==== ==  === === =======
embldnahs      N    OK  OK  OK  Ensembl EMBL DNA H.sapiens


DB embldnahs [
type: N
dir: /local/bioinfo/db/ensembl/embl
method: emblcd
format: embl
file: *.dat
comment: "Ensembl EMBL DNA H.sapiens"]

