[EMBOSS] index RefSeq for EMBOSS
simon andrews (BI)
simon.andrews at bbsrc.ac.uk
Fri Apr 21 15:35:29 UTC 2006
On 21 Apr 2006, at 16:00, Olivier Friard wrote:
> The indexes were created, but when I try to access a sequence (e.g.
> seqret rs_rna:NC_000004) the result is not the correct sequence but
> another one with the NC_000004 ID!
Is it just finding the wrong sequence or could you have duplicate
entries in the data? Use entret to see if the entry really has that
ID.
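One rough way to look for duplicates before re-indexing is to scan the flat files themselves. The sketch below assumes GenBank-format files in which each entry starts with a LOCUS line whose second field is the entry name; the function name duplicate_locus_ids is made up for illustration and is not part of EMBOSS.

```shell
#!/bin/sh
# Hypothetical helper (not an EMBOSS program): print every LOCUS name
# that appears more than once across the given GenBank flat files.
duplicate_locus_ids() {
    # Assumption: the entry name is the second field of each LOCUS line.
    awk '/^LOCUS[ \t]/ { print $2 }' "$@" | sort | uniq -d
}

# Example call (path is a placeholder):
#   duplicate_locus_ids /path/to/refseq/*.gbff
```

If this prints nothing, each LOCUS name is unique across the files, so the problem is more likely in the index than in the data.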
We ran into problems with seqret returning incorrect sequences, or none
at all, when some of the individual sequence files were larger than 2 GB.
In these cases you can use the new dbx* indexing programs, which handle
large files properly.
> Does anyone index the RefSeq successfully?
Yes. We use it here without problems, but indexed with dbxflat.
It gets indexed with:
dbxflat -dbresource all -auto -idformat refseq -dbname refseq_all -filenames \*.gbff
...and the emboss.default entry looks like:
DB refseq_all
[
type: N
comment: "Refseq"
method: emboss
format: genbank
dbalias: refseq_all
directory: /data/public/DNA/Refseq/Current/all
file: *.gbff
]
with the resource section being:
RES all [
type: Index
idlen: 15
acclen: 15
svlen: 15
keylen: 15
deslen: 15
orglen: 15
]
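If you reuse these values, it may be worth checking that your IDs fit within them. The sketch below assumes that idlen is the maximum ID length the index will hold and that the ID comes from the second field of the LOCUS line; both are assumptions, and max_locus_length is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: report the length of the longest LOCUS name
# found in the given GenBank flat files.
# Assumption: this is the value that has to fit within idlen.
max_locus_length() {
    awk '/^LOCUS[ \t]/ { n = length($2); if (n > max) max = n }
         END { print max + 0 }' "$@"
}

# Example call (path is a placeholder):
#   max_locus_length /path/to/refseq/*.gbff
```

A result above 15 would suggest raising idlen before indexing.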
Simon.
--
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute
simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463