[EMBOSS] index RefSeq for EMBOSS

simon andrews (BI) simon.andrews at bbsrc.ac.uk
Fri Apr 21 15:35:29 UTC 2006


On 21 Apr 2006, at 16:00, Olivier Friard wrote:

> The indexes were created, but when I try to access a sequence (e.g.
> seqret rs_rna:NC_000004) the result is not the correct sequence but
> another one with the NC_000004 ID!

Is it just finding the wrong sequence or could you have duplicate 
entries in the data?  Use entret to see if the entry really has that 
ID.
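
If you want to rule out duplicates in the flat files themselves, something 
along these lines might help. It is only a sketch: it assumes the entry ID 
is the second field of each LOCUS line, and find_duplicate_ids and the 
example path are names made up for illustration.

```shell
# Hypothetical helper: report LOCUS names that occur more than once
# across the given GenBank-format flat files.
# Assumption: the entry ID is the second whitespace-separated field of
# each LOCUS line.
find_duplicate_ids() {
    awk '/^LOCUS/ { count[$2]++ }
         END { for (id in count) if (count[id] > 1) print id, count[id] }' "$@" |
        sort
}

# Example (prints "ID count" for each duplicated ID, nothing if all are unique):
# find_duplicate_ids /path/to/flatfiles/*.gbff
```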

We had problems with seqret returning incorrect sequences, or none at 
all, when some of the individual sequence files were larger than 2 GB.  
In these cases you can use the new dbx* indexing programs, which handle 
large files properly.

> Does anyone index the RefSeq successfully?

Yes.  We use it here without problems, but it is indexed with dbxflat.

It gets indexed with:

dbxflat -dbresource all -auto -idformat refseq -dbname refseq_all -filenames \*.gbff

...and the emboss.default entry looks like:

DB refseq_all
  [
     type: N
     comment: "Refseq"
     method: emboss
     format: genbank
     dbalias: refseq_all
     directory: /data/public/DNA/Refseq/Current/all
     file: *.gbff
  ]

with the resource section being:

RES all [ type: Index
   idlen:  15
   acclen: 15
   svlen:  15
   keylen: 15
   deslen: 15
   orglen: 15
]
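
To check your own data against the idlen value above, a sketch like this 
reports the longest LOCUS name in a set of flat files. It assumes that 
idlen is the maximum ID length the index will hold and that the ID is the 
second field of the LOCUS line; longest_id_length is an illustrative name.

```shell
# Hypothetical helper: print the length of the longest LOCUS name found
# in the given GenBank-format flat files (0 if there are no LOCUS lines).
# Assumption: the entry ID is the second whitespace-separated field of
# each LOCUS line.
longest_id_length() {
    awk '/^LOCUS/ { n = length($2); if (n > max) max = n }
         END { print max + 0 }' "$@"
}

# Example: warn if any ID is longer than the idlen of 15 used above.
# [ "$(longest_id_length *.gbff)" -le 15 ] || echo "some IDs exceed idlen"
```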


Simon.
-- 
Simon Andrews PhD
Bioinformatics Dept.
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0) 1223 496463



