[EMBOSS] index RefSeq for EMBOSS

Wells, Isabelle isabelle.wells at roche.com
Fri Apr 21 15:43:27 UTC 2006


Hi,

Yes, I also index RefSeq. I think the problem here is that dbiflat can only handle files smaller than 2 GB, so try splitting the files first.
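
One way to do the splitting, as a minimal Python sketch rather than a tested recipe: it cuts a GenBank-format flat file only after a record terminator line ("//"), so no entry is divided between two parts. The function name split_flatfile, the part naming scheme, and the default limit of 1.9e9 bytes (a margin below 2 GB) are my own assumptions and not part of EMBOSS.

```python
import os


def split_flatfile(path, max_bytes=1_900_000_000, out_dir=None):
    """Split a GenBank-style flat file into parts of at most max_bytes.

    Cuts are made only after a record terminator line ("//"), so no entry
    is divided between two parts. A single record larger than max_bytes
    still ends up in one (oversize) part. Returns the part paths written.
    """
    out_dir = out_dir or os.path.dirname(os.path.abspath(path))
    stem, ext = os.path.splitext(os.path.basename(path))
    parts = []        # paths of the parts written so far
    out = None        # handle of the part currently being written
    written = 0       # bytes written to the current part
    record = []       # lines of the record currently being read
    record_size = 0   # size in bytes of that record

    def flush_record():
        nonlocal out, written, record, record_size
        if not record:
            return
        # Open a new part if this record would push the current one over the limit.
        if out is None or (written > 0 and written + record_size > max_bytes):
            if out is not None:
                out.close()
            name = f"{stem}.part{len(parts) + 1:03d}{ext}"
            part_path = os.path.join(out_dir, name)
            out = open(part_path, "wb")
            parts.append(part_path)
            written = 0
        out.writelines(record)
        written += record_size
        record, record_size = [], 0

    with open(path, "rb") as src:
        for line in src:
            record.append(line)
            record_size += len(line)
            if line.rstrip() == b"//":
                flush_record()
    flush_record()  # trailing lines without a terminator, if any
    if out is not None:
        out.close()
    return parts
```

Note that each record is held in memory while it is read, which can be a lot for a whole chromosome. The parts keep the original extension, so they still match a "*.gbff" wildcard; write them to a separate directory (out_dir) or move the original files away before indexing, otherwise every entry would be present twice.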

Best,
Isabelle

-----Original Message-----
From: emboss-bounces at lists.open-bio.org [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Olivier Friard
Sent: Friday, April 21, 2006 17:00
To: emboss at emboss.open-bio.org
Subject: [EMBOSS] index RefSeq for EMBOSS


Hi,

I tried to index the RefSeq database:

1) I downloaded all the
ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz
files (GenBank format)

2) gunzipped them

3) Added the rs_dna entry to my .embossrc file


DB rs_dna [
    type: "N"
    method: "emblcd"
    format: "GB"
    dir: "/home/users/friard/data/refseq_genomic/"
    file: "*.gbff"
    release: ""
    comment: "RefSeq Genomic  (upd)"
    indexdir: "/home/users/friard/data/refseq_genomic/"
]


4) Ran dbiflat with the following arguments (from the directory where the
files are stored)

dbiflat
Index a flat file database
Database name: rs_dna
       EMBL : EMBL
      SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
         GB : Genbank, DDBJ
     REFSEQ : Refseq
Entry format [SWISS]: REFSEQ
Database directory [.]:
Wildcard database filename [*.dat]: *.gbff
Release number [0.0]:
Index date [00/00/00]:

The indexes were created, but when I try to retrieve a sequence (e.g.
seqret rs_rna:NC_000004), the result is not the correct sequence but a
different one carrying the NC_000004 ID!



I also downloaded the files in FASTA format and tried to index them with
the dbifasta command (format: ncbi), without success:

seqret rs_dna:nc_000004
Reads and writes (returns) sequences
Error: Unable to read sequence 'rs_dna:nc_000004'
Died: seqret terminated: Bad value for '-sequence' and no prompt


Has anyone indexed RefSeq successfully?
Thank you in advance.






-- 

Olivier Friard
Laboratorio di Biologia Computazionale
Facoltà di Scienze MFN
Università di Torino
via Accademia Albertina 13, 10124 TORINO (Italy)

tel. +39 011 6704689

_______________________________________________
EMBOSS mailing list
EMBOSS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/emboss



