[EMBOSS] index RefSeq for EMBOSS
David.Bauer at schering.de
Mon Apr 24 05:52:50 UTC 2006
You can also try the new indexing programs dbxflat and dbxfasta, which can
handle files larger than 2 GB.
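
If you would rather stay with dbiflat, the splitting suggested in the
quoted message below could be scripted roughly as follows. This is only a
minimal Python sketch and not part of EMBOSS: the function name
split_flatfile, the ".partNNN" naming scheme, and the exact byte limit
(2**31 - 1) are assumptions made for illustration. It cuts only after a
"//" record terminator, so no entry is split across two files.

```python
import os

# Assumption: treat the "2 GB" limit as the largest signed 32-bit offset.
DEFAULT_LIMIT = 2**31 - 1


def split_flatfile(path, out_dir, max_bytes=DEFAULT_LIMIT):
    """Split a GenBank-style flat file into parts of at most max_bytes.

    A new part is started only between records (after a '//' line).
    A single record larger than max_bytes is still written whole, so
    that one part would exceed the limit. Returns the part paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem, ext = os.path.splitext(os.path.basename(path))
    parts = []        # paths of the parts written so far
    out = None        # currently open output file
    written = 0       # bytes written to the current part
    record = []       # lines of the record being read
    record_size = 0   # size in bytes of that record

    def flush():
        """Write the buffered record, opening a new part if needed."""
        nonlocal out, written, record, record_size
        if not record:
            return
        if out is None or (written > 0 and written + record_size > max_bytes):
            if out is not None:
                out.close()
            name = f"{stem}.part{len(parts) + 1:03d}{ext}"
            parts.append(os.path.join(out_dir, name))
            out = open(parts[-1], "wb")
            written = 0
        out.writelines(record)
        written += record_size
        record = []
        record_size = 0

    with open(path, "rb") as src:
        for line in src:
            record.append(line)
            record_size += len(line)
            if line.rstrip() == b"//":   # end of one entry
                flush()
    flush()  # keep any trailing lines that lack a terminator
    if out is not None:
        out.close()
    return parts
```

The parts keep the original extension (for example
complete1.genomic.part001.gbff), so a "*.gbff" wildcard still matches them.
Write them to a separate directory and index that one; otherwise the
wildcard would also pick up the original oversized file. Note that the
sketch buffers one whole entry in memory at a time.
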
Regards,
David.
emboss-bounces at lists.open-bio.org wrote on 21/04/2006 17:43:27:
> Hi,
>
> Yes, I also index RefSeq. I think the problem here is that dbiflat
> can only handle files smaller than 2 GB, so try splitting the
> files first.
>
> Best,
> Isabelle
>
> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org
> [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Olivier Friard
> Sent: Friday, April 21, 2006 17:00
> To: emboss at emboss.open-bio.org
> Subject: [EMBOSS] index RefSeq for EMBOSS
>
>
> Hi,
>
> I tried to index the RefSeq database:
>
> 1) Downloaded all the
> ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz
> files (GenBank format)
>
> 2) Gunzipped them
>
> 3) Added the rs_dna entry to my .embossrc file
>
>
> DB rs_dna [
> type: "N"
> method: "emblcd"
> format: "GB"
> dir: "/home/users/friard/data/refseq_genomic/"
> file: "*.gbff"
> release: ""
> comment: "RefSeq Genomic (upd)"
> indexdir: "/home/users/friard/data/refseq_genomic/"
> ]
>
>
> 4) Ran dbiflat with the following arguments (from the directory where the
> files are stored)
>
> dbiflat
> Index a flat file database
> Database name: rs_dna
> EMBL : EMBL
> SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
> GB : Genbank, DDBJ
> REFSEQ : Refseq
> Entry format [SWISS]: REFSEQ
> Database directory [.]:
> Wildcard database filename [*.dat]: *.gbff
> Release number [0.0]:
> Index date [00/00/00]:
>
> The indexes were created, but when I try to retrieve a sequence (e.g.
> seqret rs_rna:NC_000004), the result is not the correct sequence but a
> different one with the NC_000004 ID!
>
>
>
> I also downloaded the files in FASTA format and tried to index them with
> the dbifasta command (format: ncbi), without success:
>
> seqret rs_dna:nc_000004
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'rs_dna:nc_000004'
> Died: seqret terminated: Bad value for '-sequence' and no prompt
>
>
> Has anyone indexed RefSeq successfully?
> Thank you in advance
>
>
>
>
>
>
> --
>
> Olivier Friard
> Laboratorio di Biologia Computazionale
> Facoltà di Scienze MFN
> Università di Torino
> via Accademia Albertina 13, 10124 TORINO (Italy)
>
> tel. +39 011 6704689
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
>