database ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/April_14_2003/ for emboss

Zheng Jin Tu ztu at msi.umn.edu
Tue Apr 22 18:57:47 UTC 2003


Has anyone successfully indexed the human genome files at
ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/April_14_2003/
for EMBOSS?

They are FASTA-format files. I tried running formatdb on these chromosomes
and then dbiblast, but it always gives me some errors.
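
One sanity check before indexing is to count the FASTA records in each file. The following is only a sketch: the function name count_fasta_records and the throwaway demo file are made up for illustration and are not part of EMBOSS or the NCBI tools.

```shell
#!/bin/sh
# Hypothetical helper (not an EMBOSS or NCBI command):
# count FASTA records by counting header lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Small self-contained demo on a temporary two-record file.
demo=$(mktemp)
printf '>seq1 first\nACGT\nACGT\n>seq2 second\nTTGA\n' > "$demo"
count_fasta_records "$demo"   # prints 2
rm -f "$demo"
```

Run against a per-chromosome file such as chr16.fa, this would presumably print 1, since each file appears to hold a single chromosome record.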


Here is a sample run:

----------------------------------------------
swinst at bi7 [CHR_16up] % head chr16.fa
>gi|29824587|ref|NC_000016.4|NC_000016 Homo sapiens chromosome 16, complete sequence
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCGACCCTCACCCTCACCCTAACCACATGAGCAATGT
GGGTGTTATATTTTAGCTGTCATGGGTGCATTAGGAATGCTGCATTTGTGTTTCAACGCTGCAACTGGAC
CCTGCAATGCAGCCCCTCGCCTTGCCTTGGGAGAATCTCGGTGCCCAGGATTCAGAGGGGCTTTTAGTTT
CCCATTTTCCACACTGAACCGTTCTAACTGGTCTCTGACCTTGATTATTCACGGCTGCAACCGGGAAAGA
TTTTATTCACTGTCAATGCGCCCCGAGTTGTCCCAAAGCCAGGCAGTGCCCCCAACGTCTGTGCTTAGCA
GAATGCTGCTCCACCTTTACGGTGACCCCCAGGTCTGTGCTGAGCAGAACGCAGCTCCGCCCTCGCAGTA
CCCTCAGCCCGCCCGCCCGGGTCTGACCTGAGCAGAACTCTGCTCTGCCTTCGCAGTACCACCGAAATCT
GTGCAAAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGCGTCTGTGCTGAGGAGAACGCAACTCCGC
CGTCGCAAAGGCGCGCGCCGCGCCGGCGCAGGCGCAGAGGGGCGCGCCGCGCCGGCGCAGGCGCAGAGAC

swinst at bi7 [CHR_16up] % formatdb -i chr16.fa -p F -o T
swinst at bi7 [CHR_16up] % ls -l chr16*
-rw-r--r--    1 swinst   swinst     91281742 Apr 14 05:27 chr16.fa
-rw-r-----    1 swinst   swinst          129 Apr 22 13:53 chr16.fa.nhr
-rw-r-----    1 swinst   swinst           80 Apr 22 13:53 chr16.fa.nin
-rw-r-----    1 swinst   swinst            8 Apr 22 13:53 chr16.fa.nnd
-rw-r-----    1 swinst   swinst           52 Apr 22 13:53 chr16.fa.nni
-rw-r-----    1 swinst   swinst          147 Apr 22 13:53 chr16.fa.nsd
-rw-r-----    1 swinst   swinst           66 Apr 22 13:53 chr16.fa.nsi
-rw-r-----    1 swinst   swinst     22518829 Apr 22 13:53 chr16.fa.nsq

swinst at bi7 [CHR_16up] % dbiblast
Index a BLAST database
Database name: chr16
Database directory [.]:
Wildcard database filename [chr16]: chr16.fa*
Release number [0.0]: 33
Index date [00/00/00]: 04/22/03
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: N
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2
swinst at bi7 [CHR_16up] % ls -rlt
-rw-r-----    1 swinst   swinst            8 Apr 22 13:53 chr16.fa.nnd
-rw-r-----    1 swinst   swinst           52 Apr 22 13:53 chr16.fa.nni
-rw-r-----    1 swinst   swinst          147 Apr 22 13:53 chr16.fa.nsd
-rw-r-----    1 swinst   swinst           66 Apr 22 13:53 chr16.fa.nsi
-rw-r-----    1 swinst   swinst          129 Apr 22 13:53 chr16.fa.nhr
-rw-r-----    1 swinst   swinst           80 Apr 22 13:53 chr16.fa.nin
-rw-r-----    1 swinst   swinst     22518829 Apr 22 13:53 chr16.fa.nsq
-rw-r--r--    1 swinst   swinst          680 Apr 22 13:53 formatdb.log
-rw-r--r--    1 swinst   swinst          344 Apr 22 13:55 division.lkp
-rw-r--r--    1 swinst   swinst          320 Apr 22 13:55 entrynam.idx
-rw-r--r--    1 swinst   swinst          300 Apr 22 13:55 acnum.trg
-rw-r--r--    1 swinst   swinst          300 Apr 22 13:55 acnum.hit
swinst at bi7 [CHR_16up] %

--------------------------------------------------------------------------
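
As a rough plausibility check on the formatdb output above, the size of chr16.fa.nsq can be estimated from the size of chr16.fa. This is a sketch under stated assumptions: 70 bases plus one newline byte per line (as in the head output), nucleotides packed four per byte, and the single header line ignored. The function name estimate_nsq_bytes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate the packed sequence size in bytes
# from the FASTA file size in bytes.
# Assumptions: 70 bases + 1 newline per line, header line ignored,
# 2 bits per base (4 bases per byte).
estimate_nsq_bytes() {
    fasta_bytes=$1
    bases=$(( fasta_bytes / 71 * 70 ))   # drop the newline bytes
    echo $(( bases / 4 ))                # pack 4 bases into each byte
}

estimate_nsq_bytes 91281742   # prints 22499015
```

The estimate is within about 0.1% of the 22518829 bytes listed for chr16.fa.nsq, which suggests that the formatdb step itself wrote the full sequence. The small remainder might be extra data for ambiguous bases, though that is a guess.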

Thanks,


Tu

----------------------------------------------------------------
Zheng Jin Tu
Computational Biology Specialist
Supercomputing Institute
599 Walter Library
117 Pleasant Street SE
University of Minnesota
Minneapolis, Minnesota 55455
email: ztu at msi.umn.edu            help email:  help at msi.umn.edu
phone: 612-624-9504, 624-0115     help phone:  612-626-0802
fax:   612-624-8861
-----------------------------------------------------------------




