NCBI nt database index

Martin Sarachu mad at biol.unlp.edu.ar
Wed Jun 26 15:16:57 UTC 2002


Hi,

we have the non-redundant NCBI nucleotide database (nt) indexed with

> $ dbifasta -idformat ncbi

the raw nt database looks like this:

> >gi|4003368|dbj|AB000282.1|AB000282 Navel orange infectious mottling virus gene for polyprotein (coat protein region), partial cds
> AATGTCACCATTGAAAGTGGTGACAATAATAATAATAATTGTCCCACCGGTAATGTAGATAATAGAGAAATACCGGTGGT
> .......
> >gi|1827449|dbj|AB000449.1|AB000449 Homo sapiens mRNA for VRK1, complete cds
> CCGAGTTACGAGTCGGCGAAAGCGGCGGGAAGTTCGTACTGGGCAGAACGCGACGGGTCTGCGGCTTAGGTGAAAATGCC
> etc
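For reference, each NCBI-style header above packs several identifiers into its first word: the gi number, a source database tag (`dbj` here), the accession with version, and a locus name, followed by the free-text description. The Python sketch below splits such a line into those fields. `parse_ncbi_header` is a hypothetical helper for illustration, not the EMBOSS `dbifasta` parser, and it assumes the five-field `gi|<gi>|<db>|<acc.ver>|<locus>` layout shown in the examples.

```python
def parse_ncbi_header(line: str) -> dict:
    """Split a header like
    >gi|<gi>|<db>|<accession.version>|<locus> <description>
    into its fields (hypothetical helper, assumes that exact layout)."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Identifier is everything up to the first space; the rest is description.
    ident, _, description = line[1:].partition(" ")
    fields = ident.split("|")
    if len(fields) < 5 or fields[0] != "gi":
        raise ValueError(f"unexpected identifier layout: {ident!r}")
    accession_version = fields[3]
    return {
        "gi": fields[1],
        "db": fields[2],
        "accession": accession_version.split(".")[0],  # drop ".1" version suffix
        "accession_version": accession_version,
        "locus": fields[4],
        "description": description.strip(),
    }
```

For the VRK1 entry this yields gi `1827449`, db `dbj`, accession `AB000449` (version `AB000449.1`) and the description text.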

and when we run 

> $ fuzznuc -raccshow2 -rdesshow2 -rusashow2 
> Nucleic acid pattern search
> Input sequence(s): nt:*
> Search pattern: GGTTTCsanttyggnac
> Number of mismatches [0]: 3
> Output report [gi.fuzznuc]: xx.fuzznuc

we get this

> $ more xx.fuzznuc
> ########################################
> # Program: fuzznuc
> # Rundate: Wed Jun 26 18:09:24 2002
> # Report_file: xx.fuzznuc
> ########################################
> 
> #=======================================
> #
> # Sequence: nt-id:gi     from: 1   to: 1904
> # Accession: 
> # Description: Schizosaccharomyces pombe DNA for SUI1 homologue, complete cds
> # HitCount: 1
> #
> # Pattern: GGTTTCsanttyggnac
> # Mismatch: 3
> # Complement: No
> #
> #=======================================
> 
>   Start     End Mismatch Sequence
>       9      25        3 GGTTACCATTTTGGCTA
> 
> ....
> # Sequence: nt-id:gi     from: 1   to: 17070
> # Accession: 
> # Description: Oryza sativa gene for NADH-dependent glutamate synthase
> # HitCount: 1
> #
> etc
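The first hit in the report can be checked by hand against the IUPAC pattern. Below is a minimal Python sketch, assuming mismatches are counted as base substitutions only (no insertions or deletions), which agrees with the 3-mismatch hit at positions 9 to 25. `count_mismatches` and `find_hits` are hypothetical helpers for illustration, not fuzznuc's implementation.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_mismatches(pattern: str, site: str) -> int:
    """Count substitutions between an IUPAC pattern and an equal-length
    sequence window (case-insensitive, no indels)."""
    if len(pattern) != len(site):
        raise ValueError("pattern and site must be the same length")
    return sum(
        1
        for p, s in zip(pattern.upper(), site.upper())
        if s not in IUPAC[p]
    )

def find_hits(pattern: str, seq: str, max_mismatch: int):
    """Yield (start, end, mismatches, window) with 1-based inclusive
    coordinates for every window within max_mismatch substitutions."""
    k = len(pattern)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mm = count_mismatches(pattern, window)
        if mm <= max_mismatch:
            yield i + 1, i + k, mm, window
```

For example, `count_mismatches("GGTTTCsanttyggnac", "GGTTACCATTTTGGCTA")` returns 3: the pattern's `T` at position 5, `a` at position 16 and `c` at position 17 do not match, while the ambiguity codes `s`, `n`, `y` and `n` all accept the bases found there.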


The acnum.hit, acnum.trg, division.lkp and entrynam.idx files for the nt
database seem to be correct. Any idea why the accession numbers don't show
up in the fuzznuc results?

martin

-- 
Martin Sarachu
mad at biol.unlp.edu.ar
EMBnet Argentina
http://www.ar.embnet.org



More information about the EMBOSS mailing list