NCBI nt database index
Martin Sarachu
mad at biol.unlp.edu.ar
Wed Jun 26 15:16:57 UTC 2002
Hi,
we have the non-redundant NCBI nucleotide database (nt) indexed with
> $ dbifasta -idformat ncbi
The raw nt database looks like this:
> >gi|4003368|dbj|AB000282.1|AB000282 Navel orange infectious mottling virus gene for polyprotein (coat protein region), partial cds
> AATGTCACCATTGAAAGTGGTGACAATAATAATAATAATTGTCCCACCGGTAATGTAGATAATAGAGAAATACCGGTGGT
> .......
> >gi|1827449|dbj|AB000449.1|AB000449 Homo sapiens mRNA for VRK1, complete cds
> CCGAGTTACGAGTCGGCGAAAGCGGCGGGAAGTTCGTACTGGGCAGAACGCGACGGGTCTGCGGCTTAGGTGAAAATGCC
> etc
and when we run
> $ fuzznuc -raccshow2 -rdesshow2 -rusashow2
> Nucleic acid pattern search
> Input sequence(s): nt:*
> Search pattern: GGTTTCsanttyggnac
> Number of mismatches [0]: 3
> Output report [gi.fuzznuc]: xx.fuzznuc
we get this
> $ more xx.fuzznuc
> ########################################
> # Program: fuzznuc
> # Rundate: Wed Jun 26 18:09:24 2002
> # Report_file: xx.fuzznuc
> ########################################
>
> #=======================================
> #
> # Sequence: nt-id:gi from: 1 to: 1904
> # Accession:
> # Description: Schizosaccharomyces pombe DNA for SUI1 homologue, complete cds
> # HitCount: 1
> #
> # Pattern: GGTTTCsanttyggnac
> # Mismatch: 3
> # Complement: No
> #
> #=======================================
>
> Start End Mismatch Sequence
> 9 25 3 GGTTACCATTTTGGCTA
>
> ....
> # Sequence: nt-id:gi from: 1 to: 17070
> # Accession:
> # Description: Oryza sativa gene for NADH-dependent glutamate synthase
> # HitCount: 1
> #
> etc
The acnum.hit, acnum.trg, division.lkp and entrynam.idx files for the nt
database seem to be correct. Any idea why the accession numbers don't show
up in the fuzznuc results?
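
For reference, the two ID lines quoted above appear to follow the layout
gi|number|database|accession.version|locus, followed by a free-text
description. Below is a minimal Python sketch of splitting such a line into its
fields, assuming that layout holds for every entry. The function name
parse_ncbi_defline is hypothetical and is not part of EMBOSS; it only
illustrates where the accession sits in the header.

```python
def parse_ncbi_defline(line):
    """Split an NCBI-style FASTA header into its fields.

    Assumes the layout seen in the sample entries:
    >gi|<number>|<db>|<accession>.<version>|<locus> <description>
    """
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")

    # The ID is everything up to the first space; the rest is the description.
    ident, _, description = line[1:].rstrip("\n").partition(" ")

    parts = ident.split("|")
    if len(parts) < 5 or parts[0] != "gi":
        raise ValueError("unexpected ID layout: " + ident)

    # The fourth field carries the accession and, after a dot, its version.
    accession, _, version = parts[3].partition(".")

    return {
        "gi": parts[1],
        "db": parts[2],
        "accession": accession,
        "version": version,
        "locus": parts[4],
        "description": description,
    }


if __name__ == "__main__":
    header = ">gi|1827449|dbj|AB000449.1|AB000449 Homo sapiens mRNA for VRK1, complete cds"
    fields = parse_ncbi_defline(header)
    print(fields["gi"], fields["accession"], fields["version"])
```

Running this over the header lines of the raw file gives the accession each
entry carries, for comparison with the empty Accession field in the report.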
martin
--
Martin Sarachu
mad at biol.unlp.edu.ar
EMBnet Argentina
http://www.ar.embnet.org