[EMBOSS] how to find unique DNA sequences from a large database
yun zheng
mincloud at gmail.com
Thu Dec 7 20:36:03 UTC 2006
Hi,
Are there any tools to find unique sequences in a large database? Many
thanks.
I need to find the unique DNA sequences in a large database. A short excerpt
follows:
>001
aaaagttgtgtgtgtatgacaggtt
>013
aacctgtcatacacacacaactttt
>289
gttgtgtgtgtatgacaggtt
>375
tgtgtgtatgacaggttgat
>319
tcaacctgtcatacacaca
>177
cgcagtgtgtgtatgacagg
>271
gtcctacctgtcatacacac
>020
aagacataatgtgtgtatgacag
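A cheap first pass, before any BLAST search, is to drop records that are exact substrings of an equal-length or longer record on either strand. The sketch below is an editorial illustration, not an EMBOSS program; the names `revcomp`, `parse_fasta` and `find_contained` are hypothetical, and ambiguity codes such as N are not handled. On the excerpt above it flags 013 (the exact reverse complement of 001), 289 (a suffix of 001) and 319 (contained in the reverse complement of 375). The other records overlap only partially, so an exact check alone cannot merge them.

```python
# Editorial sketch: flag FASTA records that are exact substrings of another
# record (on either strand). Partial overlaps are NOT detected.

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

SAMPLE = """
>001
aaaagttgtgtgtgtatgacaggtt
>013
aacctgtcatacacacacaactttt
>289
gttgtgtgtgtatgacaggtt
>375
tgtgtgtatgacaggttgat
>319
tcaacctgtcatacacaca
>177
cgcagtgtgtgtatgacagg
>271
gtcctacctgtcatacacac
>020
aagacataatgtgtgtatgacag
"""


def revcomp(seq: str) -> str:
    """Reverse complement of a plain ACGT string (no IUPAC codes)."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(text: str) -> dict:
    """Minimal FASTA reader: returns {id: lowercase sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.lower()
    return records


def find_contained(records: dict) -> dict:
    """Map each redundant ID to an earlier, at-least-as-long record that
    contains it or its reverse complement. Quadratic: fine for an excerpt,
    too slow for a large database."""
    redundant = {}
    # Longest first; sorted() stays stable with reverse=True,
    # so records of equal length keep their file order.
    ids = sorted(records, key=lambda k: len(records[k]), reverse=True)
    for i, short_id in enumerate(ids):
        short = records[short_id]
        for long_id in ids[:i]:
            if long_id in redundant:
                continue  # containment is transitive, so its container is checked
            long_seq = records[long_id]
            if short in long_seq or revcomp(short) in long_seq:
                redundant[short_id] = long_id
                break
    return redundant


if __name__ == "__main__":
    print(find_contained(parse_fasta(SAMPLE)))
    # → {'013': '001', '289': '001', '319': '375'}
```

For a database of realistic size the pairwise loop would need replacing with an index (for example a suffix array or k-mer table); the sketch only shows the strand-aware containment test itself.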
All of these seem to be the same sequence, since BLASTN reports very small
E-values for their alignments:
BLASTN 2.2.8 [Jan-05-2004]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= 001
(25 letters)
Database: drought-clustered.fa
410 sequences; 8877 total letters
Searching.done
                                                  Score    E
Sequences producing significant alignments:      (bits) Value

013                                                 50   8e-11
001                                                 50   8e-11
289                                                 42   2e-08
375                                                 34   5e-06
319                                                 34   5e-06
177                                                 32   2e-05
271                                                 30   8e-05
020                                                 28   3e-04
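Judging "sameness" from E-values amounts to single-linkage clustering: any hit below a cutoff joins two records into one group. The sketch below shows that idea with a small union-find; it is an editorial illustration, the names `UnionFind`, `cluster_hits` and `HITS` are hypothetical, the hit list is transcribed by hand from the report above, and both cutoffs are arbitrary. Note that E-values scale with the size of the searched database (here only 410 sequences, 8877 letters), so a cutoff chosen on this file does not carry over to a larger one, and single linkage can chain loosely related records together through a shared repeat such as the tgtg run seen here.

```python
# Editorial sketch: single-linkage clustering of BLAST hits by an E-value
# cutoff, using a small union-find. Hit list transcribed by hand.
from collections import defaultdict

# (query, subject, E-value) taken from the BLASTN report above.
HITS = [
    ("001", "013", 8e-11),
    ("001", "001", 8e-11),
    ("001", "289", 2e-08),
    ("001", "375", 5e-06),
    ("001", "319", 5e-06),
    ("001", "177", 2e-05),
    ("001", "271", 8e-05),
    ("001", "020", 3e-04),
]


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # unseen IDs start as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_hits(hits, max_evalue):
    """Group IDs linked by any hit with E-value <= max_evalue."""
    uf = UnionFind()
    for query, subject, evalue in hits:
        uf.find(query)
        uf.find(subject)  # register IDs even if the hit fails the cutoff
        if evalue <= max_evalue:
            uf.union(query, subject)
    groups = defaultdict(list)
    for seq_id in list(uf.parent):
        groups[uf.find(seq_id)].append(seq_id)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(cluster_hits(HITS, 1e-3))  # all eight IDs in one cluster
    print(cluster_hits(HITS, 1e-5))  # 020, 177 and 271 become singletons
```

With a cutoff of 1e-3 all eight records collapse into one group, matching the impression from the report; tightening it to 1e-5 leaves 020, 177 and 271 on their own, which shows how strongly the answer depends on the chosen threshold.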
Best regards,
Zheng, Yun
Department of Computer Science
Washington University in St Louis
Campus Box 1045
1 Brookings Drive, St Louis, MO 63130