[EMBOSS] how to find unique DNA sequences from a large database

Thu Dec 7 20:36:03 UTC 2006

Hi,

Are there any tools for finding unique sequences in a large database? Many
thanks.

I need to find the unique DNA sequences in a large database. A short excerpt
is given below.

>001
aaaagttgtgtgtgtatgacaggtt
>013
aacctgtcatacacacacaactttt
>289
gttgtgtgtgtatgacaggtt
>375
tgtgtgtatgacaggttgat
>319
tcaacctgtcatacacaca
>177
cgcagtgtgtgtatgacagg
>271
gtcctacctgtcatacacac
>020
aagacataatgtgtgtatgacag

All of these seem to be the same sequence, since BLASTN gives very small
E-values when they are aligned against sequence 001.

BLASTN 2.2.8 [Jan-05-2004]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= 001
         (25 letters)

Database: drought-clustered.fa
           410 sequences; 8877 total letters

Searching.done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

013                                                                    50   8e-11
001                                                                    50   8e-11
289                                                                    42   2e-08
375                                                                    34   5e-06
319                                                                    34   5e-06
177                                                                    32   2e-05
271                                                                    30   8e-05
020                                                                    28   3e-04
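
To make concrete what "unique" could mean here, below is a minimal Python
sketch. It is an illustration only, not an existing EMBOSS tool, and the
function names (parse_fasta, reverse_complement, collapse_contained) are
made up for this example. It assumes the simplest possible rule: a sequence
is redundant if it, or its reverse complement, occurs exactly inside a longer
sequence that has already been kept. Partial overlaps and mismatches, which
BLASTN does detect, are not handled.

```python
# Illustrative sketch only: drop sequences that are exact substrings, on
# either strand, of a longer sequence that has already been kept.

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

EXAMPLE_FASTA = """\
>001
aaaagttgtgtgtgtatgacaggtt
>013
aacctgtcatacacacacaactttt
>289
gttgtgtgtgtatgacaggtt
>375
tgtgtgtatgacaggttgat
>319
tcaacctgtcatacacaca
>177
cgcagtgtgtgtatgacagg
>271
gtcctacctgtcatacacac
>020
aagacataatgtgtgtatgacag
"""


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (a, c, g, t only)."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(text):
    """Return a list of (id, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line.lower())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def collapse_contained(records):
    """Keep only sequences not contained (either strand) in a kept one."""
    kept = []
    # Visit the longest sequences first, so that every shorter sequence is
    # compared against all longer candidates that could contain it.
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for _, other in kept):
            continue  # exact duplicate or sub-sequence of a kept sequence
        kept.append((name, seq))
    return kept


if __name__ == "__main__":
    for name, seq in collapse_contained(parse_fasta(EXAMPLE_FASTA)):
        print(name, seq)
```

On the eight records above, this rule removes 013 (the reverse complement of
001), 289 (contained in 001) and 319 (its reverse complement is contained in
375), and keeps 001, 020, 375, 177 and 271. The remaining sequences overlap
001 only partially, so merging them as well would need an alignment-based
clustering step rather than exact containment.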

Best regards,

Zheng, Yun

Department of Computer Science

Washington University in St Louis

Campus Box 1045

1 Brookings Drive, St Louis, MO 63130