[EMBOSS] how to find unique DNA sequences from a large database
Michael Thon
mthon at tamu.edu
Thu Dec 7 23:55:00 UTC 2006
Hi Yun, you might try a clustering algorithm like blastclust
(single-linkage clustering) or mcl (a.k.a. tribe-mcl), or one of the
others that exist. I can't think of any EMBOSS apps that would solve
this problem, but maybe someone else has a better answer.
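
To illustrate the single-linkage idea on the sequences quoted below, here
is a minimal Python sketch. It is not blastclust or mcl: it stands in for
a BLAST hit with a much cruder rule, linking two sequences whenever they
share an exact k-mer on either strand. The function names and the
threshold k=12 are assumptions made for this example, not anything taken
from those tools.

```python
from collections import defaultdict

# Translation table for complementing DNA (both cases).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Set of strand-independent k-mers: each k-mer is represented by the
    lexicographically smaller of itself and its reverse complement."""
    seq = seq.lower()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def single_linkage_clusters(records, k=12):
    """Group sequences so that any two sharing a canonical k-mer end up in
    the same cluster (single linkage, via union-find).
    k=12 is an arbitrary illustrative threshold."""
    parent = {name: name for name in records}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # canonical k-mer -> first sequence seen containing it
    for name, seq in records.items():
        for kmer in canonical_kmers(seq, k):
            if kmer in first_owner:
                parent[find(name)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = name

    clusters = defaultdict(list)
    for name in records:
        clusters[find(name)].append(name)
    return list(clusters.values())


# The eight example sequences from the original question.
records = {
    "001": "aaaagttgtgtgtgtatgacaggtt",
    "013": "aacctgtcatacacacacaactttt",
    "289": "gttgtgtgtgtatgacaggtt",
    "375": "tgtgtgtatgacaggttgat",
    "319": "tcaacctgtcatacacaca",
    "177": "cgcagtgtgtgtatgacagg",
    "271": "gtcctacctgtcatacacac",
    "020": "aagacataatgtgtgtatgacag",
}

clusters = single_linkage_clusters(records, k=12)
print(len(clusters), "cluster(s)")
```

Keeping one representative per cluster (the longest member, say) would
then give a non-redundant set. Exact k-mer sharing tolerates no
mismatches inside the shared word, so for real data an alignment-based
tool such as the ones named above is probably the safer choice.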
Mike
On Dec 7, 2006, at 2:36 PM, yun zheng wrote:
> Hi,
>
> Are there any tools for finding unique sequences in a large
> database? Many thanks.
>
> I need to find unique DNA sequences from a large database. A short
> piece is given as follows.
>
> >001
> aaaagttgtgtgtgtatgacaggtt
> >013
> aacctgtcatacacacacaactttt
> >289
> gttgtgtgtgtatgacaggtt
> >375
> tgtgtgtatgacaggttgat
> >319
> tcaacctgtcatacacaca
> >177
> cgcagtgtgtgtatgacagg
> >271
> gtcctacctgtcatacacac
> >020
> aagacataatgtgtgtatgacag
>
> All these seem to be the same sequence, since BLASTN gives very small
> e-values for their alignments.
>
> BLASTN 2.2.8 [Jan-05-2004]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= 001
> (25 letters)
>
> Database: drought-clustered.fa
> 410 sequences; 8877 total letters
>
> Searching.done
>
>
>                                               Score     E
> Sequences producing significant alignments:   (bits)    Value
>
> 013                                             50      8e-11
> 001                                             50      8e-11
> 289                                             42      2e-08
> 375                                             34      5e-06
> 319                                             34      5e-06
> 177                                             32      2e-05
> 271                                             30      8e-05
> 020                                             28      3e-04
>
> Best regards.
>
> sincerely
>
> Zheng, Yun
>
> Department of Computer Science
>
> Washington University in St Louis
>
> Campus Box 1045
>
> 1 Brookings Drive, St Louis, MO 63130