[EMBOSS] how to find unique DNA sequences from a large database

Thu Dec 7 23:55:00 UTC 2006

Hi Yun, you might try a clustering algorithm such as blastclust (single-linkage
clustering) or mcl (a.k.a. tribe-mcl), or one of the other clustering tools
that exist. I can't think of any EMBOSS apps that would solve this problem,
but maybe someone else has a better answer.
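To show what single-linkage clustering does with data like yours, here is a minimal Python sketch. It is not blastclust, which links sequences using BLAST alignment scores and coverage thresholds. Instead it assumes a much cruder, hypothetical edge rule: two sequences are linked if they share an exact 12-mer (`K = 12`, chosen arbitrarily) on either strand. The function names (`similar`, `single_linkage`, `representatives`) are made up for illustration. Both strands matter here: 013 in your example is the exact reverse complement of 001, which is why BLASTN reports it as a hit.

```python
from itertools import combinations

# Hypothetical exact-match length used as a crude stand-in for a BLAST hit.
K = 12

_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k=K):
    """Return the set of all k-mers in seq (lower-cased)."""
    seq = seq.lower()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similar(a, b, k=K):
    """Assumed edge rule: a and b share a k-mer on either strand."""
    kb = kmers(b, k)
    return bool(kmers(a, k) & kb) or bool(kmers(revcomp(a), k) & kb)


def single_linkage(seqs, k=K):
    """Group sequence ids by single linkage using union-find."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All-vs-all comparison: any single link merges two clusters.
    for a, b in combinations(seqs, 2):
        if similar(seqs[a], seqs[b], k):
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())


def representatives(seqs, clusters):
    """Keep the longest member of each cluster (ties go to the first id)."""
    return [max(c, key=lambda name: len(seqs[name])) for c in clusters]


seqs = {
    "001": "aaaagttgtgtgtgtatgacaggtt",
    "013": "aacctgtcatacacacacaactttt",
    "289": "gttgtgtgtgtatgacaggtt",
    "375": "tgtgtgtatgacaggttgat",
    "319": "tcaacctgtcatacacaca",
    "177": "cgcagtgtgtgtatgacagg",
    "271": "gtcctacctgtcatacacac",
    "020": "aagacataatgtgtgtatgacag",
}
clusters = single_linkage(seqs)
print(clusters)  # [['001', '013', '020', '177', '271', '289', '319', '375']]
print(representatives(seqs, clusters))  # ['001']
```

The all-vs-all loop is quadratic, so for a large database you would feed in pairs from an all-vs-all BLAST run instead. Single linkage also chains: one shared low-complexity stretch such as `tgtgtgtg` is enough to merge otherwise unrelated sequences, so masking repeats or requiring longer, stricter matches is worth considering.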
Mike

On Dec 7, 2006, at 2:36 PM, yun zheng wrote:

> Hi,
>
> Are there any tools for finding unique sequences in a large database?
> Many thanks.
>
> I need to find unique DNA sequences in a large database. A short piece
> is given as follows.
>
>> 001
> aaaagttgtgtgtgtatgacaggtt
>> 013
> aacctgtcatacacacacaactttt
>> 289
> gttgtgtgtgtatgacaggtt
>> 375
> tgtgtgtatgacaggttgat
>> 319
> tcaacctgtcatacacaca
>> 177
> cgcagtgtgtgtatgacagg
>> 271
> gtcctacctgtcatacacac
>> 020
> aagacataatgtgtgtatgacag
>
> All these seem to be the same sequence, since BLASTN gives very small
> e-values for their alignments.
>
> BLASTN 2.2.8 [Jan-05-2004]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= 001
>          (25 letters)
>
> Database: drought-clustered.fa
>            410 sequences; 8877 total letters
>
> Searching.done
>
>                                                Score   E
> Sequences producing significant alignments:   (bits) Value
>
> 013                                               50 8e-11
> 001                                               50 8e-11
> 289                                               42 2e-08
> 375                                               34 5e-06
> 319                                               34 5e-06
> 177                                               32 2e-05
> 271                                               30 8e-05
> 020                                               28 3e-04
>
> Best regards.
>
> sincerely
>
> Zheng, Yun
>
> Department of Computer Science
>
> Washington University in St Louis
>
> Campus Box 1045
>
> 1 Brookings Drive, St Louis, MO 63130
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss