[Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]
Chris Fields
cjfields at illinois.edu
Tue Feb 15 20:25:07 UTC 2011
SHA should work as well; I didn't think of that (though I suppose the encoding step for either would be rate-limiting?).
I'll have to keep an eye on UCLUST; I didn't know about that one.
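
For anyone following along, the checksum idea discussed in this thread can be sketched roughly as below. This is only an illustration in Python using the standard hashlib module, not the code from bp_nrdb; the function names (read_fasta, group_redundant) and the choice to uppercase sequences before hashing are assumptions made for the example. Sequences with equal digests are treated as identical, so only exact duplicates are found.

```python
import hashlib
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_redundant(lines, algorithm="md5"):
    """Return lists of sequence IDs whose sequences share the same digest.

    `algorithm` is any name accepted by hashlib.new, e.g. "md5" or "sha1".
    Case is ignored (an assumption of this sketch, not of bp_nrdb).
    """
    groups = defaultdict(list)
    for header, seq in read_fasta(lines):
        digest = hashlib.new(algorithm, seq.upper().encode("ascii")).hexdigest()
        parts = header.split()
        seq_id = parts[0] if parts else ""  # ID = first word of the header
        groups[digest].append(seq_id)
    # Keep only digests seen more than once, i.e. the redundant sequences.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Passing an open file handle as `lines` works, since the function only iterates over it. Swapping MD5 for SHA is just a matter of passing algorithm="sha1"; the grouping logic is unchanged.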
chris
On Feb 15, 2011, at 2:09 PM, Dave Messina wrote:
> Hi Juan,
>
> There's a nice example script in the BioPerl distribution that Jason Stajich
> wrote which uses MD5 checksums to do the sequence comparison:
>
>
> https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/bp_nrdb.PLS
>
>
> There are also faster, non-BioPerl tools for this, such as the one that comes
> with UCLUST:
>
> http://www.drive5.com/usearch/
>
>
> Dave