[Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]
Dave Messina
David.Messina at sbc.su.se
Tue Feb 15 20:09:35 UTC 2011
Hi Juan,
The BioPerl distribution includes a nice example script by Jason Stajich
that uses MD5 checksums to do the sequence comparison:
https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/bp_nrdb.PLS
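To illustrate the checksum idea, here is a minimal, standalone sketch in Python. It is not the code from bp_nrdb; the function names (read_fasta, group_by_checksum, redundant_groups) and the choice to ignore case are assumptions made for the example. Each sequence is hashed with MD5, and records whose hashes match are reported together as redundant.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_checksum(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map the MD5 hex digest of each sequence to the headers that share it."""
    groups: Dict[str, List[str]] = {}
    for header, seq in records:
        # Assumption: case differences are not meaningful, so fold to upper case.
        digest = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        groups.setdefault(digest, []).append(header)
    return groups


def redundant_groups(records: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return only the groups of headers whose sequences are identical."""
    return [ids for ids in group_by_checksum(records).values() if len(ids) > 1]


if __name__ == "__main__":
    example = ">a\nACGT\n>b\nac\ngt\n>c\nGGGG\n".splitlines()
    print(redundant_groups(read_fasta(example)))  # [['a', 'b']]
```

Note that this only catches sequences that are identical over their full length; it will not flag a sequence that is a fragment of another one.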
There are also faster, non-BioPerl tools for this, such as the one that comes
with UCLUST:
http://www.drive5.com/usearch/
Dave