[Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]
Cook, Malcolm
MEC at stowers.org
Tue Feb 15 20:28:09 UTC 2011
There is also CD-HIT,
and blastclust from NCBI (which I think still gets installed as part of the NCBI BLAST suite).
Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, February 15, 2011 2:25 PM
> To: Dave Messina
> Cc: Juan Jovel; bioperl
> Subject: Re: [Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]
>
> SHA should work as well; I didn't think of that (though I
> suppose the encoding step for either would be rate-limiting?).
>
> Will have to keep an eye on UCLUST; I didn't know about that one.
>
> chris
>
> On Feb 15, 2011, at 2:09 PM, Dave Messina wrote:
>
> > Hi Juan,
> >
> > There's a nice example script in the BioPerl distribution, written by Jason
> > Stajich, that uses MD5 checksums to compare sequences:
> >
> > https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/bp_nrdb.PLS
> >
> > There are also faster, non-BioPerl tools for this, such as the one that
> > comes with UCLUST:
> >
> > http://www.drive5.com/usearch/
> >
> > Dave
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
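The checksum approach discussed in this thread (MD5 in bp_nrdb.PLS, or SHA as Chris suggests) comes down to hashing each sequence and treating equal digests as duplicates. Below is a minimal Python sketch of that idea, not a port of bp_nrdb.PLS: it assumes "redundant" means identical after upper-casing, and the function names read_fasta and nonredundant are hypothetical. It will not catch near-identical sequences; that is what CD-HIT, blastclust and UCLUST are for.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def nonredundant(
    records: Iterable[Tuple[str, str]], algorithm: str = "md5"
) -> List[Tuple[List[str], str]]:
    """Collapse identical sequences, keeping the first copy of each.

    Each distinct sequence is identified by a hex digest (MD5 by default,
    or e.g. "sha1"); headers of later duplicates are appended to the
    header list of the first record with the same digest.
    """
    index: Dict[str, int] = {}  # digest -> position in result
    result: List[Tuple[List[str], str]] = []
    for header, seq in records:
        seq = seq.upper()  # assumption: case differences are not meaningful
        digest = hashlib.new(algorithm, seq.encode("utf-8")).hexdigest()
        if digest in index:
            result[index[digest]][0].append(header)
        else:
            index[digest] = len(result)
            result.append(([header], seq))
    return result


if __name__ == "__main__":
    demo = ">seq1\nACGTACGT\n>seq2\nacgt\nacgt\n>seq3\nTTTT\n"
    for headers, seq in nonredundant(read_fasta(demo.splitlines())):
        print(">" + " ".join(headers))
        print(seq)
    # prints:
    # >seq1 seq2
    # ACGTACGT
    # >seq3
    # TTTT
```

Keying the dictionary on a fixed-length digest rather than on the full sequence keeps the memory per entry small for long sequences, at the cost of a vanishingly small chance of a hash collision. Passing algorithm="sha1" (or any other name hashlib.new accepts) swaps MD5 for SHA.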