[Bioperl-l] removing duplicate fasta records
Simon K. Chan
bioinformatics_rocks@yahoo.com
Thu, 19 Dec 2002 17:14:37 -0800 (PST)
Hiya Amit,
Just to add to what Jonathan Epstein wrote, try something
like this:
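#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# Input FASTA file; taking it from the command line is an assumption
my $fileName = shift or die "Usage: $0 <fasta file>\n";
my $in = Bio::SeqIO->newfh(-file=>$fileName,
                           -format=>"FASTA");
my %matching_hash = ();   # sequences seen so far
my %final_hash = ();      # display_id => sequence, first occurrence only
while (my $obj = <$in>){
    unless($matching_hash{$obj->seq}){
        $final_hash{$obj->display_id} = $obj->seq;
        $matching_hash{$obj->seq} = 1;
    }
}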
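
For Jonathan's Option #2, here is a rough sketch of the same idea keyed
on an MD5 digest instead of the full sequence. It assumes the core
Digest::MD5 module; make_first_seen_filter and the sample records are
names made up for illustration, not part of BioPerl. In the loop above
you would call $is_new->($obj->seq) in place of the %matching_hash
lookup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: returns a closure that is true the first time a
# sequence is seen and false afterwards. Only the 32-character hex
# digest of each sequence is kept, not the sequence itself.
sub make_first_seen_filter {
    my %seen;
    return sub {
        my ($seq) = @_;
        return !$seen{ md5_hex($seq) }++;
    };
}

my $is_new = make_first_seen_filter();

# Made-up [id, sequence] pairs standing in for records read from a file
my @records = (
    [ seq1 => 'ACGTACGT' ],
    [ seq2 => 'TTTTGGGG' ],
    [ seq3 => 'ACGTACGT' ],   # same sequence as seq1, so it is skipped
);

for my $rec (@records) {
    my ($id, $seq) = @$rec;
    print ">$id\n$seq\n" if $is_new->($seq);
}
```

Two different sequences could in principle share a digest, so treat
this as a memory-saving shortcut rather than a strict guarantee.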
TMTOWTDI!
:-)
HTH,
Simon
--- Jonathan Epstein <Jonathan_Epstein@nih.gov> wrote:
> To: "Amit Indap <indapa@cs.arizona.edu>"
> <indapa@amadeus.biosci.arizona.edu>,
> bioperl-l@bioperl.org
> From: Jonathan Epstein <Jonathan_Epstein@nih.gov>
> Subject: Re: [Bioperl-l] removing duplicate fasta
> records
> Date: Tue, 17 Dec 2002 15:53:57 -0500
>
> Option #1:
> create a hash with all the sequences as you read
> them, and check for duplicates by seeing whether
> that hash element already exists
>
> Option #2 (slightly harder):
> create such a hash and check for duplicates, but
> instead of hashing the sequences, hash a checksum of
> the sequences such as MD5:
>
> http://search.cpan.org/author/GAAS/Digest-MD5-2.20/MD5.pm
> or the GCG checksum:
>
> http://search.cpan.org/author/BIRNEY/bioperl-1.0.2/Bio/SeqIO/gcg.pm
>
> This requires more CPU time, but much less memory.
>
>
> Option #1 is quick-and-dirty, and is appropriate if
> your input file contains only a few megabytes of
> data (or less).
>
> Jonathan
>
>
> At 12:41 PM 12/17/2002 -0700, Amit Indap
> <indapa@cs.arizona.edu> wrote:
> >I have a file with a list of fasta sequences. Is there a way to
> >remove records with the identical sequence? I am a newbie to BioPerl,
> >and my search through the documentation hasn't found anything.
> >
> >Thank you.
> >
> >Amit Indap