[Bioperl-l] removing duplicate fasta records
nkuipers
nkuipers@uvic.ca
Tue, 17 Dec 2002 12:10:58 -0800
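This is a simple way you could do it. Pass the input file name as the first
argument; the file some_name will then contain non-redundant fasta entries.
They are non-redundant only as far as the sequences go: when two records have
the same sequence but different IDs, only the first one read is kept. I think
there is also an Index module in bioperl, so you might consider the code below
a very lightweight way of doing it. :)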
Cheers,
nathanael kuipers
---
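use strict;
use warnings;
use Bio::SeqIO;

# Input FASTA file is the first command-line argument.
my $file = shift or die "Usage: $0 input.fasta\n";
my %already_seen;    # sequence string => number of times seen
my $stream = Bio::SeqIO->new( -file => $file, -format => 'Fasta' );
# '>' truncates some_name first, so reruns do not append duplicates.
my $writer = Bio::SeqIO->new( -file => ">some_name", -format => 'Fasta' );
while ( my $seqobj = $stream->next_seq() ) {
    # Skip any record whose sequence has already been written.
    next if $already_seen{ $seqobj->seq }++;
    $writer->write_seq($seqobj);
}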
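
If BioPerl is not installed, the same first-seen-wins hash idea can be sketched
in plain Perl. This is only a rough illustration, not a replacement for
Bio::SeqIO: dedupe_fasta is a made-up helper name, the parser assumes simple
FASTA with '>' at the start of each header line, and the sample records are
invented for the demo.

```perl
use strict;
use warnings;

# dedupe_fasta is a hypothetical helper, not part of BioPerl.
# It takes FASTA text and returns [header, sequence] pairs, keeping only
# the first record seen for each distinct sequence string.
sub dedupe_fasta {
    my ($text) = @_;
    my ( %seen, @kept );
    # Each record starts with '>' at the beginning of a line.
    for my $record ( split /^>/m, $text ) {
        next unless $record =~ /\S/;    # skip the empty chunk before the first '>'
        my ( $header, @lines ) = split /\n/, $record;
        $header =~ s/\s+$//;            # trim trailing whitespace (e.g. a stray \r)
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;               # join wrapped lines, drop stray whitespace
        next if $seen{$seq}++;          # already kept a record with this sequence
        push @kept, [ $header, $seq ];
    }
    return @kept;
}

# Records 'a' and 'b' carry the same sequence, so only 'a' and 'c' survive.
my $fasta = ">a\nACGT\nAC\n>b\nACGTAC\n>c\nTTTT\n";
print ">$_->[0]\n$_->[1]\n" for dedupe_fasta($fasta);
```

Note that this sketch compares sequences as exact strings, so acgt and ACGT
would count as different; uppercase $seq before the lookup if that matters.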
>===== Original Message From "Amit Indap <indapa@cs.arizona.edu>" <indapa@amadeus.biosci.arizona.edu> =====
>I have a file with a list of fasta sequences. Is there a way to
>remove records with the identical sequence? I am a newbie to BioPerl,
>and my search through the documentation hasn't found anything.
>
>Thank you.
>
>Amit Indap