[Bioperl-l] Fishing redundant sequences in FASTA files

Juan Jovel jovel_juan at hotmail.com
Tue Feb 15 18:36:11 UTC 2011


Good morning guys,
Sorry for the naive question: what is the simplest way to fish out redundant sequences (complete or partial) shared between two (or more) FASTA files?
I was thinking of doing it with SeqIO: open two files and compare each sequence of file_1 to each record of file_2, like this:
# Read each record of file 1 and compare to each read of file 2
while(my $dna1 = $seqin1->next_seq){
        my $seq1 = $dna1->seq;
        my $id1 = $dna1->id;
        # Iterate inside the second fasta file
        while(my $dna2 = $seqin2->next_seq){
                my $seq2 = $dna2->seq;
                my $id2 = $dna2->id;
                if(($seq1 =~ /$seq2/)||($seq2 =~ /$seq1/)){
                        print "Match found \n";
                        print OUT "Records $id1 and $id2 are redundant";
...
I am afraid it is going to be slow for large files. More importantly, how do I reset the object containing the second file back to its first record, as is done in plain Perl with seek(IN, 0, 0), for example? Does SeqIO allow that? (Sorry, I am not a frequent user of SeqIO.)
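One way around the rewind problem that I could imagine (only a rough sketch, not tested on real data) is to read each file into memory once and then compare the stored sequences, so nothing has to be reset. The helper names load_fasta and find_redundant are made up for illustration, and I am assuming both files fit in memory. It uses index() instead of a regex so the sequences are compared as literal strings; the comparison is case-sensitive, and it is still an all-against-all comparison, so it will not scale well to very large files.

```perl
use strict;
use warnings;

# Hypothetical helper: read every record of a FASTA file into memory as
# [id, sequence] pairs, so each file is parsed only once.
sub load_fasta {
    my ($path) = @_;
    require Bio::SeqIO;    # BioPerl; loaded only when a file is actually read
    my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my @records;
    while (my $seq = $in->next_seq) {
        push @records, [$seq->id, $seq->seq];
    }
    return \@records;
}

# Hypothetical helper: return every [id1, id2] pair in which one sequence
# contains the other. index() performs a literal substring search.
sub find_redundant {
    my ($recs1, $recs2) = @_;
    my @hits;
    for my $r1 (@$recs1) {
        my ($id1, $seq1) = @$r1;
        for my $r2 (@$recs2) {
            my ($id2, $seq2) = @$r2;
            if (index($seq1, $seq2) >= 0 || index($seq2, $seq1) >= 0) {
                push @hits, [$id1, $id2];
            }
        }
    }
    return \@hits;
}

# Runs only when two FASTA file names are given on the command line.
if (@ARGV == 2) {
    my $hits = find_redundant(load_fasta($ARGV[0]), load_fasta($ARGV[1]));
    print "Records $_->[0] and $_->[1] are redundant\n" for @$hits;
}
```
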
If there is another, more elaborate module for fishing out such redundant sequences, I would appreciate hearing about it.
Thanks,
JUAN


