[Bioperl-l] Allowing One error in Sequence matching

Abhishek Pratap abhishek.vit at gmail.com
Thu Sep 17 01:39:13 UTC 2009


Hi Russell

Thanks for the quick reply. However, I am not following the code or
the reasoning behind it.

Will this work for matching AGCT to ACCT | ANCT | AACT? It didn't give
me the expected output when I ran it. I am more interested in
understanding the logic.

It would be great if you could expand a bit more.
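
For reference, here is a minimal sketch of what the character-class loop
in the quoted reply appears to build, assuming the motif is AGCT rather
than ACGT. The sub name n_tolerant_pattern is hypothetical and not part
of the original code. It suggests that only N is accepted as a
substitute at each position, which would explain why ANCT matches but
ACCT and AACT do not.

```perl
use strict;
use warnings;

# Build one character class per motif base, as the quoted loop does:
# each position accepts either the expected base or N.
sub n_tolerant_pattern {
    my ($motif) = @_;
    return join '', map { "[${_}N]" } split //, $motif;
}

my $pattern = n_tolerant_pattern('AGCT');
print "pattern: $pattern\n";    # pattern: [AN][GN][CN][TN]

# Only N is tolerated as a substitute, so ACCT and AACT fail.
for my $candidate (qw(AGCT ANCT ACCT AACT)) {
    my $verdict = $candidate =~ /^$pattern$/ ? 'match' : 'no match';
    print "$candidate: $verdict\n";
}
```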


Also, if I do it the brute-force way suggested to me by a friend, how
well will that scale?

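use strict;
use warnings;

# Example inputs; $a and $b are reserved for sort(), so use other names.
my ($query, $ref) = ('AGCT', 'ANCT');
my @dna1 = split //, $query;
my @dna2 = split //, $ref;

# Count mismatching positions (assumes both sequences have equal length).
my $mismatches = 0;
for my $i (0 .. $#dna1) {
        $mismatches++ if $dna1[$i] ne $dna2[$i];
}

# Accept the pair when at most one position differs.
if ($mismatches <= 1) {
        print "RESULT: your sequence is true\n";
}
else {
        print "RESULT: your sequence is false\n";
}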
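
On the scalability question, one common Perl idiom is to XOR the two
strings and count the non-NUL bytes, which avoids splitting into arrays.
The sketch below is only an illustration under the assumption that the
sequences are plain single-byte strings; the sub names mismatches and
approx_positions and the sample reference string are hypothetical. The
scan still does work proportional to reference length times query
length, so it is a constant-factor improvement, not an indexed search.

```perl
use strict;
use warnings;

# Count mismatching positions between two equal-length strings.
# XOR-ing the strings yields "\0" wherever the bytes agree, so
# counting the non-NUL bytes gives the number of mismatches.
sub mismatches {
    my ($s1, $s2) = @_;
    die "sequences differ in length\n" unless length($s1) == length($s2);
    return ( $s1 ^ $s2 ) =~ tr/\0//c;
}

# Slide a window of the query's length along a longer reference and
# collect every offset with at most $max_mismatches differences.
sub approx_positions {
    my ($query, $reference, $max_mismatches) = @_;
    my $len = length $query;
    my @hits;
    for my $pos (0 .. length($reference) - $len) {
        my $window = substr $reference, $pos, $len;
        push @hits, $pos if mismatches($query, $window) <= $max_mismatches;
    }
    return @hits;
}

# Windows at offsets 2 (ACCT), 8 (ANCT) and 12 (AGCT) are within one mismatch.
print join(',', approx_positions('AGCT', 'TTACCTGGANCTAGCT', 1)), "\n";   # 2,8,12
```

Raising the third argument allows a user-defined number of mismatches
without changing the rest of the code.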

Thanks,
-Abhi


On Wed, Sep 16, 2009 at 7:06 PM, Smithies, Russell
<Russell.Smithies at agresearch.co.nz> wrote:
> How about chunking it into overlapping words, skipping any word with 2
> or more Ns, then applying a regex?
>
> $seq = "CGATCGNATGNCGTCTAGCTGACANGTTGACTCTAGCTGATCGATCGATCGTACGTANNCGTAGTCGTACNTACGATCTNACGCACGNATGCTACGTACG";
>
> $motif = "ACGT";
> foreach (split //, $motif) {$w .= "[${_}N]"}
>
> foreach ($seq =~ /(?=(\w{4}))/g){
>  next if tr/N/N/ >= 2;
>  print "$_\n" if  eval "/$w/" ;
> }
>
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Abhishek Pratap
>> Sent: Thursday, 17 September 2009 9:42 a.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Allowing One error in Sequence matching
>>
>> Hi All
>>
>> I am not able to think of a smart way to do sequence matching that
>> allows a user-defined number of mismatches.
>>
>> For example:
>>
>> Given the sequence AGCT, a reference is considered a match if at most
>> one position (1, 2, 3 or 4) has a mismatch, where the base at that
>> position can be any of [ACGTN]. So the possible matches could be:
>>
>> This is for position 1.
>> AGCT
>> GGCT
>> CGCT
>> TGCT
>> NGCT
>> and likewise for each position.
>>
>> Is there a nice regular expression for this? One way I could think of
>> is to generate all the possible tags for a given sequence and then do
>> the matching, but that will be computationally expensive for a long
>> dataset. Is there a neat method?
>>
>> Thanks,
>> -Abhi
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l



