[Bioperl-l] Extract Mutation Automatically

Fri Aug 12 07:58:31 EDT 2005

'nomatch'

On Aug 12, 2005, at 6:50 AM, Andrew Leung wrote:

> Hi Jason,
> I have tired the the seq_inds method in Bio::Search::HSP::HSPI.  
> But, other
> than identical and conserved, there is no "mismatched" option.
>
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/HSP/ 
> HSPI.html#POD15
>
> I am still thinking of how to get the mismatch details. Working from
> identical/conserved seq_inds values seems to be very complicated.
> Andrew
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Thursday, August 11, 2005 9:24 AM
> To: andrew_leung at dh.gov.hk
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Extract Mutation Automatically
>
>
> On Aug 10, 2005, at 8:42 PM, Andrew Leung wrote:
>
>
>> Hi Jason,
>> Thank you for advice. I will try the various approaches suggested. My
>> ultimate goal is to extract something like this: A267G, Z786-,
>> L898Y etc.
>> for aa and A162T, G339A, A388N, etc. for nt. Preferably, the
>> nomenclature
>> for annotating mutations is a standardized one. But, it appears
>> that there
>> no such a ready to use module from Bioperl.
>>
>
> Don't despair, you could be the one to do it!  This would probably
> just a be a subroutine and not necessarily a whole module.
>
> That nomenclature assumes a reference sequence and just getting the
> bases you are interested in.  A few substr or subseq calls and you
> would be right there.
>
> -jason
>
>
>
>> Andrew
>>
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>> Sent: Wednesday, August 10, 2005 10:36 AM
>> To: andrew_leung at dh.gov.hk
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] Extract Mutation Automatically
>>
>> I guess it comes down to what you want to do with the mutations once
>> you've found them.
>>
>> The seq_inds method in Bio::Search::HSP::HSPI  which is something you
>> can call on hsp objects you've gotten out of pairwise alignment
>> searches. seq_inds will give you the location of the identical,
>> conserved, mismatched columns from a pairwise alignment.  I would
>> suggest using FASTA or SSEARCH and
>>
>> If you had two files with seqs to align called 'seq1.fa' and  
>> 'seq2.fa'
>>
>> Here is how I would get the pairwise SW alignment and get the
>> mutations out.
>>
>> If you wanted a global alignment you can use the EMBOSS tool 'needle'
>> and generate an MSF alignment which can be parsed with Bio::AlignIO.
>>
>> some simple code to print out the bases which have mismatches
>> use Bio::SearchIO;
>> use strict;
>> my $fh;
>> #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
>> open($fh, "fasta34 seq1.fa seq2.fa  |") || die $!;
>> #my $parser = Bio::SearchIO->new(-format => 'fasta',
>> #                -fh     => $fh);
>> my $parser = Bio::SearchIO->new(-format => 'blast',
>>                                                                 -
>> fh        => $fh);
>>
>> if( my $result = $parser->next_result ) { # single result so use if
>> instead of while
>>      if( my $hit = $result->next_hit ) {    # ditto, want single
>> result...
>>      if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would
>> need to consider more if using BLAST
>>
>>          my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch');
>>          # if this is protein and you want to treat the conservative
>> matches as mismatches
>>          # you'll need to run the same method but asking for
>> 'conserved' and then combing the two lists
>>
>>          for my $base ( @qmismatches ) {
>>             print "base $base of the hit sequence is a mismatch \n",
>>         }
>>      }
>>      }
>> }
>>
>>
>> The Bio::PopGen::Utilities module can also take an alignment and
>> extract the positions with variation for use in polymorphism  
>> analyses.
>>
>> -jason
>>
>> On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:
>>
>>
>>
>>> Hi all,
>>> Is there any module available that can allow me to extract mutation
>>> (s)
>>> automatically? The idea is that if I submit two sequences for
>>> alignment, the
>>> script can automatically list out all the differences between the  
>>> two
>>> sequences. I wish to know the difference at two levels, i.e. the
>>> nucleotide
>>> and amino acid level. Any ideas?
>>> Andrew
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/