[Bioperl-l] Extract Mutation Automatically

Wed Aug 10 20:42:11 EDT 2005

Hi Jason,
Thank you for advice. I will try the various approaches suggested. My
ultimate goal is to extract something like this: A267G, Z786-, L898Y etc.
for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature
for annotating mutations is a standardized one. But, it appears that there
no such a ready to use module from Bioperl.
Andrew

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at duke.edu] 
Sent: Wednesday, August 10, 2005 10:36 AM
To: andrew_leung at dh.gov.hk
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Extract Mutation Automatically

I guess it comes down to what you want to do with the mutations once  
you've found them.

The seq_inds method in Bio::Search::HSP::HSPI  which is something you  
can call on hsp objects you've gotten out of pairwise alignment  
searches. seq_inds will give you the location of the identical,  
conserved, mismatched columns from a pairwise alignment.  I would  
suggest using FASTA or SSEARCH and

If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa'

Here is how I would get the pairwise SW alignment and get the  
mutations out.

If you wanted a global alignment you can use the EMBOSS tool 'needle'  
and generate an MSF alignment which can be parsed with Bio::AlignIO.

some simple code to print out the bases which have mismatches
use Bio::SearchIO;
use strict;
my $fh;
#open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
open($fh, "fasta34 seq1.fa seq2.fa  |") || die $!;
#my $parser = Bio::SearchIO->new(-format => 'fasta',
#                -fh     => $fh);
my $parser = Bio::SearchIO->new(-format => 'blast',
                                                                - 
fh        => $fh);

if( my $result = $parser->next_result ) { # single result so use if  
instead of while
     if( my $hit = $result->next_hit ) {    # ditto, want single  
result...
     if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would  
need to consider more if using BLAST

         my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch');
         # if this is protein and you want to treat the conservative  
matches as mismatches
         # you'll need to run the same method but asking for  
'conserved' and then combing the two lists

         for my $base ( @qmismatches ) {
            print "base $base of the hit sequence is a mismatch \n",
        }
     }
     }
}

The Bio::PopGen::Utilities module can also take an alignment and  
extract the positions with variation for use in polymorphism analyses.

-jason

On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:

> Hi all,
> Is there any module available that can allow me to extract mutation(s)
> automatically? The idea is that if I submit two sequences for  
> alignment, the
> script can automatically list out all the differences between the two
> sequences. I wish to know the difference at two levels, i.e. the  
> nucleotide
> and amino acid level. Any ideas?
> Andrew
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12