[Bioperl-l] Extract Mutation Automatically

Andrew Leung ro_phls2 at dh.gov.hk
Fri Aug 12 06:50:04 EDT 2005


Hi Jason,
I have tired the the seq_inds method in Bio::Search::HSP::HSPI. But, other
than identical and conserved, there is no "mismatched" option.

http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/HSP/HSPI.html#POD15

I am still thinking of how to get the mismatch details. Working from
identical/conserved seq_inds values seems to be very complicated.
Andrew 

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich at duke.edu] 
Sent: Thursday, August 11, 2005 9:24 AM
To: andrew_leung at dh.gov.hk
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Extract Mutation Automatically


On Aug 10, 2005, at 8:42 PM, Andrew Leung wrote:

> Hi Jason,
> Thank you for advice. I will try the various approaches suggested. My
> ultimate goal is to extract something like this: A267G, Z786-,  
> L898Y etc.
> for aa and A162T, G339A, A388N, etc. for nt. Preferably, the  
> nomenclature
> for annotating mutations is a standardized one. But, it appears  
> that there
> no such a ready to use module from Bioperl.

Don't despair, you could be the one to do it!  This would probably  
just a be a subroutine and not necessarily a whole module.

That nomenclature assumes a reference sequence and just getting the  
bases you are interested in.  A few substr or subseq calls and you  
would be right there.

-jason


> Andrew
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Wednesday, August 10, 2005 10:36 AM
> To: andrew_leung at dh.gov.hk
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Extract Mutation Automatically
>
> I guess it comes down to what you want to do with the mutations once
> you've found them.
>
> The seq_inds method in Bio::Search::HSP::HSPI  which is something you
> can call on hsp objects you've gotten out of pairwise alignment
> searches. seq_inds will give you the location of the identical,
> conserved, mismatched columns from a pairwise alignment.  I would
> suggest using FASTA or SSEARCH and
>
> If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa'
>
> Here is how I would get the pairwise SW alignment and get the
> mutations out.
>
> If you wanted a global alignment you can use the EMBOSS tool 'needle'
> and generate an MSF alignment which can be parsed with Bio::AlignIO.
>
> some simple code to print out the bases which have mismatches
> use Bio::SearchIO;
> use strict;
> my $fh;
> #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
> open($fh, "fasta34 seq1.fa seq2.fa  |") || die $!;
> #my $parser = Bio::SearchIO->new(-format => 'fasta',
> #                -fh     => $fh);
> my $parser = Bio::SearchIO->new(-format => 'blast',
>                                                                 -
> fh        => $fh);
>
> if( my $result = $parser->next_result ) { # single result so use if
> instead of while
>      if( my $hit = $result->next_hit ) {    # ditto, want single
> result...
>      if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would
> need to consider more if using BLAST
>
>          my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch');
>          # if this is protein and you want to treat the conservative
> matches as mismatches
>          # you'll need to run the same method but asking for
> 'conserved' and then combing the two lists
>
>          for my $base ( @qmismatches ) {
>             print "base $base of the hit sequence is a mismatch \n",
>         }
>      }
>      }
> }
>
>
> The Bio::PopGen::Utilities module can also take an alignment and
> extract the positions with variation for use in polymorphism analyses.
>
> -jason
>
> On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:
>
>
>> Hi all,
>> Is there any module available that can allow me to extract mutation 
>> (s)
>> automatically? The idea is that if I submit two sequences for
>> alignment, the
>> script can automatically list out all the differences between the two
>> sequences. I wish to know the difference at two levels, i.e. the
>> nucleotide
>> and amino acid level. Any ideas?
>> Andrew
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




More information about the Bioperl-l mailing list