[Bioperl-l] Extract Mutation Automatically

Mon Aug 15 20:49:52 EDT 2005

Hi Heikki,
Thank you for your note.
I now have two strands of sequences obtained from a hsp and an array of
mutation position information resulted from seq_inds() with 'mismatch'
option. Do you mean that I can put these data to Bio::Variation and generate
a mutation list as desired? I am quite new to Bioperl. Can you explain in
greater details? I've read the documentation for Bio::Variation, but it
appears to me that its methods are mainly for "set", but not for "reading"
mutation.
Andrew

= = = = = = = = = = 
Andrew,

Once you have extracted the information, you can create Bio::Variation
objects 
which know how to stringify the description according to human mutation 
nomenclature rules.

In practise, you create a Bio::Variation::SeqDiff object, add to it the 
appropriate Bio::Variation::{DNAMutation|RNAChange|AAChange} objects and
call 
methods sysname() for nucleotides descriptor or trivname() for amino acid 
descriptor.

The nomenclature used is not the most recent complex suggestion from den 
Dunnen et al but original (and in basic cases identical) from Antonorakis et

al.

 -Heikki

On Thursday 11 August 2005 01:42, Andrew Leung wrote:
> Hi Jason,
> Thank you for advice. I will try the various approaches suggested. My
> ultimate goal is to extract something like this: A267G, Z786-, L898Y etc.
> for aa and A162T, G339A, A388N, etc. for nt. Preferably, the nomenclature
> for annotating mutations is a standardized one. But, it appears that there
> no such a ready to use module from Bioperl.
> Andrew
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Wednesday, August 10, 2005 10:36 AM
> To: andrew_leung at dh.gov.hk
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Extract Mutation Automatically
>
> I guess it comes down to what you want to do with the mutations once
> you've found them.
>
> The seq_inds method in Bio::Search::HSP::HSPI  which is something you
> can call on hsp objects you've gotten out of pairwise alignment
> searches. seq_inds will give you the location of the identical,
> conserved, mismatched columns from a pairwise alignment.  I would
> suggest using FASTA or SSEARCH and
>
> If you had two files with seqs to align called 'seq1.fa' and 'seq2.fa'
>
> Here is how I would get the pairwise SW alignment and get the
> mutations out.
>
> If you wanted a global alignment you can use the EMBOSS tool 'needle'
> and generate an MSF alignment which can be parsed with Bio::AlignIO.
>
> some simple code to print out the bases which have mismatches
> use Bio::SearchIO;
> use strict;
> my $fh;
> #open($fh, "bl2seq -i seq1.fa -j seq2.fa -p blastn |") || die $!;
> open($fh, "fasta34 seq1.fa seq2.fa  |") || die $!;
> #my $parser = Bio::SearchIO->new(-format => 'fasta',
> #                -fh     => $fh);
> my $parser = Bio::SearchIO->new(-format => 'blast',
>                                                                 -
> fh        => $fh);
>
> if( my $result = $parser->next_result ) { # single result so use if
> instead of while
>      if( my $hit = $result->next_hit ) {    # ditto, want single
> result...
>      if( my $hsp = $hit->next_hsp ) { # single HSP from FASTA, would
> need to consider more if using BLAST
>
>          my (@qmismatches) = $hsp->seq_inds('hit', 'nomatch');
>          # if this is protein and you want to treat the conservative
> matches as mismatches
>          # you'll need to run the same method but asking for
> 'conserved' and then combing the two lists
>
>          for my $base ( @qmismatches ) {
>             print "base $base of the hit sequence is a mismatch \n",
>         }
>      }
>      }
> }
>
>
> The Bio::PopGen::Utilities module can also take an alignment and
> extract the positions with variation for use in polymorphism analyses.
>
> -jason
>
> On Aug 9, 2005, at 8:34 PM, Andrew Leung wrote:
> > Hi all,
> > Is there any module available that can allow me to extract mutation(s)
> > automatically? The idea is that if I submit two sequences for
> > alignment, the
> > script can automatically list out all the differences between the two
> > sequences. I wish to know the difference at two levels, i.e. the
> > nucleotide
> > and amino acid level. Any ideas?
> > Andrew
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________