[Bioperl-l] Bio::Variation::SeqDiff, Bio::Variation::VariantI

Heikki Lehvaslaiho heikki@ebi.ac.uk
Mon, 17 Jun 2002 18:19:41 +0100


Eckhard,

The reason I pointed you to LiveSeq modules was that the effect of length
mutations are not symmetrical. LiveSeq takes that into account. I should 
have been more explicit, sorry.

Example: if one out of three mutations is insertion which and the other
ones are point mutations, the nucleotide sequence of the resulting
mutated sequence is dependent on the order these mutations are added into
original sequence. If the insertion is nearer the start of the sequence than 
other mutations and applied first, your current code places the point 
mutations in wrong places.  Same happens if there are two indel mutations: 
the latter is not correctly placed. The mutated sequence is different every 
time the order of indel mutations is changed.


First of all you need to keep track of the offset introduced by all
indel mutations; and their locations to find out which offset to use for any 
other sequence location. That might solve the problem here since we are 
talking about known differences between just two sequences.

In more general terms, the order of mutations are applied can be important, 
especially if they are to any extent overlapping.


Yours,
		-Heikki

P.S. Do you think you could write tests for the code you develop and put 
them into t/SeqDiff.t file, please. In that way it is eaier to see what the 
code is supposed to do and what cases have been tested.

	-H

Eckhard Lehmann wrote:
> Heikki,
> 
> 
>>This beginning to look like the the problem we wrote Bio::LiveSeq for...
>>If you find one more complication in generating the mutated sequence, I
>>suggest you have look how to use those modueles to do what you want.
> 
> 
> I took a look at the Bio::LiveSeq modules from CVS.
> As I have seen, I can operate on whole genes with these modules (please 
> correct me if I'm wrong). This is good, if I need it ;-).
> If someone doesn't need to mutate whole genes including translations, exons 
> and so on, but only short fragments of a gene or a sequence (a task I have to 
> handle here) - the Bio::LiveSeq modules are perhaps not so good... I think, 
> for such an application Bio::Variation::SeqDiff is more convenient.
> 
> Besides because of the mentioned problem with the position offsets after 
> deletions/insertions I see no more reasons to change the _set_dnamut method 
> for now...
> 
> I've tried out my code and fixed a little error in handling variations before 
> and after the actual variation. The complete method as it works for me is at 
> the end of this mail.
> 
>    - Eckhard
> 
> --------------------------------------------------------------------------
> sub _set_dnamut {
>     my $self = shift;
> 
>     return undef unless $self->{'dna_ori'}  && $self->each_Variant;
> 
>     $self->{'dna_mut'} = $self->{'dna_ori'};
>     foreach ($self->each_Variant) {
>       next unless $_->isa('Bio::Variation::DNAMutation');
>       next unless $_->isMutation;
> 
>       my ($s, $la, $le);
> #lies the mutation less than 25 bases after the start of sequence?
>       if ($_->start < 25) {
>         $s = 0; $la = $_->start - 1;
>       } else {
>         $s = $_->start - 25; $la = 25;
>       }
> 
> #is the mutation an insertion, deletion or ins/del?
> #we calculate the difference of the allele sequences to find out that...
>       my $delta_seq=length($_->allele_mut->seq) - length($_->allele_ori->seq);
>       if ($delta_seq != 0) { #if it is an ins, del ins/del
>         foreach my $variant ($self->each_Variant) {
> #add $delta_seq to the start position if the variant
> #is behind this variant.
> #do this even if $delta_seq is negative...
>           next if $_->start >= $variant->start;
>           $variant->start($variant->start + $delta_seq);
>         }
>       }
> 
> #is the mutation an insertion?
>       $_->end($_->start) unless $_->allele_ori->seq;
> 
> #does the mutation end greater than 25 bases before the end of
> #sequence?
>       if (($_->end + 25) > length($self->{'dna_mut'})) {
>         $le = length($self->{'dna_mut'}) - $_->end;
>       } else {
>         $le = 25;
>       }
> 
>       $_->dnStreamSeq(substr($self->{'dna_mut'}, $s, $la));
>       $_->upStreamSeq(substr($self->{'dna_mut'}, $_->end, $le));
> 
>       my $s_ori = $_->dnStreamSeq . $_->allele_ori->seq . $_->upStreamSeq;
>       my $s_mut = $_->dnStreamSeq . $_->allele_mut->seq . $_->upStreamSeq;
> 
>       (my $str = $self->{'dna_mut'}) =~ s/$s_ori/$s_mut/;
>       $self->{'dna_mut'} = $str;
>     }
> }


-- 
______ _/      _/_____________________________________________________
       _/      _/                      http://www.ebi.ac.uk/mutations/
      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________