[Bioperl-l] Bio::Variation::SeqDiff, Bio::Variation::VariantI
Heikki Lehvaslaiho
heikki@ebi.ac.uk
Mon, 17 Jun 2002 18:19:41 +0100
Eckhard,
The reason I pointed you to the LiveSeq modules was that the effects of
length-changing mutations are not symmetrical. LiveSeq takes that into
account. I should have been more explicit, sorry.
Example: if one out of three mutations is an insertion and the other two
are point mutations, the nucleotide sequence of the resulting
mutated sequence depends on the order in which these mutations are applied
to the original sequence. If the insertion is nearer the start of the
sequence than the other mutations and is applied first, your current code
places the point mutations in the wrong places. The same happens if there
are two indel mutations: the latter is not correctly placed. The mutated
sequence is different every time the order of the indel mutations is changed.
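To make the order dependence concrete, here is a minimal sketch. It does not use Bio::Variation at all: apply_edit is a hypothetical helper, and the 12-base sequence and the two mutations are made up for illustration. Both mutations are given in coordinates of the original sequence, and no offset correction is done.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Variation): replace $len bases
# starting at 1-based position $pos with $new. $len == 0 means "insert
# $new before position $pos".
sub apply_edit {
    my ($seq, $pos, $len, $new) = @_;
    substr($seq, $pos - 1, $len, $new);    # 4-argument substr edits the copy
    return $seq;
}

my $ori = 'AAACCCGGGTTT';

# Both mutations are described relative to $ori:
#   insertion of 'TT' before position 4
#   point mutation G->A at position 8

# Point mutation first, insertion second: positions are still valid.
my $point_first = apply_edit(apply_edit($ori, 8, 1, 'A'), 4, 0, 'TT');

# Insertion first, point mutation second: position 8 now refers to a
# different base, so the wrong base is changed.
my $insertion_first = apply_edit(apply_edit($ori, 4, 0, 'TT'), 8, 1, 'A');

print "point first:     $point_first\n";
print "insertion first: $insertion_first\n";
```

The first result carries the G->A change at the intended base; the second changes a C two bases further upstream, because the insertion shifted everything behind it.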
First of all, you need to keep track of the offset introduced by each
indel mutation, together with its location, to find out which offset to use
for any other sequence location. That might solve the problem here, since we
are talking about known differences between just two sequences.
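One way such offset bookkeeping could look is sketched below. This is only an illustration under stated assumptions, not the LiveSeq approach and not a patch for _set_dnamut: apply_all and the hash keys pos, len and new are hypothetical names, all positions are 1-based coordinates on the original sequence, and the edits are assumed not to overlap.

```perl
use strict;
use warnings;

# Hypothetical helper: apply edits given in ORIGINAL coordinates, in any
# order. Each edit is { pos => start, len => bases replaced, new => text }.
# Assumes the edits do not overlap.
sub apply_all {
    my ($seq, @edits) = @_;
    my @shifts;    # [first original position affected, length change]
    for my $e (@edits) {
        # Sum the length changes of all earlier indels that lie before
        # this edit in the original sequence.
        my $offset = 0;
        for my $s (@shifts) {
            $offset += $s->[1] if $e->{pos} >= $s->[0];
        }
        substr($seq, $e->{pos} - 1 + $offset, $e->{len}, $e->{new});

        # Record the shift for later edits; positions from pos + len
        # onwards move by $delta.
        my $delta = length($e->{new}) - $e->{len};
        push @shifts, [ $e->{pos} + $e->{len}, $delta ] if $delta != 0;
    }
    return $seq;
}

my $ori = 'AAACCCGGGTTT';
my $ins = { pos => 4,  len => 0, new => 'TT' };   # insertion before base 4
my $pnt = { pos => 8,  len => 1, new => 'A'  };   # point mutation G->A
my $del = { pos => 11, len => 2, new => ''   };   # deletion of bases 11-12

my $result_a = apply_all($ori, $ins, $pnt, $del);
my $result_b = apply_all($ori, $del, $pnt, $ins);
my $result_c = apply_all($ori, $pnt, $del, $ins);
print "$result_a\n$result_b\n$result_c\n";
```

With the offsets tracked this way, the three orderings give the same mutated sequence. Overlapping edits are deliberately left out; they need a rule for which mutation wins.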
In more general terms, the order in which mutations are applied can be
important, especially if they overlap to any extent.
Yours,
-Heikki
P.S. Do you think you could write tests for the code you develop and put
them into the t/SeqDiff.t file, please? That way it is easier to see what the
code is supposed to do and what cases have been tested.
-H
Eckhard Lehmann wrote:
> Heikki,
>
>
>>This is beginning to look like the problem we wrote Bio::LiveSeq for...
>>If you find one more complication in generating the mutated sequence, I
>>suggest you have a look at how to use those modules to do what you want.
>
>
> I took a look at the Bio::LiveSeq modules from CVS.
> As far as I can see, I can operate on whole genes with these modules (please
> correct me if I'm wrong). This is good, if I need it ;-).
> If someone doesn't need to mutate whole genes including translations, exons
> and so on, but only short fragments of a gene or a sequence (a task I have to
> handle here) - the Bio::LiveSeq modules are perhaps not so good... I think,
> for such an application Bio::Variation::SeqDiff is more convenient.
>
> Apart from the mentioned problem with the position offsets after
> deletions/insertions, I see no further reason to change the _set_dnamut
> method for now...
>
> I've tried out my code and fixed a little error in handling variations before
> and after the actual variation. The complete method as it works for me is at
> the end of this mail.
>
> - Eckhard
>
> --------------------------------------------------------------------------
> sub _set_dnamut {
>     my $self = shift;
>
>     return undef unless $self->{'dna_ori'} && $self->each_Variant;
>
>     $self->{'dna_mut'} = $self->{'dna_ori'};
>     foreach ($self->each_Variant) {
>         next unless $_->isa('Bio::Variation::DNAMutation');
>         next unless $_->isMutation;
>
>         my ($s, $la, $le);
>         #lies the mutation less than 25 bases after the start of sequence?
>         if ($_->start < 25) {
>             $s = 0; $la = $_->start - 1;
>         } else {
>             $s = $_->start - 25; $la = 25;
>         }
>
>         #is the mutation an insertion, deletion or ins/del?
>         #we calculate the difference of the allele sequences to find out that...
>         my $delta_seq=length($_->allele_mut->seq) - length($_->allele_ori->seq);
>         if ($delta_seq != 0) { #if it is an ins, del ins/del
>             foreach my $variant ($self->each_Variant) {
>                 #add $delta_seq to the start position if the variant
>                 #is behind this variant.
>                 #do this even if $delta_seq is negative...
>                 next if $_->start >= $variant->start;
>                 $variant->start($variant->start + $delta_seq);
>             }
>         }
>
>         #is the mutation an insertion?
>         $_->end($_->start) unless $_->allele_ori->seq;
>
>         #does the mutation end greater than 25 bases before the end of
>         #sequence?
>         if (($_->end + 25) > length($self->{'dna_mut'})) {
>             $le = length($self->{'dna_mut'}) - $_->end;
>         } else {
>             $le = 25;
>         }
>
>         $_->dnStreamSeq(substr($self->{'dna_mut'}, $s, $la));
>         $_->upStreamSeq(substr($self->{'dna_mut'}, $_->end, $le));
>
>         my $s_ori = $_->dnStreamSeq . $_->allele_ori->seq . $_->upStreamSeq;
>         my $s_mut = $_->dnStreamSeq . $_->allele_mut->seq . $_->upStreamSeq;
>
>         (my $str = $self->{'dna_mut'}) =~ s/$s_ori/$s_mut/;
>         $self->{'dna_mut'} = $str;
>     }
> }
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________