[Bioperl-l] Tricky pairwise sequence alignment for mtDNA

Tue Jun 13 12:19:11 UTC 2006

See Bio::LocatableSeq

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM:

> Hello all,
> 
> I am doing a project relating to some forensic analysis of mitochondrial
> DNA. 
> 
> I would like to write a script that will take a reference sequence, in
> this case the Anderson sequence which is the standard mitochondrial
> sequence which sample sequences are compared to, and compare it to an
> unknown sequence.
> 
> I have been using this script:
> 
> use Bio::SearchIO;
> use strict;
> my $fh;
> my @nomatches;
> open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p 
> blastn |") || die $!;
> 
> my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);
> 
> if( my $result = $parser->next_result ) { 
>      if( my $hit = $result->next_hit ) { 
>      if( my $hsp = $hit->next_hsp ) { 
>          my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
>     my ( @hitbases) = $hsp->hit_string;
>     my ( @querybases) = $hsp->query_string;
>     my $seq_string = join("", at querybases);
>     my $seq_string1 = join("", at hitbases);
>          for my $base (  @qmismatches ) {
>             print "base $base of the hit sequence is a mismatch: ";
>        print substr $seq_string, $base-1, 1;
>        print "->";
>             print substr $seq_string1, $base-1, 1;
>             print "\n";
>         }
> 
>      }
>      }
> }
> 
> 
> The problem is, that some mitochondrial sequences from individuals have
> insertions, deletion etc, that cause them to be offset from the
> reference sequence, this then offsets the numbering system.
> 
> To provide an example:
> 
> >Anderson Reference Sequence|HV2
> ATTTGGT...
> 1234567
> 
> >Sample|HV2....
> ATTTG|C|GT
> 12345,5.1,67
> 
> The |C| denote an insertion, and traditionally in the forensics 
community
> this would be called position 5.1G, but the program reads it as position 
6.
> 
> So basically I need to figure out how to modify a perl script in 
> order to recognize 
> that 5.1G is an insertion, and that it is not position 6, position 6
> is actually 
> the G to the right of it, followed by position 7-T.
> 
> Any ideas and suggestions would be greatly helpful, I know this 
> could be very tricky,
> or very easy - I just have come to the point where the idea flow has
> stopped and would 
> love to gather some outside input.
> 
> Thanks
> Colin Erdman
> colin.erdman at du.edu
> Undergraduate Research Associate
> Institute For Forensic Genetic
> University of Denver 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>