[Bioperl-l] Tricky pairwise sequence alignment for mtDNA
aaron.j.mackey at gsk.com
aaron.j.mackey at gsk.com
Tue Jun 13 12:19:11 UTC 2006
See Bio::LocatableSeq
-Aaron
bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM:
> Hello all,
>
> I am doing a project relating to some forensic analysis of mitochondrial
> DNA.
>
> I would like to write a script that will take a reference sequence, in
> this case the Anderson sequence which is the standard mitochondrial
> sequence which sample sequences are compared to, and compare it to an
> unknown sequence.
>
> I have been using this script:
>
> use Bio::SearchIO;
> use strict;
> my $fh;
> my @nomatches;
> open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p
> blastn |") || die $!;
>
> my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);
>
> if( my $result = $parser->next_result ) {
> if( my $hit = $result->next_hit ) {
> if( my $hsp = $hit->next_hsp ) {
> my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
> my ( @hitbases) = $hsp->hit_string;
> my ( @querybases) = $hsp->query_string;
> my $seq_string = join("", at querybases);
> my $seq_string1 = join("", at hitbases);
> for my $base ( @qmismatches ) {
> print "base $base of the hit sequence is a mismatch: ";
> print substr $seq_string, $base-1, 1;
> print "->";
> print substr $seq_string1, $base-1, 1;
> print "\n";
> }
>
> }
> }
> }
>
>
> The problem is, that some mitochondrial sequences from individuals have
> insertions, deletion etc, that cause them to be offset from the
> reference sequence, this then offsets the numbering system.
>
> To provide an example:
>
> >Anderson Reference Sequence|HV2
> ATTTGGT...
> 1234567
>
> >Sample|HV2....
> ATTTG|C|GT
> 12345,5.1,67
>
> The |C| denote an insertion, and traditionally in the forensics
community
> this would be called position 5.1G, but the program reads it as position
6.
>
> So basically I need to figure out how to modify a perl script in
> order to recognize
> that 5.1G is an insertion, and that it is not position 6, position 6
> is actually
> the G to the right of it, followed by position 7-T.
>
> Any ideas and suggestions would be greatly helpful, I know this
> could be very tricky,
> or very easy - I just have come to the point where the idea flow has
> stopped and would
> love to gather some outside input.
>
> Thanks
> Colin Erdman
> colin.erdman at du.edu
> Undergraduate Research Associate
> Institute For Forensic Genetic
> University of Denver
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
More information about the Bioperl-l
mailing list