[Bioperl-l] Tricky pairwise sequence alignment for mtDNA
Colin Erdman
colin.erdman at du.edu
Tue Jun 13 16:05:30 UTC 2006
I actually have found EMBOSS DiffSeq to work quite well for detecting
the insertions and SNPs in the "sample sequence" as compared to the
"reference sequence".
If I get this all figured out and integrated I will post a method, I
imagine this would prove useful to others as well.
Thanks all,
Colin
On Tue, 2006-06-13 at 08:19 -0400, aaron.j.mackey at gsk.com wrote:
> See Bio::LocatableSeq
>
> -Aaron
>
> bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM:
>
> > Hello all,
> >
> > I am doing a project relating to some forensic analysis of mitochondrial
> > DNA.
> >
> > I would like to write a script that will take a reference sequence, in
> > this case the Anderson sequence which is the standard mitochondrial
> > sequence which sample sequences are compared to, and compare it to an
> > unknown sequence.
> >
> > I have been using this script:
> >
> > use Bio::SearchIO;
> > use strict;
> > my $fh;
> > my @nomatches;
> > open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p
> > blastn |") || die $!;
> >
> > my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);
> >
> > if( my $result = $parser->next_result ) {
> > if( my $hit = $result->next_hit ) {
> > if( my $hsp = $hit->next_hsp ) {
> > my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
> > my ( @hitbases) = $hsp->hit_string;
> > my ( @querybases) = $hsp->query_string;
> > my $seq_string = join("", at querybases);
> > my $seq_string1 = join("", at hitbases);
> > for my $base ( @qmismatches ) {
> > print "base $base of the hit sequence is a mismatch: ";
> > print substr $seq_string, $base-1, 1;
> > print "->";
> > print substr $seq_string1, $base-1, 1;
> > print "\n";
> > }
> >
> > }
> > }
> > }
> >
> >
> > The problem is, that some mitochondrial sequences from individuals have
> > insertions, deletion etc, that cause them to be offset from the
> > reference sequence, this then offsets the numbering system.
> >
> > To provide an example:
> >
> > >Anderson Reference Sequence|HV2
> > ATTTGGT...
> > 1234567
> >
> > >Sample|HV2....
> > ATTTG|C|GT
> > 12345,5.1,67
> >
> > The |C| denote an insertion, and traditionally in the forensics
> community
> > this would be called position 5.1G, but the program reads it as position
> 6.
> >
> > So basically I need to figure out how to modify a perl script in
> > order to recognize
> > that 5.1G is an insertion, and that it is not position 6, position 6
> > is actually
> > the G to the right of it, followed by position 7-T.
> >
> > Any ideas and suggestions would be greatly helpful, I know this
> > could be very tricky,
> > or very easy - I just have come to the point where the idea flow has
> > stopped and would
> > love to gather some outside input.
> >
> > Thanks
> > Colin Erdman
> > colin.erdman at du.edu
> > Undergraduate Research Associate
> > Institute For Forensic Genetic
> > University of Denver
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
More information about the Bioperl-l
mailing list