[Bioperl-l] dynamically making a matrix
Michael Robeson
popgen23 at mac.com
Tue Nov 9 16:05:04 EST 2004
Well, with your help and that of others, I have been able to come up
with this working code:
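#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my (%gap, $animal);

# Read one FASTA record at a time: each record ends at the next '>'.
$/ = '>';
while (<DATA>) {
    # Strip the first line (the taxon name) and remember it.
    next unless s/^\s*(.+)//;
    $animal = $1;

    # Record the start of every run of '-', grouped by gap length:
    # $gap{animal}{gap_length} = [ positions ].
    # Note: pos() also counts the line breaks still in the record, so these
    # are offsets into the raw record, not alignment columns. They line up
    # across taxa here only because every sequence wraps at the same width.
    while (/(-+)/g) {
        my $gap_length = length $1;
        my $position = pos() - $gap_length + 1;
        push @{$gap{$animal}{$gap_length}}, $position;
    }
}
print Dumper \%gap;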
__DATA__
>mouse
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGGTTTCAAGAAAGAGCCA
TTGCCTCCTGTGCTGTTTGAAG--GGAAAGGAGGGGTGCCCC---TCCTC
AACTCTGGT-ACA-TTAACATACTACTTACTACTTAGCATACTCTTTACT
AGGGAGCGATTGGGGACCACTAATATCT----CACTAAGATATCATACTA
>rat
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGGTTTCAAGAAAGAACCA
TTGCCTCCTGTGCTGTTTGAAG--GGAAAGGAAGGA-GCCCC---TCCTC
AAGTCCGGC-ACA-CTAACGTGCAACTTACTAATTAACATACTGTTTACT
AGGGAGGTATTGGGGGCCTCTAATATCC----CATTAAGATATCATACTA
>Human
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGATTTCTTGAAAGAACTG
CT--CTCTTGTGCTGTGTGAGGCTGTGCCAGGGGGCCAGGCCAGGTTCCC
GCCTCTGGAGACAGTTCATACAGGGTCAGCGACTTATCAA----CTTATC
GGTGATAGAATGGAGACCCTGTACCCCAGAAACACCAGGGTATCGT-CAG
>chimp
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGATTTCTTGAAAGAACTG
CT--CTCTTGTGCTGTGTGAGGCTGTGCCAGGGGGCCAGGCCAGGTTCCC
GCCTCTGGAGACAGTTCATACAGGGTCAGCGACTTATCAA----CTTATC
GGTGATAGAATGGAGACCCTGTACCCCAGAAACACCAGGGTATCGT-CAG
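Because pos() in the script above runs over the raw record, which still contains the line breaks, each reported position is shifted by the number of line breaks that precede the gap in the record. For example, the mouse 2-base gap on the second sequence line starts at alignment column 73 but is reported as 75. A minimal sketch that strips the line breaks first, assuming the alignment is available as one string, might look like this (parse_gaps is a hypothetical helper name, not a BioPerl function):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA-style text into the same structure as the
# script above, $gap{animal}{gap_length} = [ start positions ], but with the
# line breaks removed first so positions are true alignment columns.
sub parse_gaps {
    my ($text) = @_;
    my %gap;

    # Split into records at every '>' that starts a line; the empty piece
    # before the first '>' fails the name match below and is skipped.
    for my $record (split /^>/m, $text) {
        # First line is the taxon name; everything after it is sequence.
        next unless $record =~ s/^\s*(\S+)[^\n]*\n?//;
        my $animal = $1;

        # Drop line breaks and stray whitespace so offsets count columns only.
        $record =~ tr/ \t\r\n//d;

        while ($record =~ /(-+)/g) {
            my $gap_length = length $1;
            # pos() points just past the gap; convert to its 1-based start.
            my $position = pos($record) - $gap_length + 1;
            push @{ $gap{$animal}{$gap_length} }, $position;
        }
    }
    return \%gap;
}

# Small example using the first two lines of the mouse sequence.
my $fasta = <<'END';
>mouse
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGGTTTCAAGAAAGAGCCA
TTGCCTCCTGTGCTGTTTGAAG--GGAAAGGAGGGGTGCCCC---TCCTC
END

my $parsed = parse_gaps($fasta);
for my $len (sort { $a <=> $b } keys %{ $parsed->{mouse} }) {
    print "mouse: length $len at @{ $parsed->{mouse}{$len} }\n";
}
# prints:
# mouse: length 2 at 73
# mouse: length 3 at 93
```

Because the structure is unchanged, anything written against the original hash keeps working; only the position numbers shift to alignment columns.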
Now I am trying to make a matrix of 1's (present) and 0's (absent)
based on the gaps in these DNA sequences, where each column is one gap
identified by its length and start position. I am still not sure whether
my script above is the right way to begin. Basically, I would like to take
the gaps stored in the hash of hashes of arrays (HoHoA) and output a
matrix like the one below. (I do not need the first row, 'gap size (pos):',
printed; it is only there to show where the data come from and what is
being compared.)
gap size (pos): 1(89) 1(113) 1(117) 1(201) 2(55) 2(75) 3(95) 4(144) 4(183)
chimp:          0     0      0      1      1     0     0     1      0
human:          0     0      0      1      1     0     0     1      0
rat:            1     1      1      0      0     1     1     0      1
mouse:          0     1      1      0      0     1     1     0      1
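One possible way to get from the HoHoA to this matrix is to flatten each taxon's gaps into a set of "length:position" keys, take the union over all taxa as the columns, and then fill each cell with a single hash lookup. The sketch below assumes the structure produced by the script above; build_matrix is a hypothetical helper, and the %gap literal is transcribed by hand from the table above rather than read from the script's output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the HoHoA built by the script,
# $gap{animal}{gap_length} = [ positions ], into a presence/absence matrix.
# Each column is one distinct gap, identified by "length:position".
sub build_matrix {
    my ($gap) = @_;

    # Every gap seen in any taxon becomes a column.
    my %seen;
    for my $animal (keys %$gap) {
        for my $len (keys %{ $gap->{$animal} }) {
            $seen{"$len:$_"} = 1 for @{ $gap->{$animal}{$len} };
        }
    }

    # Order columns by gap length, then by position, both numerically.
    my @columns = sort {
        my ($la, $pa) = split /:/, $a;
        my ($lb, $pb) = split /:/, $b;
        $la <=> $lb || $pa <=> $pb;
    } keys %seen;

    # One row per taxon: 1 if it has exactly that gap, otherwise 0.
    my %matrix;
    for my $animal (keys %$gap) {
        my %has;
        for my $len (keys %{ $gap->{$animal} }) {
            $has{"$len:$_"} = 1 for @{ $gap->{$animal}{$len} };
        }
        $matrix{$animal} = [ map { $has{$_} ? 1 : 0 } @columns ];
    }
    return (\@columns, \%matrix);
}

# Gap positions for the sample data, transcribed from the table above.
my %gap = (
    mouse => { 1 => [113, 117],     2 => [75], 3 => [95], 4 => [183] },
    rat   => { 1 => [89, 113, 117], 2 => [75], 3 => [95], 4 => [183] },
    Human => { 1 => [201], 2 => [55], 4 => [144] },
    chimp => { 1 => [201], 2 => [55], 4 => [144] },
);

my ($columns, $matrix) = build_matrix(\%gap);
for my $animal (sort { lc $a cmp lc $b } keys %$matrix) {
    print "$animal: @{ $matrix->{$animal} }\n";
}
# prints:
# chimp: 0 0 0 1 1 0 0 1 0
# Human: 0 0 0 1 1 0 0 1 0
# mouse: 0 1 1 0 0 1 1 0 1
# rat: 1 1 1 0 0 1 1 0 1
```

If the header row is ever needed, @$columns already holds the labels in column order (1:89 1:113 ... 4:183). With this approach the HoHoA does not need restructuring; it is only read once to build the per-taxon key sets.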
Any suggestions on where to go from here? I just realized that my HoHoA
may be too cumbersome for printing the data in the format above. Any
recommendations?
-Mike