[Bioperl-l] dynamically making a matrix

Michael Robeson popgen23 at mac.com
Tue Nov 9 16:05:04 EST 2004


Well, with your help and that of others I have been able to come up 
with this working code:

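#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

# %gap: animal => { gap length => [ start positions ] }
my (%gap, $animal);

# Read one FASTA record at a time by splitting the input on '>'.
$/ = '>';

while (<DATA>) {
    # The first line of each record is the name; strip it off and keep it.
    next unless s/^\s*(.+)//;
    $animal = $1;
    # Find every run of '-' in the rest of the record.
    # Note: the record still contains its newlines, so pos() counts them too.
    while (/(-+)/g) {
        my $gap_length = length $1;
        my $position   = pos() - $gap_length + 1;    # 1-based start of the gap
        push @{ $gap{$animal}{$gap_length} }, $position;
    }
}

print Dumper \%gap;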

__DATA__
>mouse
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGGTTTCAAGAAAGAGCCA
TTGCCTCCTGTGCTGTTTGAAG--GGAAAGGAGGGGTGCCCC---TCCTC
AACTCTGGT-ACA-TTAACATACTACTTACTACTTAGCATACTCTTTACT
AGGGAGCGATTGGGGACCACTAATATCT----CACTAAGATATCATACTA

>rat
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGGTTTCAAGAAAGAACCA
TTGCCTCCTGTGCTGTTTGAAG--GGAAAGGAAGGA-GCCCC---TCCTC
AAGTCCGGC-ACA-CTAACGTGCAACTTACTAATTAACATACTGTTTACT
AGGGAGGTATTGGGGGCCTCTAATATCC----CATTAAGATATCATACTA

>Human
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGATTTCTTGAAAGAACTG
CT--CTCTTGTGCTGTGTGAGGCTGTGCCAGGGGGCCAGGCCAGGTTCCC
GCCTCTGGAGACAGTTCATACAGGGTCAGCGACTTATCAA----CTTATC
GGTGATAGAATGGAGACCCTGTACCCCAGAAACACCAGGGTATCGT-CAG

>chimp
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGATTTCTTGAAAGAACTG
CT--CTCTTGTGCTGTGTGAGGCTGTGCCAGGGGGCCAGGCCAGGTTCCC
GCCTCTGGAGACAGTTCATACAGGGTCAGCGACTTATCAA----CTTATC
GGTGATAGAATGGAGACCCTGTACCCCAGAAACACCAGGGTATCGT-CAG
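
One caveat about the positions this script reports: each record still
contains its line breaks, and pos() counts those as characters. The
reported positions are therefore offsets into the raw record, not
alignment columns. The mouse '--' gap, for example, is reported at 75 but
sits at column 73 of the sequence. The comparison between animals still
works here because every sequence is wrapped at the same width. A minimal
sketch of one way to get true column positions, assuming it is acceptable
to strip all whitespace before scanning (the sub name gap_columns is made
up for illustration and is not part of the script above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns a hash ref
# mapping gap length => [1-based start columns] for one aligned sequence.
sub gap_columns {
    my ($seq) = @_;
    $seq =~ s/\s+//g;    # drop newlines so pos() counts only bases and gaps
    my %by_length;
    while ($seq =~ /(-+)/g) {
        my $len = length $1;
        push @{ $by_length{$len} }, pos($seq) - $len + 1;
    }
    return \%by_length;
}

# First two lines of the mouse sequence from the data above.
my $mouse = <<'END';
GTTATAAAGTTTCTTTGAGACAGTAAAATTATGGTTTCAAGAAAGAGCCA
TTGCCTCCTGTGCTGTTTGAAG--GGAAAGGAGGGGTGCCCC---TCCTC
END

my $gaps = gap_columns($mouse);
print "length $_: @{ $gaps->{$_} }\n" for sort { $a <=> $b } keys %$gaps;
```

With whitespace removed, the two gaps in this fragment come out at columns
73 and 93 rather than 75 and 95.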


I am trying to make a matrix of 1's (present) and 0's (absent) based
on gaps in DNA sequences, and I am still not sure whether my script
above is the right way to begin. Basically, I would like to take the
gaps that I have stored in the hash of hashes of arrays and output a
matrix like the one below. (I do not need the first row,
'gap size (pos):', printed; it is only there to show where the data
come from and what is being compared.)

gap size (pos):  1(89)  1(113)  1(117)  1(201)  2(55)  2(75)  3(95)  4(144)  4(183)
chimp:             0      0       0       1       1      0      0      1       0
human:             0      0       0       1       1      0      0      1       0
rat:               1      1       1       0       0      1      1      0       1
mouse:             0      1       1       0       0      1      1      0       1

Any suggestions on where to go from here? I just realized that my HoHoA
(hash of hashes of arrays) may be too cumbersome for outputting the data
in the above fashion. Any recommendations?

-Mike
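
For what it is worth, the hash of hashes of arrays looks workable as a
starting point. A rough sketch of one possible approach, not the only one:
collect every distinct (length, position) pair across all animals to form
the columns, then emit a 1 or 0 per column for each animal. The %gap
literal below is filled in by hand with the values from the table above so
that the snippet runs on its own, and the helper name build_matrix is made
up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Gap data in the shape the script stores it:
# animal => { gap length => [positions] }.
# Filled in by hand from the table above so the sketch is self-contained.
my %gap = (
    mouse => { 1 => [113, 117],     2 => [75], 3 => [95], 4 => [183] },
    rat   => { 1 => [89, 113, 117], 2 => [75], 3 => [95], 4 => [183] },
    Human => { 1 => [201], 2 => [55], 4 => [144] },
    chimp => { 1 => [201], 2 => [55], 4 => [144] },
);

# Hypothetical helper: returns (\@columns, \%rows). @columns holds
# "length(position)" labels sorted by length, then position; %rows maps
# each animal to an array ref of 1/0 values in the same column order.
sub build_matrix {
    my ($gap) = @_;
    my (%seen, %has);
    for my $animal (keys %$gap) {
        for my $len (keys %{ $gap->{$animal} }) {
            for my $pos (@{ $gap->{$animal}{$len} }) {
                $seen{"$len $pos"} = [$len, $pos];
                $has{$animal}{"$len $pos"} = 1;
            }
        }
    }
    my @keys = sort {
        $seen{$a}[0] <=> $seen{$b}[0] || $seen{$a}[1] <=> $seen{$b}[1]
    } keys %seen;
    my @columns = map { "$seen{$_}[0]($seen{$_}[1])" } @keys;
    my %rows;
    for my $animal (keys %$gap) {
        $rows{$animal} = [ map { $has{$animal}{$_} ? 1 : 0 } @keys ];
    }
    return (\@columns, \%rows);
}

my ($columns, $rows) = build_matrix(\%gap);
print join("\t", 'gap size (pos):', @$columns), "\n";
print join("\t", "$_:", @{ $rows->{$_} }), "\n" for sort keys %$rows;
```

Keying the intermediate hashes on "length position" keeps gaps of
different sizes that happen to start at the same position in separate
columns. The rows are printed in sorted name order, not the order of the
table above.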


