[Bioperl-l] Count or weight matrix in bioperl?
pedro fabre
pedro.fabre at gmail.com
Fri Feb 17 18:36:37 UTC 2006
>Torsten and all,
>
> I don't think this will work for me for it only generates
>statistics for a single sequence. What I need is a count matrix for
>each position for a number of DNA sequences. In other words, if I
>pass there 3 sequences to this function then it returns the count
>for each postion for each nucleotide.
>
> For example if I pass an array of sequences say: ATC,CCC,TTT
> then I should get a matrix back that will have count for postion
>1,2,3 for each A,C,T, or G like this:
>
>
> 1 2 3
> A 1 0 0
> C 1 1 2
> T 1 2 1
> G 0 0 0
>
> Any idea of this is already built somewhere in bioperl?
>
> Thank you.
>
>
Sam,
What about this?
I worked in something like that some time ago for SNP calculation
and it looks to me you are on the same way.
If you have a sequence like
A C G T C C A - T
C G G T A G T G C
C C C C C G T G C
C G C T C G T G C
Convert the sequence to numbers (0 for the first value, 1 for the
first modification (reading by columns), 2 for the second
modification and so on)
Deletions can be considered as another base if you like
After that:
0 0 0 0 0 0 0 0 0
1 1 0 0 1 1 1 1 1
1 0 1 1 0 1 1 1 1
1 1 1 0 0 1 1 1 1
Once we have the haplotype converted to numbers we have to generate the
snp type information for the haplotype.
SNP code = SUM ( value * multiplicity ^ position );>
where:
SUM is the sum of the values for the SNP
value is the SNP number code (0 [generally for the mayor allele],
1 [for the minor allele].
position is the position on the block.
For this example the code is:
0 0 0 0 0 0 0 0 0
1 1 0 0 1 1 1 1 1
1 0 1 1 0 1 1 1 1
1 1 1 0 0 1 1 1 1
------------------------------------------------------------------
14 10 12 4 2 14 14 14 14
14 = 0*2^0 + 1*2^1 + 1*2^2 + 1*2^3
12 = 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3
....
Once we have the families classify. We will B<take> just the SNP's B<not
redundant>.
14 10 12 4 2
If you want to look into the code follow this link.
http://users.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/PopGen/HtSNP.pm?rev=1.4&content-type=text/vnd.viewcvs-markup
HTH
Pedro
> Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:>
>Say I have an array of nucleotide sequences of of length N. I want
>to calculate the count matrix (weight matrix). That is for each
>position 1..N, I want to know how many As, Cs ,Ts and Gs there are.
>Is the code to do this already written in bioperl to build this
>matrix if I pass it those strings?
>> Please excuse my lack of knowledge as I am a new comer to bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely,
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list