[BioPython] Informative content problem, SOLVED!!!
Iddo Friedberg
idoerg@burnham.org
Fri, 29 Nov 2002 08:46:27 -0800
Hi Sebastian,
Yep, you were absolutely right in your diagnosis: IC (as a measure of
positional evolutionary conservation) applies to alignments, whereas LCC
is used (as you have demonstrated) to measure the complexity of a single
sequence segment.
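To make the distinction concrete, here is a minimal sketch. It assumes a simplified IC of log2(4) minus the Shannon entropy of each alignment column, with no pseudocounts, gap handling or background frequencies, so it is not necessarily what Biopython's alignment code computes. LCC is taken as the entropy of the base composition of one segment, matching the formula in Sebastian's script. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbol frequencies in `symbols`."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def column_information_content(alignment):
    """Per-column IC for a list of equal-length aligned DNA strings.

    Simplified IC (assumption): log2(4) minus the column entropy,
    with no pseudocounts, gap handling or background frequencies.
    """
    max_entropy = math.log2(4)
    # zip(*alignment) walks DOWN the alignment: one tuple per column.
    return [max_entropy - shannon_entropy(column) for column in zip(*alignment)]


def segment_lcc(segment):
    """LCC of one sequence segment: entropy ALONG the sequence, in bits."""
    return shannon_entropy(segment)


alignment = ["ACGT", "ACGA", "ACTA"]
print(column_information_content(alignment))
print(segment_lcc("ACGTACGT"))  # 2.0
```

IC reads down each column across all sequences, while LCC reads along a single sequence; that is the whole difference in input.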
Yes, it might be a good idea to implement LCC as a method, though not as
a method of Seq.Seq... Jeff, what is the policy on these things? Method or
function? And where should it live?
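On the "method or function" question, one option is a plain module-level function rather than a Seq.Seq method. Below is a minimal sketch using the lcc(STRING, STARTPOSITION, ENDPOSITION) signature Sebastian suggests in his message. The name `lcc`, the Python-style half-open [start, end) range and raising ValueError on an empty segment are assumptions for illustration; this is a hypothetical function, not an existing Biopython one.

```python
import math


def lcc(sequence, start, end):
    """Hypothetical standalone LCC function (not an existing Biopython API).

    Returns -sum(p * log2(p)) over A, C, G, T in sequence[start:end],
    where p is each base's count divided by the segment length, as in
    the quoted script. start/end follow Python's half-open slice rules.
    """
    # str() lets this accept a plain string or a Seq object alike.
    segment = str(sequence)[start:end]
    if not segment:
        raise ValueError("empty segment")
    total = float(len(segment))
    result = 0.0
    for base in "ACGT":
        count = segment.count(base)
        if count:  # a base that does not occur contributes 0
            p = count / total
            result -= p * math.log2(p)
    return result


print(lcc("ACGTACGT", 0, 8))            # 2.0
print(lcc("TTTTTTTTTTTTTTTTTT", 0, 18))  # 0.0
```

Because it only needs something `str()` can convert, the function works the same on a string or a Seq, which sidesteps the question of which class it would belong to.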
Best,
Iddo
Sebastian Bassi wrote:
> Hi,
>
> Now I know what's going on. The formula used by Biopython is ONLY for
> alignments (since it uses information from every sequence in the alignment).
> My formula is LCC (local composition complexity), so I implemented it here:
>
> Before submitting my code: I know it sucks, so it would be nice to have
> it as a module function, like lcc(STRING, STARTPOSITION, ENDPOSITION).
>
> Now, my code:
>
>
> import math
>
> from Bio import SeqIO
>
> WINDOW = 18  # primer/window length
>
> with open("C:\\bioinfo-adv\\blast\\data\\vector.nn", "r") as entrada:
>     for cur_record in SeqIO.parse(entrada, "fasta"):
>         sequence = str(cur_record.seq)
>         tamseq = len(sequence)
>         print(tamseq)
>
>         # Slide an 18-letter window along the sequence;
>         # "+ 1" so the last window is included.
>         for ini in range(tamseq - WINDOW + 1):
>             fin = ini + WINDOW
>             primer = sequence[ini:fin]
>
>             # LCC = -sum(p * log2(p)) over A, C, T, G, where p is the
>             # base's frequency in the window; absent bases contribute 0.
>             lcc = 0.0
>             for base in "ACTG":
>                 count = primer.count(base)
>                 if count:
>                     p = count / float(len(primer))
>                     lcc -= p * math.log2(p)
>             print(lcc)
>
>         # Separator printed between records
>         print('Chromosome change')
>         print()
>         print()
>
>
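The quoted script recounts all four bases from scratch for every 18-letter window. Below is a minimal sketch of the same scan that instead keeps a running count as the window slides (one letter leaves, one enters). `sliding_lcc` is a hypothetical name, and like the script it counts only A, C, G and T relative to the window length.

```python
import math
from collections import Counter


def sliding_lcc(sequence, window=18):
    """Yield the LCC of every `window`-letter window, left to right.

    Keeps a running Counter instead of recounting each window.
    Only A, C, G, T are scored; p = count / window, as in the script.
    """
    if len(sequence) < window:
        return
    counts = Counter(sequence[:window])
    for ini in range(len(sequence) - window + 1):
        if ini > 0:
            counts[sequence[ini - 1]] -= 1           # letter leaving the window
            counts[sequence[ini + window - 1]] += 1  # letter entering the window
        value = 0.0
        for base in "ACGT":
            if counts[base] > 0:
                p = counts[base] / float(window)
                value -= p * math.log2(p)
        yield value


# Four windows: AAAA, AAAC, AACG, ACGT
print(list(sliding_lcc("AAAACGT", window=4)))
```

Each step does constant work on the counts instead of rescanning the window, which matters little at 18 letters but helps for larger windows.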
--
Iddo Friedberg
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171