[BioPython]  Is there a limit on Fasta parser? (and a bug spotted
	on LCC function)
    Jeffrey Chang
    jchang at jeffchang.com
    Sun Mar 30 20:47:31 EST 2003

Hi Sebastian,

Are you using the parser in Bio.Fasta, or from Bio.SeqRecord?  Either
way, though, there should be no limits in the parser.  It should be
limited only by memory.  It's troubling that you are getting different
numbers of nucleotides.  Is this reproducible?
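
A quick way to check whether the count is reproducible, without involving any parser, is to total the sequence characters directly. The sketch below is not Biopython code: `count_fasta_residues` and the sample data are made up for illustration, and it is written for current Python rather than the Python 2.2 mentioned in this thread.

```python
import io


def count_fasta_residues(handle):
    """Return {record title: residue count} for FASTA text read from handle."""
    counts = {}
    title = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A title line starts a new record.
            title = line[1:]
            counts[title] = 0
        elif title is not None:
            # Every other line is sequence data for the current record.
            counts[title] += len(line)
    return counts


# Hypothetical in-memory sample; for a real file pass an open file object.
sample = io.StringIO(">seq1 test\nACGTACGT\nACGT\n>seq2\nGGCC\n")
totals = count_fasta_residues(sample)
print(totals)
```

If the totals from a count like this stay the same across runs while the parser's result varies, that would point at the parsing step rather than at the file.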

As for LCC, thanks for the bug report.  Who's working on this part?
Iddo?  Which module is the LCC function in?

Jeff

On Wednesday, March 26, 2003, at 11:54  AM, Sebastian Bassi wrote:
> Hi,
>
> When I extract info using the fasta parser, I get up to 999950 (and  
> sometimes only 999932) nucleotides.
> I'm using Biopython 1.10 on Python 2.2.2 on Win2000.
> Is this something known?
>
> Regarding the LCC function, I found a bug: I forgot to reset a list, so
> on each function call the list returned was bigger than the previous one
> (because it included previous results). Here is the corrected code:
>
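> # Assumed imports (not in the original post): the function uses
> # math.log and a count(s, sub) helper such as Python 2's string.count.
> import math
> from string import count
>
> def lcc_mult(seq,wsize,start,end):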
>     """Return a vector called lccsal, the LCC, a complexity measure  
> from a sequence, called seq."""
>     l2=math.log(2)
>     tamseq=end-start
>     global compone
>     #print "compone"+str(len(compone))
>     global lccsal
>     #print "lccsal"+str(len(lccsal))
>     compone=[0]
>     lccsal=[0]
>     for i in range(wsize):
>         compone.append(((i+1)/float(wsize))*
>                        ((math.log((i+1)/float(wsize)))/l2))
>     window=seq[0:wsize]
>     cant_a=count(window,'A')
>     cant_c=count(window,'C')
>     cant_t=count(window,'T')
>     cant_g=count(window,'G')
>     term_a=compone[cant_a]
>     term_c=compone[cant_c]
>     term_t=compone[cant_t]
>     term_g=compone[cant_g]
>     lccsal[0]=(-(term_a+term_c+term_t+term_g))
>     tail=seq[0]
>     for x in range (tamseq-wsize):
>         window=seq[x+1:wsize+x+1]
>         if tail==window[-1]:
>             lccsal.append(lccsal[-1])
>             #break
>         elif tail=='A':
>             cant_a=cant_a-1
>             if window[-1]=='C':
>                 cant_c=cant_c+1
>                 term_a=compone[cant_a]
>                 term_c=compone[cant_c]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='T':
>                 cant_t=cant_t+1
>                 term_a=compone[cant_a]
>                 term_t=compone[cant_t]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='G':
>                 cant_g=cant_g+1
>                 term_a=compone[cant_a]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>         elif tail=='C':
>             cant_c=cant_c-1
>             if window[-1]=='A':
>                 cant_a=cant_a+1
>                 term_a=compone[cant_a]
>                 term_c=compone[cant_c]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='T':
>                 cant_t=cant_t+1
>                 term_c=compone[cant_c]
>                 term_t=compone[cant_t]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='G':
>                 cant_g=cant_g+1
>                 term_c=compone[cant_c]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>         elif tail=='T':
>             cant_t=cant_t-1
>             if window[-1]=='A':
>                 cant_a=cant_a+1
>                 term_a=compone[cant_a]
>                 term_t=compone[cant_t]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='C':
>                 cant_c=cant_c+1
>                 term_c=compone[cant_c]
>                 term_t=compone[cant_t]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='G':
>                 cant_g=cant_g+1
>                 term_t=compone[cant_t]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>         elif tail=='G':
>             cant_g=cant_g-1
>             if window[-1]=='A':
>                 cant_a=cant_a+1
>                 term_a=compone[cant_a]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='C':
>                 cant_c=cant_c+1
>                 term_c=compone[cant_c]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='T':
>                 cant_t=cant_t+1
>                 term_t=compone[cant_t]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>         tail=window[0]
>     return lccsal
>
>
> -- 
> Best regards,
>
> //=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ   //=\
> \=// IT Manager Advanta Seeds - Balcarce Research Center -      \=//
> //=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
> \=// E-mail: sbassi at genesdigitales.com - ICQ UIN: 3356556 -     \=//
>
>               Linux para todos: http://Linuxfacil.info
>
> _______________________________________________
> BioPython mailing list  -  BioPython at biopython.org
> http://biopython.org/mailman/listinfo/biopython
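
The accumulating-list bug described in the quoted message comes from keeping the result in state that outlives the call. A minimal sketch of the same windowed measure with purely local state is shown below. It is an illustration only, not the Biopython implementation: `lcc_windows` is a hypothetical name, it drops the `start`/`end` parameters, it recounts each window instead of updating the counts incrementally, and it is written for current Python. The log base 2 follows the quoted code.

```python
import math


def lcc_windows(seq, wsize):
    """Return one complexity value per window of length wsize in seq."""
    # Lookup table built on every call: terms[n] = (n/w) * log2(n/w),
    # with terms[0] = 0 for a base that does not occur in the window.
    terms = [0.0] + [
        (n / wsize) * math.log(n / wsize, 2) for n in range(1, wsize + 1)
    ]
    results = []  # fresh list per call, so earlier results cannot leak in
    for start in range(len(seq) - wsize + 1):
        window = seq[start:start + wsize]
        results.append(-sum(terms[window.count(base)] for base in "ACGT"))
    return results


first = lcc_windows("ACGTACGT", 4)
second = lcc_windows("ACGTACGT", 4)
print(len(first), len(second))
```

Because both lists are created inside the function, a second call returns a list of the same length as the first. The cost is that each window is recounted from scratch, which is slower than the incremental update in the quoted code for long sequences.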
    
    