[Biojava-l] Performance of SymbolList.subStr() function

Schreiber, Mark mark.schreiber at agresearch.co.nz
Fri Sep 12 21:38:51 EDT 2003


Andy -
 
String manipulation is notoriously slow in Java because Strings are immutable (the security model requires this), so every manipulation requires a new String. If you are dealing with a small alphabet and are going to treat each Symbol as a String, it may pay to create one String for each symbol ("a", "c", etc.) and intern them using the String.intern() method.
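A minimal sketch of the interning idea, assuming the 20 standard one-letter amino-acid codes as the alphabet (swap in your own): build one interned String per letter up front and hand out that shared instance instead of creating a new String at every position. Note that intern() alone still needs a String to call it on, so it is the lookup table that actually avoids the per-position allocation. The class and method names are just for illustration.

```java
// Illustrative helper, not part of BioJava.
final class ResidueStrings {
    // Assumed alphabet: the 20 standard one-letter amino-acid codes.
    private static final String ALPHABET = "ACDEFGHIKLMNPQRSTVWY";

    // One shared String per ASCII character, filled once at class load.
    private static final String[] TABLE = new String[128];

    static {
        for (char c : ALPHABET.toCharArray()) {
            // intern() returns the canonical instance from the JVM string pool
            TABLE[c] = String.valueOf(c).intern();
        }
    }

    /** Returns the shared String for a residue, or a fresh String for anything outside the table. */
    static String residue(char c) {
        String s = (c < TABLE.length) ? TABLE[c] : null;
        return (s != null) ? s : String.valueOf(c);
    }

    public static void main(String[] args) {
        String seq = "MKTAYIAKQR";
        // Same letter, same object: no new String per position.
        System.out.println(residue(seq.charAt(0)) == residue('M')); // true
        // Interned strings are the same instance as the matching literal.
        System.out.println(residue('A') == "A"); // true
    }
}
```

Because every copy of "A" is the same object, comparisons can even use == instead of equals(), although equals() remains the safer habit.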
 
I'm not sure it will improve performance, but you might be able to avoid using Strings and chars altogether by using Edits and cross-product alphabets.
 
See the alphabets and symbols section of http://www.biojava.org/docs/bj_in_anger/ and also http://www.biojava.org/docs/bj_in_anger/edit.htm
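For comparison, here is a minimal sketch of the pattern in your faster loop: get the sequence as a String once (in BioJava, a single seq.seqString() call) and walk it with charAt, remembering that SymbolList positions are 1-based while String indices are 0-based. The sketch starts from a plain String so it runs without BioJava. The codon table is a hypothetical stand-in for your newCodon(), which wasn't shown, and a presized StringBuilder replaces the StringBuffer since nothing here is shared between threads.

```java
import java.util.HashMap;
import java.util.Map;

final class BackTranslate {
    // Hypothetical codon table: one representative codon per residue.
    // It only stands in for Andy's newCodon(char), whose real mapping is not shown.
    private static final Map<Character, String> CODON = new HashMap<>();

    static {
        CODON.put('M', "ATG");
        CODON.put('K', "AAA");
        CODON.put('T', "ACC");
        CODON.put('A', "GCC");
        CODON.put('W', "TGG");
    }

    /** Hypothetical newCodon: returns a 3-char codon, or "NNN" for unknown residues. */
    static String newCodon(char protein) {
        String codon = CODON.get(Character.toUpperCase(protein));
        return (codon != null) ? codon : "NNN";
    }

    /** Walks an already-converted protein String with charAt (0-based). */
    static String backTranslate(String protein) {
        // Presize: each residue becomes exactly three bases.
        StringBuilder out = new StringBuilder(protein.length() * 3);
        for (int i = 0; i < protein.length(); i++) {
            out.append(newCodon(protein.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(backTranslate("MKTAW")); // ATGAAAACCGCCTGG
    }
}
```

Presizing to three characters per residue means the builder never has to grow while appending, and the loop creates no intermediate String per position.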
 
Hope this helps,
 
Mark

	-----Original Message----- 
	From: Andy Hammer [mailto:facemann at yahoo.com] 
	Sent: Sat 13/09/2003 12:05 p.m. 
	To: bio java 
	Cc: 
	Subject: [Biojava-l] Performance of SymbolList.subStr() function
	
	

	This blows my mind!
	In both blocks of code: seq = si.nextSequence(); // just a protein sequence
	
	In this block of code, my system would take hours and
	often crash with an out of memory error.
	
	     // SymbolList positions are 1-based
	     for(int i = 1; i <= seqLength; i++){
	        String seqString = seq.subStr(i,i); // builds a one-character String on every pass
	        char protein = seqString.charAt(0);
	        // add a 3-char string to the newSeq StringBuffer
	        newSeq.append(newCodon(protein));
	      }
	
	I altered the above block to:
	
	    // one conversion up front; String indices are 0-based
	    String seqString = seq.seqString();
	    for(int i = 0; i < seqLength; i++){
	      char protein = seqString.charAt(i);
	      newSeq.append(newCodon(protein));
	    }
	
	This program used to take 4 hours to complete.  With
	this simple change it is now done in 30 minutes!
	Obviously, the SymbolList.subStr() function was
	severely hampering my efficiency.  I guess it was a poor
	use of the SymbolList.subStr() function.  My rationale
	was that I didn't want to create an unneeded String
	simply to represent my already existing SymbolList.  I
	am just trying to learn from this to become a better
	programmer and would appreciate any comments on this.
	
	Thanks!
	
	_______________________________________________
	Biojava-l mailing list  -  Biojava-l at biojava.org
	http://biojava.org/mailman/listinfo/biojava-l
	


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


