[Biojava-l] seeking comments on proposed changes
Scott Markel
smarkel@netgenics.com
Wed, 29 Nov 2000 18:29:21 -0800
We'd like to propose some changes and would like to get the group's
feedback.
* Location.empty.equals(Location.empty) evaluates to false. The
problem is that EmptyLocation returns Integer.MIN_VALUE from the
getMax() method and the LocationComparator determines the distance
between the max of two Locations using subtraction. When comparing
Location.empty to itself, the max values are both maximally
negative, so the subtraction does not result in 0. We'd like to
change EmptyLocation's equals() method.
* FastaFormat doesn't use standard Java facilities such as reading
lines as Strings from a BufferedReader. We tripped over this while
tracking down a bug involving DOS-formatted end-of-line characters
in a FASTA file. We have a fix for the DOS format bug that could be
checked in, but we're wondering whether using BufferedReader's
readLine() method might be a safer approach that avoids that kind
of problem.
* We also noticed that when FastaFormat processes a sequence file, a
new String object is instantiated for each character in the sequence
so that it can be parsed and added to the SymbolList. We've seen
a big performance hit for large sequences (100K - 10M bp).
We'd like to do one of the following.
- Add a method that mimics parseToken(), but takes a primitive char.
This new method might live in either SymbolParser or a derived
interface. Change the implementation of TokenParser's parse()
method so that it no longer uses substring(), which causes more
Strings to be instantiated.
- Change FastaFormat to use the current interface, but instantiate
one String per symbol in the alphabet and reuse those Strings rather
than creating a new String per sequence character.
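A minimal sketch of how the readLine() idea and the second option (one reusable String per symbol) might fit together is shown below. It is only an illustration under stated assumptions: FastaSketch, tokenFor() and readFirstRecord() are hypothetical names made up for this sketch, not existing BioJava classes or methods, and the real SymbolParser and SymbolList interfaces are not used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names are BioJava API.
final class FastaSketch {
    // One shared single-character String per ASCII code, built once and
    // reused, so parsing N residues does not allocate N new Strings.
    private static final String[] TOKENS = new String[128];
    static {
        for (int c = 0; c < TOKENS.length; c++) {
            TOKENS[c] = String.valueOf((char) c);
        }
    }

    /** Returns the cached token for c; falls back to a new String outside ASCII. */
    static String tokenFor(char c) {
        return c < TOKENS.length ? TOKENS[c] : String.valueOf(c);
    }

    /**
     * Reads the first FASTA record from in and returns its residues as tokens.
     * BufferedReader.readLine() treats "\n", "\r" and "\r\n" as line
     * terminators and strips them, so DOS line endings need no special case.
     */
    static List<String> readFirstRecord(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        List<String> tokens = new ArrayList<String>();
        boolean inRecord = false;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(">")) {
                if (inRecord) {
                    break;          // header of the next record: stop here
                }
                inRecord = true;    // description line of the first record
                continue;
            }
            if (!inRecord) {
                continue;           // ignore anything before the first header
            }
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (!Character.isWhitespace(c)) {
                    tokens.add(tokenFor(c));
                }
            }
        }
        return tokens;
    }
}
```

Passing a StringReader over ">seq1\r\nAC\r\nGT\r\n" to readFirstRecord() should yield the four tokens A, C, G and T, with repeated residues sharing the same String instance. Whether a cache like this belongs in FastaFormat or behind a char-based parser method is exactly the choice between the two options above.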
Comments?
Scott
--
Scott Markel, Ph.D.       NetGenics, Inc.
smarkel@netgenics.com     4350 Executive Drive
Tel: 858 455 5223         Suite 260
FAX: 858 455 1388         San Diego, CA 92121