[Biojava-l] Re: More Questions on behavior of SymbolList

David Waring dwaring@u.washington.edu
Thu, 6 Sep 2001 12:52:39 -0700


>Thomas:
> But if you do decide that subLists /shouldn't/ reflect changes,
> I'd be slightly concerned about going down the always-copy
> route, since subList is a very common operation in some
> cases.  A better approach might be `copy-on-write'.  Have an
> implementation which starts off as a view on the parent sequence,
> but installs a ChangeListener, and takes a full copy if
> it receives a preChange notification?
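A minimal sketch of the copy-on-write subList described above, in plain Java so it runs on its own. `ParentSeq`, `PreChangeListener` and `SubView` are hypothetical stand-ins, not BioJava classes. `preChange()` plays the role of the ChangeListener preChange notification, simplified to carry no ChangeEvent and allow no veto. Positions are 1-based, as in SymbolList.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyOnWriteDemo {
    /** Hypothetical stand-in for a ChangeListener's preChange notification. */
    interface PreChangeListener {
        void preChange();
    }

    /** Hypothetical mutable parent sequence that notifies listeners before every edit. */
    static class ParentSeq {
        private final StringBuilder symbols;
        private final List<PreChangeListener> listeners = new ArrayList<>();

        ParentSeq(String s) { symbols = new StringBuilder(s); }

        int length() { return symbols.length(); }

        /** 1-based access, like SymbolList.symbolAt(). */
        char symbolAt(int i) { return symbols.charAt(i - 1); }

        void addListener(PreChangeListener l) { listeners.add(l); }

        void removeListener(PreChangeListener l) { listeners.remove(l); }

        /** Replace positions start..end (1-based, inclusive) with repl. */
        void edit(int start, int end, String repl) {
            // Iterate over a copy: a listener may unregister itself in preChange().
            for (PreChangeListener l : new ArrayList<>(listeners)) {
                l.preChange();
            }
            symbols.replace(start - 1, end, repl);
        }
    }

    /** Starts as a view on parent[start..end]; takes a private copy before the parent changes. */
    static class SubView implements PreChangeListener {
        private ParentSeq parent; // set to null once the copy has been taken
        private final int start;
        private final int end;
        private char[] copy;

        SubView(ParentSeq parent, int start, int end) {
            this.parent = parent;
            this.start = start;
            this.end = end;
            parent.addListener(this);
        }

        @Override
        public void preChange() {
            copy = new char[length()];
            for (int i = 0; i < copy.length; i++) {
                copy[i] = parent.symbolAt(start + i);
            }
            parent.removeListener(this); // one copy is enough, so stop listening
            parent = null;
        }

        int length() { return end - start + 1; }

        boolean isCopied() { return parent == null; }

        char symbolAt(int i) {
            return parent != null ? parent.symbolAt(start + i - 1) : copy[i - 1];
        }

        String seqString() {
            StringBuilder sb = new StringBuilder(length());
            for (int i = 1; i <= length(); i++) {
                sb.append(symbolAt(i));
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        ParentSeq seq = new ParentSeq("ACGTACGT");
        SubView sub = new SubView(seq, 3, 5);
        System.out.println(sub.seqString() + " " + sub.isCopied()); // GTA false
        seq.edit(4, 4, "TT"); // parent is now ACGTTACGT
        System.out.println(sub.seqString() + " " + sub.isCopied()); // GTA true
    }
}
```

This version copies on any edit to the parent, even one outside the view. A finer-grained version could inspect the edit and copy only when it overlaps.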

I think that copy-on-write (or really copy-on-edit) is a fine way to go
if we decide that the subList should not reflect the changes. If, on the
other hand, we decide a subList should reflect the changes, we could simply
modify the start and end values as necessary. If someone wants a subList
that does not reflect changes, then all they have to do is:
    new SimpleSymbolList(seq.subList(2,2999))
But reflecting the edits in a subList becomes tricky if, for example, the
last base in a subList is modified, say by replacing it with two bases. Now
what is the end of the subList?
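The alternative of adjusting start and end can be written as a small function over 1-based, inclusive coordinates. `Bounds` and `adjust` are hypothetical, and so is the policy they encode. Edits wholly after the view leave it alone. Edits wholly before it shift it. An edit wholly inside it, including replacing the last base with two bases, moves only the end. An edit that straddles a boundary is rejected, because there is no obviously right answer.

```java
public class SubListBounds {
    /** 1-based, inclusive bounds of a subList view on its parent. */
    static final class Bounds {
        final int start;
        final int end;

        Bounds(int start, int end) { this.start = start; this.end = end; }

        @Override
        public String toString() { return "[" + start + ".." + end + "]"; }
    }

    /**
     * Hypothetical policy: the parent replaced positions editStart..editEnd
     * (inclusive, editStart <= editEnd) with newLength symbols.
     */
    static Bounds adjust(Bounds view, int editStart, int editEnd, int newLength) {
        if (editStart > editEnd || newLength < 0) {
            throw new IllegalArgumentException("bad edit");
        }
        int delta = newLength - (editEnd - editStart + 1); // change in parent length
        if (editStart > view.end) {
            return view; // edit is entirely after the view
        }
        if (editEnd < view.start) {
            return new Bounds(view.start + delta, view.end + delta); // entirely before: shift
        }
        if (editStart >= view.start && editEnd <= view.end) {
            int newEnd = view.end + delta; // entirely inside: grow or shrink the end
            if (newEnd < view.start) {
                throw new IllegalStateException("edit deleted the whole view");
            }
            return new Bounds(view.start, newEnd);
        }
        // The edit straddles a boundary of the view: no obviously right answer.
        throw new IllegalStateException("edit straddles subList boundary");
    }

    public static void main(String[] args) {
        Bounds view = new Bounds(2, 2999);
        System.out.println(adjust(view, 3000, 3005, 0)); // [2..2999]  after the view
        System.out.println(adjust(view, 1, 1, 3));       // [4..3001]  before: shifted by +2
        System.out.println(adjust(view, 2999, 2999, 2)); // [2..3000]  last base -> two bases
    }
}
```

Including both new bases is only one possible answer to the question above; a policy that keeps the original end would be just as easy to encode.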

......
> > David:
> > I want to implement a constructor SimpleSymbolList(Alphabet alpha,
> > String seqString) and this would be much faster using
> > TokenParser.parseCharToken() rather than parseToken() (about 5 times
> > faster). !!! Can we make this public? !!!!!
>Thomas:
> Here's an alternative:  why not use the StreamParser interface
> here?  The reason that was added was to allow optimized code-paths
> for simple cases (like single-character tokens) without having
> to worry about details of specific SymbolParser implementations.
> So long as you pump reasonably large chunks of characters through
> the StreamParser, you should get performance which is very close
> to direct calls to parseCharToken().
>
> Does that make sense?

I am not sure what advantage this gives. If I understand correctly, to use
the parseStream method of whichever parser I was given (presumably
TokenParser), I would have to implement a SeqIOListener within my
SimpleSymbolList, right? Otherwise, how does the parser give me back my
symbols?

So I have a String that I want to parse into Symbols, putting each one
into my array. I convert my String to a char[], instantiate my listener and
pass the chars to the StreamParser. The parser then parses each char, puts
the results into a Symbol[] and hands that to my listener, which adds them
to my own Symbol[].
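A rough, self-contained sketch of that flow. `CharStreamParser` is hypothetical: its `characters(char[], int, int)` plus `close()` shape is modeled loosely on a stream parser, the enum `Base` stands in for Symbol, and a `Consumer<Base[]>` stands in for the listener. Each chunk is decoded with a table lookup and handed to the sink in one call, so the per-call overhead is paid once per chunk rather than once per character, which is the effect Thomas describes for large chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedParseDemo {
    /** Stand-in for DNA Symbols. */
    enum Base { A, C, G, T }

    /** Hypothetical single-character-token parser that is fed in chunks. */
    static final class CharStreamParser {
        private static final Base[] TABLE = new Base[128]; // ASCII char -> Base, null if illegal

        static {
            for (Base b : Base.values()) {
                char c = b.name().charAt(0);
                TABLE[c] = b;
                TABLE[Character.toLowerCase(c)] = b;
            }
        }

        private final Consumer<Base[]> sink;

        CharStreamParser(Consumer<Base[]> sink) { this.sink = sink; }

        /** Decode data[start .. start+len) and pass the whole chunk to the sink in one call. */
        void characters(char[] data, int start, int len) {
            Base[] out = new Base[len];
            for (int i = 0; i < len; i++) {
                char c = data[start + i];
                Base b = c < TABLE.length ? TABLE[c] : null;
                if (b == null) {
                    throw new IllegalArgumentException("Illegal symbol: " + c);
                }
                out[i] = b;
            }
            sink.accept(out);
        }

        void close() {
            // Nothing is buffered in this sketch, so there is nothing to flush.
        }
    }

    /** String -> char[] -> parser (in chunks) -> listener -> our own storage. */
    static Base[] parse(String s, int chunkSize) {
        List<Base> result = new ArrayList<>(s.length());
        CharStreamParser parser = new CharStreamParser(chunk -> result.addAll(Arrays.asList(chunk)));
        char[] chars = s.toCharArray();
        for (int off = 0; off < chars.length; off += chunkSize) {
            parser.characters(chars, off, Math.min(chunkSize, chars.length - off));
        }
        parser.close();
        return result.toArray(new Base[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("ACGTacgt", 3))); // [A, C, G, T, A, C, G, T]
    }
}
```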

Is this right? All of this so that someone could say:
	new SimpleSymbolList(nameParser,"SerHisIleThr");

Implementing SeqIOListener seems excessive. Am I missing something here?
I can see the flexibility, but it sure gets in the way of performance
sometimes.

I still suggest making parseCharToken public and letting people know that
the SimpleSymbolList String constructor only handles Strings with one char
per token (after all, that constructor is not in the SymbolList interface).
Another, more flexible option I see is changing the SymbolParser interface to
parseStream(SymbolListIOListener) and having a SymbolListIOListener that
requires only one method, addSymbols(Symbol[] s). SeqIOListener could extend
this. Then I would not have to implement a dozen empty methods.
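A sketch of the narrower listener proposed above. Everything here is hypothetical, including the `Symbol` placeholder, the illustrative methods on `SeqIOListenerSketch`, and a `parseStream` that takes a whole `String`. The point is that a builder such as SimpleSymbolList implements one method, while a wider sequence-level listener extends the narrow one instead of forcing empty methods on every implementor.

```java
import java.util.Arrays;

public class ListenerSketch {
    /** Placeholder for a Symbol so the sketch compiles on its own. */
    interface Symbol {
        String getName();
    }

    /** Proposed narrow listener: the only callback a SymbolList builder needs. */
    interface SymbolListIOListener {
        void addSymbols(Symbol[] s);
    }

    /** A wider sequence-level listener extends it (these extra methods are illustrative). */
    interface SeqIOListenerSketch extends SymbolListIOListener {
        void startSequence();
        void setName(String name);
        void endSequence();
    }

    /** Hypothetical parser entry point that takes the narrow listener. */
    interface SymbolParserSketch {
        void parseStream(String text, SymbolListIOListener listener);
    }

    /** Toy parser for three-letter names such as "SerHisIleThr". */
    static final class ThreeLetterParser implements SymbolParserSketch {
        @Override
        public void parseStream(String text, SymbolListIOListener listener) {
            if (text.length() % 3 != 0) {
                throw new IllegalArgumentException("Not three-letter tokens: " + text);
            }
            Symbol[] out = new Symbol[text.length() / 3];
            for (int i = 0; i < out.length; i++) {
                String name = text.substring(3 * i, 3 * i + 3);
                out[i] = () -> name;
            }
            listener.addSymbols(out); // one callback for the whole string
        }
    }

    /** Growable symbol storage that implements just the one required method. */
    static final class SymbolArray implements SymbolListIOListener {
        private Symbol[] symbols = new Symbol[16];
        private int length = 0;

        @Override
        public void addSymbols(Symbol[] s) {
            if (length + s.length > symbols.length) {
                symbols = Arrays.copyOf(symbols, Math.max(2 * symbols.length, length + s.length));
            }
            System.arraycopy(s, 0, symbols, length, s.length);
            length += s.length;
        }

        int length() { return length; }

        Symbol symbolAt(int i) { return symbols[i - 1]; } // 1-based, like SymbolList
    }

    public static void main(String[] args) {
        SymbolArray list = new SymbolArray();
        new ThreeLetterParser().parseStream("SerHisIleThr", list);
        System.out.println(list.length() + " " + list.symbolAt(2).getName()); // 4 His
    }
}
```

Because `SymbolListIOListener` has a single method, a caller could also pass a lambda instead of writing a listener class at all.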

.....


> > certainly makes reading symbolAt() slower (80-400% slower than
> > SimpleSymbolList).
>
> 400%?  Ouch!  I didn't realize.  What virtual machine are you using?
>

The 400% number comes from my tests on Win2000 with JDK 1.3. In my tests on
Unix (an Alpha), again with JDK 1.3, the number is generally around 80% for
reading the last base in a 100,000-base sequence. The Windows box is much
faster overall (about 10 times faster at reading from the SimpleSymbolList,
for example), but the difference between the two tasks is greater there. I
suspect it may have to do with the modulo, since reading from a
two-dimensional array should not be more than 2 times slower, but this is
just a guess.
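If the cost is the modulo, one common workaround is to make the chunk size a power of two and replace `/` and `%` with a shift and a mask. The sketch below is hypothetical: `ChunkedStore` stores bytes in place of Symbol references and uses a 4096-element chunk size chosen only for illustration. Both lookups return the same value for every valid position. Whether the shift/mask version is actually faster depends on the VM (a JIT may already make this substitution for a constant divisor), so it would need measuring.

```java
import java.util.Arrays;

public class ChunkedStore {
    private static final int SHIFT = 12;       // chunk size 2^12 = 4096 (illustrative choice)
    private static final int CHUNK = 1 << SHIFT;
    private static final int MASK = CHUNK - 1;

    private final byte[][] chunks;             // bytes stand in for Symbol references
    private final int length;

    ChunkedStore(byte[] data) {
        length = data.length;
        chunks = new byte[(length + CHUNK - 1) / CHUNK][];
        for (int c = 0; c < chunks.length; c++) {
            int from = c * CHUNK;
            chunks[c] = Arrays.copyOfRange(data, from, Math.min(from + CHUNK, length));
        }
    }

    int length() { return length; }

    /** 1-based lookup using division and modulo. */
    byte symbolAtDivMod(int i) {
        int idx = i - 1;
        return chunks[idx / CHUNK][idx % CHUNK];
    }

    /** Same lookup with shift and mask; equivalent because idx >= 0 and CHUNK is a power of two. */
    byte symbolAtShiftMask(int i) {
        int idx = i - 1;
        return chunks[idx >>> SHIFT][idx & MASK];
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) "ACGT".charAt(i % 4);
        }
        ChunkedStore store = new ChunkedStore(data);
        int last = store.length(); // the "last base" case from the timings above
        System.out.println((char) store.symbolAtDivMod(last) + " " + (char) store.symbolAtShiftMask(last)); // T T
    }
}
```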

PC
java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.0-C)
Java HotSpot(TM) Client VM (build 1.3.0-C, mixed mode)

Unix
java version "1.3.0"
Java(TM) 2 Runtime Environment, Standard Edition
Classic VM (build 1.3.0-1, native threads, jit)


David