[Biojava-l] Dazzle server-client bottlenecks

Thomas Down td2@sanger.ac.uk
Mon, 23 Jul 2001 11:42:02 +0100

On Sat, Jul 21, 2001 at 03:46:09PM +0100, David Huen wrote:
> I've been looking into identifying Dazzle performance bottlenecks.
> I have excluded lookup times required to return sequence information from
> the datasource to Dazzle itself by inducing Ragbag to cache the sequences
> used in the test.
> Times have been measured in FeatureFetcher doGet and DASSequence
> getTrueSymbols calls.  The total time spent in the method and the time
> between initiating connect() and getting back data have also been recorded.
> For FeatureFetcher, a typical call takes 200-500 ms. The reply
> time is 50-100 ms.
> For getTrueSymbols, the typical call takes 3-4 secs.  The reply time is
> typically 100 ms.
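A minimal Java sketch of the kind of split measurement described above: total time for the call versus time until the raw reply is in hand, with the remainder attributed to parsing and object construction. `PhaseTimer`, `Timing` and the fetch/build split are hypothetical names for illustration, not the instrumentation actually added to FeatureFetcher or DASSequence, and where the "reply" phase ends (first byte versus whole body) is an assumption.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: times a fetch phase and a build phase separately.
class PhaseTimer {
    /** Timings for one call, in milliseconds. */
    public static final class Timing<T> {
        public final T value;
        public final long replyMillis;  // start of fetch until the raw reply is in hand
        public final long totalMillis;  // whole call, including parsing and construction

        Timing(T value, long replyMillis, long totalMillis) {
            this.value = value;
            this.replyMillis = replyMillis;
            this.totalMillis = totalMillis;
        }

        /** Time spent after the reply arrived (parsing, building data structures). */
        public long buildMillis() {
            return totalMillis - replyMillis;
        }
    }

    public static <R, T> Timing<T> time(Supplier<R> fetch, Function<R, T> build) {
        long start = System.nanoTime();
        R raw = fetch.get();          // e.g. connect() and read the reply
        long replied = System.nanoTime();
        T value = build.apply(raw);   // e.g. parse XML and build a SymbolList
        long end = System.nanoTime();
        return new Timing<>(value, (replied - start) / 1_000_000L, (end - start) / 1_000_000L);
    }
}
```

With figures like those above (3-4 s total against roughly 100 ms reply time), `buildMillis()` would account for nearly all of a getTrueSymbols call.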


> Looks like I've wasted my time writing the gzip compression code. ;-).
> Wonder whether it'll be worth committing at all! Transmission time doesn't
> seem to be a significant part of the latencies in the DAS protocol.  With
> slower phone modems the transmission times may be 10x worse, but that's
> still only 0.5-1.0 secs, disregarding decompression times.
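For readers wondering what the client side of compressed transfer involves, here is a sketch using only the JDK's `java.util.zip` and `HttpURLConnection`. It is not David's gzip code (which isn't shown here); `GzipTransport` and its methods are hypothetical. The client advertises `Accept-Encoding: gzip` and only wraps the stream in a `GZIPInputStream` when the server actually answers with `Content-Encoding: gzip`, so an uncompressing server still works.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for gzip-compressed DAS replies (not Dazzle's own code).
class GzipTransport {
    /** Open a request that advertises gzip support and return a decoded body stream. */
    public static InputStream openDecoded(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        return decode(conn.getContentEncoding(), conn.getInputStream());
    }

    /** Wrap the stream only if the server says the body is gzip-encoded. */
    public static InputStream decode(String contentEncoding, InputStream raw) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    /** Server-side counterpart: gzip a reply body held in memory. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    /** Read a stream to exhaustion. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }
}
```

DNA sent as text uses one byte per base from a four-letter alphabet, so it generally compresses well; the trade-off is the extra decompression work on the client.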

No, I don't think this is a waste at all -- some people are
/very/ bandwidth constrained, and even those who aren't will
appreciate this.

> The dominant part of the latencies appears to be in constructing data
> structures after receiving the data from the server.  These seem
> particularly significant in the case of getTrueSymbols, but then the
> amount of data returned from one of these calls is likely to be large.
> Presumably these times comprise the duration taken to parse the reply and
> construct the data structures.

I have a horrible feeling I know exactly what the problem is
here :(.  The DNA fetching code is still building a full DOM
tree, then walking over all the Text nodes in this to build
the BioJava symbolList (David, if you're still in a benchmark-y
mood you could probably test this quite easily).
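Roughly what the DOM-based approach described above looks like, as a baseline for that kind of benchmark. This uses the JDK's JAXP/DOM APIs rather than the actual DAS client code, and assumes the sequence arrives as text inside a DAS-style `<DNA>` element; `DomDnaReader` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical baseline: build the full DOM tree, then walk its Text nodes.
class DomDnaReader {
    public static String readDna(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        // The entire reply is materialised as a tree before any residue is extracted.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        StringBuilder residues = new StringBuilder();
        NodeList dnaElements = doc.getElementsByTagName("DNA");
        for (int i = 0; i < dnaElements.getLength(); i++) {
            appendText(dnaElements.item(i), residues);
        }
        return residues.toString();
    }

    // Collect non-whitespace characters from Text/CDATA descendants of a node.
    private static void appendText(Node node, StringBuilder out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                String data = child.getNodeValue();
                for (int j = 0; j < data.length(); j++) {
                    char c = data.charAt(j);
                    if (!Character.isWhitespace(c)) {
                        out.append(c);
                    }
                }
            } else if (type == Node.ELEMENT_NODE) {
                appendText(child, out);
            }
        }
    }
}
```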

Anyway, there's a very simple fix: extract the symbol data
directly from SAX (or StAX) events.  I'm pretty confident that
the limiting factor will then be the BioJava SymbolParsers
(which have already been optimized to some extent anyway).
This should have been done a while back, but I've kept
putting it off, probably because I hardly ever have the
symbols display turned on in DAS.  I'll have a go this afternoon,
once I've caught up with my mails from BOSC...
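A sketch of the SAX alternative under the same assumptions (residues carried as text inside a `<DNA>` element, JDK JAXP only; `SaxDnaReader` is a hypothetical name, not the eventual DAS client code). Characters are appended as they stream past and no tree is ever built; the resulting string would then be handed to a BioJava SymbolParser, which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming reader: collect residues straight from SAX events.
class SaxDnaReader extends DefaultHandler {
    private final StringBuilder residues = new StringBuilder();
    private int dnaDepth = 0; // > 0 while inside a <DNA> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("DNA".equals(qName)) {
            dnaDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("DNA".equals(qName)) {
            dnaDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (dnaDepth == 0) {
            return; // ignore text outside <DNA>
        }
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                residues.append(ch[i]);
            }
        }
    }

    public static String readDna(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        SaxDnaReader handler = new SaxDnaReader();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.residues.toString();
    }
}
```

A StAX `XMLStreamReader` version would follow the same pattern, pulling `CHARACTERS` events while inside `<DNA>` instead of receiving callbacks.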

Then the results of your compression work should really start
to look impressive :-).