[Biojava-l] BioJava on large sequences

David Huen smh1008@cus.cam.ac.uk
Fri, 21 Sep 2001 23:45:29 +0100 (BST)


On Fri, 21 Sep 2001, Cox, Greg wrote:

> Has anyone done work with BioJava on very large sequences (i.e. contigs)?
> The types of issues we're thinking about are keeping a sub-set of the
> sequence in memory, but ensuring that the indices of the bases are accurate.
> Has anyone dealt with this?
> 

I hope I've understood the question correctly, but I think the BioJava DAS
client does this sort of thing.

Ragbag can be made to do likewise using SoftReferences, as it does in the
version I've been using with the Dazzle server.

In both cases, component assemblies are used to manage this.  In Ragbag,
components in the contigs are represented by proxy objects that lazily
instantiate the real sequence objects only when they are accessed, and
SoftReferences are used to let them be garbage collected under memory
pressure.  The largest I've attempted to date is twenty-something
megabases, which is a Drosophila chromosome arm.
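
As a rough illustration of the proxy idea, here is a minimal sketch in plain
Java.  It is not Ragbag's or the DAS client's actual code: the class
LazyComponent, its methods, and the loader argument are all made up for this
example, and a String stands in for a real sequence object.  The proxy
remembers where its component sits in assembly coordinates, loads the
residues only on first access, and holds them through a SoftReference so the
garbage collector may drop them under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical proxy for one component of an assembly (not a BioJava class). */
class LazyComponent {
    private final int start;               // 1-based start in assembly coordinates
    private final int length;              // number of residues in this component
    private final Supplier<String> loader; // assumed: fetches residues from disk, DAS, etc.
    private SoftReference<String> cache = new SoftReference<>(null);
    private int loadCount = 0;             // how many times the loader has been called

    LazyComponent(int start, int length, Supplier<String> loader) {
        this.start = start;
        this.length = length;
        this.loader = loader;
    }

    /** True if the assembly position falls inside this component. */
    boolean contains(int assemblyPos) {
        return assemblyPos >= start && assemblyPos < start + length;
    }

    /** Returns the residues, reloading them if the GC has cleared the soft reference. */
    private synchronized String residues() {
        String s = cache.get();
        if (s == null) {
            s = loader.get();
            if (s.length() != length) {
                throw new IllegalStateException("loader returned unexpected length");
            }
            cache = new SoftReference<>(s);
            loadCount++;
        }
        return s;
    }

    /** Looks up a base by its position in the assembly, not in the component. */
    char baseAt(int assemblyPos) {
        if (!contains(assemblyPos)) {
            throw new IndexOutOfBoundsException("position " + assemblyPos);
        }
        // Translate assembly coordinates to an offset within the component.
        return residues().charAt(assemblyPos - start);
    }

    int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        // A component occupying assembly positions 1001..1008.
        LazyComponent c = new LazyComponent(1001, 8, () -> "ACGTTGCA");
        System.out.println(c.baseAt(1001)); // first base of the component
        System.out.println(c.baseAt(1008)); // last base of the component
    }
}
```

Because the proxy stores the start coordinate itself, positions stay correct
whether or not the residues happen to be in memory at the time; an assembly
would hold a list of such proxies and dispatch each lookup to the one whose
range contains the requested position.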

Regards,
David Huen