[Biojava-l] BioJava on large sequences

Thomas Down td2@sanger.ac.uk
Sun, 23 Sep 2001 19:18:14 +0100


On Fri, Sep 21, 2001 at 11:45:29PM +0100, David Huen wrote:
> On Fri, 21 Sep 2001, Cox, Greg wrote:
> 
> > Has anyone done work with BioJava on very large sequences (i.e. contigs)?
> > The types of issues we're thinking about are keeping a sub-set of the
> > sequence in memory, but ensuring that the indices of the bases are accurate.
> > Has anyone dealt with this?
> > 
> 
> I hope I've understood the question correctly but I think the BioJava DAS
> client does this sort of thing.

The DAS code does indeed handle fetch-on-demand, and reasonably
intelligent caching.  If a sequence is an assembly of small
components (clones or whatever), fetching happens by clone.  Large
sequences (chromosomes where no assembly information is available)
get split into `tiles'.
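
To make the tiling idea concrete, here is a rough sketch of
fetch-on-demand with a small tile cache.  This is NOT the actual DAS
client code: the class name TiledSequence, the fetcher callback and
the LRU eviction policy are all assumptions made up for illustration.
The point is just that callers keep using global 1-based coordinates
while only a few tiles are held in memory at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical illustration only; not part of the BioJava API. */
class TiledSequence {
    private final int length;                   // total sequence length in bases
    private final int tileSize;                 // bases per tile
    private final IntFunction<String> fetcher;  // loads tile n (0-based) from the backing store
    private final Map<Integer, String> cache;   // least-recently-used cache of loaded tiles
    int fetchCount = 0;                         // number of backing-store fetches so far

    TiledSequence(int length, int tileSize, final int maxTiles,
                  IntFunction<String> fetcher) {
        this.length = length;
        this.tileSize = tileSize;
        this.fetcher = fetcher;
        // accessOrder=true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxTiles;       // evict once the cache is over capacity
            }
        };
    }

    /** Returns the base at a 1-based global position, fetching its tile if needed. */
    char symbolAt(int pos) {
        if (pos < 1 || pos > length) {
            throw new IndexOutOfBoundsException("position " + pos);
        }
        int tile = (pos - 1) / tileSize;        // which tile holds this position
        String bases = cache.get(tile);
        if (bases == null) {                    // cache miss: fetch on demand
            bases = fetcher.apply(tile);
            fetchCount++;
            cache.put(tile, bases);
        }
        return bases.charAt((pos - 1) % tileSize);  // offset within the tile
    }

    public static void main(String[] args) {
        // Stand-in for a remote server: a short in-memory string cut into tiles of 4.
        final String genome = "ACGTTGCAGG";
        TiledSequence seq = new TiledSequence(genome.length(), 4, 2,
            n -> genome.substring(n * 4, Math.min(genome.length(), n * 4 + 4)));
        System.out.println(seq.symbolAt(6));    // global position 6, served from tile 1
        System.out.println(seq.fetchCount);     // only one tile has been fetched
    }
}
```

In the assembly case the same trick applies, except the "tiles" are
the clones themselves and so are not all the same size; the index
arithmetic then becomes a lookup of which component covers a given
position.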

Another example of this is the biojava-ensembl code.  Again,
this caches by raw contigs.  I've quite happily worked with
human chromosomes (some 200-ish megabases) using this code
(and of course the human DAS server uses it).

    Thomas.