[Biojava-dev] EnsemblApi use case for DNASequences

Peter biopython at maubp.freeserve.co.uk
Thu May 13 13:20:51 UTC 2010


On Thu, May 13, 2010 at 1:38 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>
>
> As you said at the end of your email, the best way to accomplish this
> is by creating a SequenceProxyReader which can do all this logic
> and lets you work with the "right" objects and not have to re-implement
> that code. This leaves a few alternatives for how you can represent
> this in memory. We already have a 2bit implementation (will be called
> TwoBitSequenceReader) for storing very large pieces of Sequence
> but that only has support for ACGT and no support for gaps or Ns.
> This could be extended to bring in support for these as features or
> you could materialise that sequence and then push it into another
> Sequence object I have been working with (not checked in at the moment)
> which lets you join Sequences together. This combined with a
> Sequence which returns Compounds of a particular type, e.g. Ns, for
> any given length would let you represent massive amounts of
> Sequence in a very small amount of space. All of these updates
> will be in place soon, but I cannot say exactly when.
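
The joined-sequence idea above can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not BioJava code: SimpleSeq, StringSeq, ConstantSeq and JoinedSequence are hypothetical names, and int lengths are used for brevity. A run of a million Ns costs two fields instead of a million chars, and a lookup into the join is a binary search over the start offsets of its parts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal read-only sequence view; NOT the BioJava Sequence API.
interface SimpleSeq {
    int length();
    char charAt(int index); // 0-based
}

// Backed by an ordinary String, e.g. a short stretch of real bases.
final class StringSeq implements SimpleSeq {
    private final String bases;
    StringSeq(String bases) { this.bases = bases; }
    public int length() { return bases.length(); }
    public char charAt(int index) { return bases.charAt(index); }
}

// Returns the same compound (e.g. 'N') at every position: constant memory for any length.
final class ConstantSeq implements SimpleSeq {
    private final char compound;
    private final int length;
    ConstantSeq(char compound, int length) {
        if (length < 0) throw new IllegalArgumentException("negative length");
        this.compound = compound;
        this.length = length;
    }
    public int length() { return length; }
    public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException(index);
        return compound;
    }
}

// Joins several sequences end to end without copying their contents.
final class JoinedSequence implements SimpleSeq {
    private final List<SimpleSeq> parts;
    private final int[] starts; // starts[i] = offset of parts.get(i) within the join
    private final int length;

    JoinedSequence(List<? extends SimpleSeq> input) {
        this.parts = new ArrayList<>();
        for (SimpleSeq s : input) {
            if (s.length() > 0) parts.add(s); // skip empty parts so offsets stay strictly increasing
        }
        this.starts = new int[parts.size()];
        int offset = 0;
        for (int i = 0; i < parts.size(); i++) {
            starts[i] = offset;
            offset += parts.get(i).length();
        }
        this.length = offset;
    }

    public int length() { return length; }

    public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException(index);
        // Find the last part whose start offset is <= index.
        int pos = Arrays.binarySearch(starts, index);
        int part = pos >= 0 ? pos : -pos - 2;
        return parts.get(part).charAt(index - starts[part]);
    }
}
```

For whole chromosomes a real version would need long offsets, since an int stops at about 2.1 billion positions.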

Does BioJava have a 4-bit sequence implementation for ambiguous
DNA (or RNA)? That would let you treat N as 1111 (all four bits set)
and a gap as 0000 (none of the bits set).
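
A minimal Java sketch of the 4-bit idea follows. It assumes one bit per base (A=0001, C=0010, G=0100, T or U=1000), so every IUPAC ambiguity code is the bitwise OR of the bases it covers, N comes out as 1111 and a gap as 0000. FourBitDna is a hypothetical class for illustration, not part of BioJava. Packing two bases per byte halves the memory of one char-per-byte storage, and asking whether a position could be a given base is a single AND.

```java
// Hypothetical 4-bit-per-base store for ambiguous DNA/RNA, two bases per byte.
// Assumed bit layout: A=0001, C=0010, G=0100, T (or U)=1000.
// Each IUPAC ambiguity code is the OR of the bases it covers: N=1111, gap=0000.
final class FourBitDna {
    // SYMBOLS.charAt(code) is the letter for a 4-bit code, e.g. 5 = 0101 = A|G = 'R'.
    private static final String SYMBOLS = "-ACMGRSVTWYHKDBN";

    private final byte[] packed;
    private final int length;

    FourBitDna(String seq) {
        length = seq.length();
        packed = new byte[(length + 1) / 2];
        for (int i = 0; i < length; i++) {
            int code = encode(seq.charAt(i));
            // Even positions go in the low nibble, odd positions in the high nibble.
            packed[i / 2] |= (byte) (i % 2 == 0 ? code : code << 4);
        }
    }

    static int encode(char c) {
        char upper = Character.toUpperCase(c);
        if (upper == 'U') upper = 'T'; // RNA: U shares T's bit
        int code = SYMBOLS.indexOf(upper);
        if (code < 0) throw new IllegalArgumentException("Not an IUPAC nucleotide: " + c);
        return code;
    }

    int length() { return length; }

    int byteSize() { return packed.length; }

    int code(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i);
        int b = packed[i / 2] & 0xFF; // undo sign extension of the byte
        return i % 2 == 0 ? b & 0x0F : b >>> 4;
    }

    char charAt(int i) { return SYMBOLS.charAt(code(i)); }

    // True if position i could be the given base: N matches anything, a gap matches nothing.
    boolean mayBe(int i, char base) { return (code(i) & encode(base)) != 0; }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) sb.append(charAt(i));
        return sb.toString();
    }
}
```

With this layout, new FourBitDna("ACGTNRY-") occupies 4 bytes, and mayBe(5, 'A') is true because R covers both A and G.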

Peter


