[Biojava-dev] EnsemblApi use case for DNASequences

Thomas Down thomas.a.down at googlemail.com
Thu May 13 13:43:04 UTC 2010


On Thu, May 13, 2010 at 1:38 PM, Andy Yates <ayates at ebi.ac.uk> wrote:

> We already have a 2bit implementation (will be called TwoBitSequenceReader)
> for storing very large pieces of Sequence but that only has support for ACGT
> and no support for gaps or Ns.


If you haven't already, I'd recommend taking a look at how the UCSC .2bit
file format handles Ns.  Quite elegant, and seems to cover most genomic use
cases very efficiently.  I've got a BioJava (1.x, I'm afraid) SequenceDB
implementation that's backed by a .2bit file (in a MappedByteBuffer) if
you're curious.
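
For readers who have not looked at the format: as far as the published UCSC description goes, .2bit packs four bases per byte (T=0, C=1, A=2, G=3, first base in the high bits) and records runs of N separately as lists of block starts and sizes, so the packed data itself never needs a fifth symbol. The sketch below illustrates that idea only. The class and method names (TwoBitSketch, Packed, encode, decode) are hypothetical and are not taken from BioJava, the UCSC tools, or the SequenceDB implementation mentioned above. It works on an in-memory String rather than a file, ignores the format's header, index and soft-mask blocks, and writes zero bits under N runs as an assumed placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of .2bit-style N handling; not a file reader. */
final class TwoBitSketch {
    // Base codes as described for the .2bit format: T=0, C=1, A=2, G=3.
    private static final String BASES = "TCAG";

    /** Packed bases plus the N runs, kept outside the 2-bit data. */
    static final class Packed {
        final int length;           // number of bases
        final byte[] data;          // four bases per byte, first base in the high bits
        final List<int[]> nBlocks;  // each entry is {start, size}

        Packed(int length, byte[] data, List<int[]> nBlocks) {
            this.length = length;
            this.data = data;
            this.nBlocks = nBlocks;
        }
    }

    static Packed encode(String seq) {
        byte[] data = new byte[(seq.length() + 3) / 4];
        List<int[]> nBlocks = new ArrayList<>();
        int runStart = -1; // start of the N run currently open, or -1
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            int code;
            if (c == 'N') {
                if (runStart < 0) runStart = i;
                code = 0; // assumption: placeholder bits under an N run
            } else {
                if (runStart >= 0) {
                    nBlocks.add(new int[] {runStart, i - runStart});
                    runStart = -1;
                }
                code = BASES.indexOf(c);
                if (code < 0) {
                    throw new IllegalArgumentException("Unsupported symbol: " + c);
                }
            }
            data[i / 4] |= (byte) (code << (6 - 2 * (i % 4)));
        }
        if (runStart >= 0) {
            nBlocks.add(new int[] {runStart, seq.length() - runStart});
        }
        return new Packed(seq.length(), data, nBlocks);
    }

    static String decode(Packed p) {
        char[] out = new char[p.length];
        for (int i = 0; i < p.length; i++) {
            int code = (p.data[i / 4] >> (6 - 2 * (i % 4))) & 3;
            out[i] = BASES.charAt(code);
        }
        // Overlay the N runs on top of the decoded placeholder bases.
        for (int[] block : p.nBlocks) {
            Arrays.fill(out, block[0], block[0] + block[1], 'N');
        }
        return new String(out);
    }
}
```

With this layout a long assembly gap costs one (start, size) pair rather than per-base storage, which is presumably why it suits genomic sequence so well. For example, TwoBitSketch.decode(TwoBitSketch.encode("ACNNNGTTN")) round-trips the input while storing only two N blocks.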

               Thomas.


