[Biojava-dev] EnsemblApi use case for DNASequences
Thomas Down
thomas.a.down at googlemail.com
Thu May 13 13:43:04 UTC 2010
On Thu, May 13, 2010 at 1:38 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
> We already have a 2bit implementation (will be called TwoBitSequenceReader)
> for storing very large pieces of Sequence but that only has support for ACGT
> and no support for gaps or Ns.
If you haven't already, I'd recommend taking a look at how the UCSC .2bit
file format handles Ns. Quite elegant, and seems to cover most genomic use
cases very efficiently. I've got a BioJava (1.x, I'm afraid) SequenceDB
implementation that's backed by a .2bit file (in a MappedByteBuffer) if
you're curious.
Thomas.
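
For readers unfamiliar with the approach: the .2bit format packs each base
into two bits and records runs of Ns separately, as a list of block starts
and a list of block sizes, so a long gap costs a couple of integers rather
than extra bits on every base. The Java sketch below illustrates that idea
only. It is not the SequenceDB implementation mentioned above and it does not
read real .2bit files. The class name TwoBitNBlocks and its methods are
hypothetical, and writing code 0 under N positions is an assumption made for
this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: 2 bits per base plus a side table of N runs. */
final class TwoBitNBlocks {
    // Base-to-code mapping used by .2bit: T=00, C=01, A=10, G=11.
    private static final String CODES = "TCAG";

    final byte[] packed;  // four bases per byte, first base in the high bits
    final int[] nStarts;  // 0-based start of each run of Ns
    final int[] nSizes;   // length of each run of Ns
    final int length;     // number of bases

    TwoBitNBlocks(String seq) {
        length = seq.length();
        packed = new byte[(length + 3) / 4];
        List<Integer> starts = new ArrayList<>();
        List<Integer> sizes = new ArrayList<>();
        int runStart = -1; // start of the N run currently open, or -1
        for (int i = 0; i < length; i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            int code;
            if (c == 'N') {
                if (runStart < 0) runStart = i;
                code = 0; // placeholder bits (assumption); the N table wins on read
            } else {
                if (runStart >= 0) {
                    starts.add(runStart);
                    sizes.add(i - runStart);
                    runStart = -1;
                }
                code = CODES.indexOf(c);
                if (code < 0) {
                    throw new IllegalArgumentException("Unsupported symbol: " + c);
                }
            }
            packed[i / 4] |= (byte) (code << (6 - 2 * (i % 4)));
        }
        if (runStart >= 0) { // sequence ended inside an N run
            starts.add(runStart);
            sizes.add(length - runStart);
        }
        nStarts = starts.stream().mapToInt(Integer::intValue).toArray();
        nSizes = sizes.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Returns the base at 0-based position i, consulting the N table first. */
    char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("Position " + i);
        }
        int idx = Arrays.binarySearch(nStarts, i);
        if (idx >= 0) return 'N';  // i is exactly the start of an N run
        int prev = -idx - 2;       // last run that starts before i, if any
        if (prev >= 0 && i < nStarts[prev] + nSizes[prev]) return 'N';
        int code = (packed[i / 4] >> (6 - 2 * (i % 4))) & 3;
        return CODES.charAt(code);
    }

    /** Decodes the whole sequence back to an upper-case string. */
    String decode() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) sb.append(baseAt(i));
        return sb.toString();
    }
}
```

With this layout, new TwoBitNBlocks("ACGTNNNNACGN") holds three packed bytes
and two N runs (start 4, size 4 and start 11, size 1). The N table is sorted
by start, so a lookup is a binary search followed by a two-bit extract.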