[Biojava-l] Compress Sequences.
mark.schreiber at novartis.com
Fri Aug 12 02:45:51 EDT 2005
Check out PackedSymbolList and the associated classes and interfaces
PackedSymbolListFactory, Packing, and PackingFactory. These do bit packing
of sequences. The nice part is that they behave exactly like normal
SymbolLists, so you don't even know you're dealing with a compressed
sequence.
From the Javadocs:
Example Usage
    SymbolList symL = ...;
    SymbolList packed = new PackedSymbolList(
        PackingFactory.getPacking(symL.getAlphabet(), true),
        symL
    );
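To make the idea of bit packing concrete, here is a minimal sketch in plain Java that stores four nucleotides per byte. It is not how PackedSymbolList is implemented and it uses no BioJava classes; TwoBitPacker and its methods are hypothetical names, and the sketch assumes unambiguous DNA (A, C, G, T only) and Java 8 or later.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of BioJava: packs A/C/G/T at 2 bits per base. */
final class TwoBitPacker {
    // The position of a base in this string is its 2-bit code (A=0, C=1, G=2, T=3).
    private static final String BASES = "ACGT";

    /** Packs an unambiguous DNA string so that four bases share one byte. */
    static byte[] pack(String dna) {
        byte[] out = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            int code = BASES.indexOf(Character.toUpperCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not A, C, G or T at position " + i);
            }
            // Base i lives in byte i/4, at bit offset 2*(i%4).
            out[i / 4] |= (byte) (code << (2 * (i % 4)));
        }
        return out;
    }

    /** Reads the base at index i directly from the packed form. */
    static char baseAt(byte[] packed, int i) {
        int code = (packed[i / 4] >> (2 * (i % 4))) & 0x3;
        return BASES.charAt(code);
    }

    /** Unpacks the first 'length' bases; the length must be stored separately. */
    static String unpack(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(packed, i));
        }
        return sb.toString();
    }

    /** Two packed sequences of equal length can be compared without unpacking. */
    static boolean sameSequence(byte[] a, byte[] b, int length) {
        return a.length == (length + 3) / 4 && Arrays.equals(a, b);
    }
}
```

Because unused bits in the last byte are always left at zero, two sequences of the same length compare equal exactly when their packed arrays are equal. Ambiguity symbols such as N would need more than 2 bits per base.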
It is also relatively trivial to write a Huffman tree generator that can
compress SymbolLists as a binary string. You could use this as the basis
for full LZ compression. There are also far more complicated published
algorithms that look for long-range repeats, but these are also very
slow.
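As a rough illustration of the Huffman idea, the sketch below builds a code table from symbol frequencies and encodes a sequence as a string of '0' and '1' characters. It works on a plain String rather than a SymbolList, and HuffmanSketch and its method names are hypothetical, not BioJava API; Java 8 or later is assumed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch, not part of BioJava: Huffman coding of a character sequence. */
final class HuffmanSketch {

    /** Tree node; leaves carry a symbol, internal nodes carry two children. */
    private static final class Node implements Comparable<Node> {
        final char symbol;
        final int weight;
        final Node left, right;

        Node(char symbol, int weight) {
            this.symbol = symbol; this.weight = weight; this.left = null; this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = '\0'; this.weight = left.weight + right.weight;
            this.left = left; this.right = right;
        }

        boolean isLeaf() { return left == null; }

        @Override public int compareTo(Node o) { return Integer.compare(weight, o.weight); }
    }

    /** Counts symbols, then repeatedly merges the two lightest nodes into one tree. */
    static Map<Character, String> buildCodes(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue()));
        }
        while (queue.size() > 1) {
            queue.add(new Node(queue.poll(), queue.poll()));
        }
        Map<Character, String> codes = new HashMap<>();
        if (!queue.isEmpty()) {
            assign(queue.poll(), "", codes);
        }
        return codes;
    }

    private static void assign(Node n, String prefix, Map<Character, String> codes) {
        if (n.isLeaf()) {
            // A one-symbol input yields a lone leaf; give it the code "0".
            codes.put(n.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(n.left, prefix + '0', codes);
        assign(n.right, prefix + '1', codes);
    }

    /** Concatenates the code of every symbol in the text. */
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(codes.get(c));
        }
        return bits.toString();
    }

    /** Decodes by matching prefixes; this works because Huffman codes are prefix-free. */
    static String decode(String bits, Map<Character, String> codes) {
        Map<String, Character> inverse = new HashMap<>();
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            inverse.put(e.getValue(), e.getKey());
        }
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character symbol = inverse.get(current.toString());
            if (symbol != null) {
                out.append(symbol);
                current.setLength(0);
            }
        }
        return out.toString();
    }
}
```

Huffman coding only pays off when symbol frequencies are skewed. For DNA with roughly equal base frequencies it gives about 2 bits per base, the same as fixed bit packing, and the code table has to be stored alongside the bits.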
- Mark
Felipe Albrecht <felipe.albrecht at gmail.com>
Sent by: biojava-l-bounces at portal.open-bio.org
08/12/2005 04:07 AM
To: biojava-l at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Compress Sequences.
Is there a class in BioJava that compresses sequences?
For example, one that puts four nucleotides in a single byte.
If none exists, does anyone know a good algorithm for compressing, reading, and
comparing such sequences?
Thanks.
Felipe Albrecht
_______________________________________________
Biojava-l mailing list - Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l