[Biojava-dev] MappedDataStoreFactory bug

Thomas Down td2 at sanger.ac.uk
Wed Apr 23 12:23:17 EDT 2003


On Wed, Apr 23, 2003 at 04:32:49PM +1200, Schreiber, Mark wrote:
> Hi -
>  
> When trying to increase the wordLength argument of MappedDataStoreFactory.buildDataStore() above 10, I get an index out of bounds exception:
>  
> Exception in thread "main" java.lang.IndexOutOfBoundsException
>         at java.nio.Buffer.checkIndex(Buffer.java:438)
>         at java.nio.ByteBufferAsIntBufferB.get(ByteBufferAsIntBufferB.java:96)
>         at org.biojava.bio.program.ssaha.MappedDataStoreFactory.addCount(MappedDataStoreFactory.java:305)
>         at org.biojava.bio.program.ssaha.MappedDataStoreFactory.buildDataStore(MappedDataStoreFactory.java:143)
>         at ssaha.CreateDNAFastaHashTable.main(CreateDNAFastaHashTable.java:31)
>  
> I think this might be caused by an inappropriate value for "word" created by this statement on line 141:
>  
> int word = PackingFactory.primeWord(seq, wordLength, packing);

Could you send me the script you're using?  On a quick read-through,
PackingFactory looks okay to me.

The one case I can see which will cause breakage is if you use a
Packing which returns an out-of-range value to indicate ambiguity
symbols (which is really useful if you want to simply exclude runs
of 'N's from your table).  The alternative implementation,
CompactedDataStoreFactory, *does* handle this usage (but
it's written in a rather different way and doesn't use the
PackingFactory methods).

Is this what you're trying?
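
To illustrate the failure mode I mean, here is a minimal standalone
sketch. It is not the BioJava code: the class AmbiguityPackingSketch,
its 2-bit packing, and the sliding-window counting are all assumptions
made up for illustration. It shows how a packing that returns -1 for
ambiguity symbols produces an out-of-range table index unless the
caller checks for it first.

```java
import java.nio.IntBuffer;

public class AmbiguityPackingSketch {
    // Hypothetical 2-bit DNA packing: a=0, c=1, g=2, t=3.
    // Anything else (for example 'n') gets the out-of-range marker -1.
    static int pack(char symbol) {
        switch (Character.toLowerCase(symbol)) {
            case 'a': return 0;
            case 'c': return 1;
            case 'g': return 2;
            case 't': return 3;
            default:  return -1;
        }
    }

    // Packs wordLength symbols starting at offset into a single int.
    // Returns -1 if any symbol in the window is ambiguous.
    static int packWord(String seq, int offset, int wordLength) {
        int word = 0;
        for (int i = 0; i < wordLength; i++) {
            int p = pack(seq.charAt(offset + i));
            if (p < 0) {
                return -1;
            }
            word = (word << 2) | p;
        }
        return word;
    }

    // Counts every overlapping word in seq. The table has 4^wordLength
    // slots, so this sketch is only meant for small word lengths.
    static IntBuffer countWords(String seq, int wordLength) {
        IntBuffer counts = IntBuffer.allocate(1 << (2 * wordLength));
        for (int i = 0; i + wordLength <= seq.length(); i++) {
            int word = packWord(seq, i, wordLength);
            if (word < 0) {
                // Skip words containing ambiguity symbols. Without this
                // check, counts.get(-1) throws IndexOutOfBoundsException.
                continue;
            }
            counts.put(word, counts.get(word) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        IntBuffer counts = countWords("acgtnnacgt", 4);
        int acgt = packWord("acgt", 0, 4);
        System.out.println("acgt packs to " + acgt
                + ", seen " + counts.get(acgt) + " times");
    }
}
```

If the check in countWords is removed, the first window containing an
'n' indexes the buffer at -1 and fails in the same place in java.nio
as the stack trace above.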

Unless you need to index very large amounts of sequence, I'd
actually encourage the use of CompactedDataStoreFactory, since
it can save a lot of disk space without hurting performance too
much.

     Thomas.


