[Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java

Andy Yates ayates at ebi.ac.uk
Tue Apr 10 10:03:40 UTC 2007


Okay, a quick run of uncompress on the Mac with the files in question 
does produce a file that is identical to the one produced by gzip, but 
not to the one produced by UncompressInputStream.

The md5sum required for a pass is:

9f0924237d20288793172091d61f85b8  uncompressed_by_gzip

But we get:

17447efd34a245e430f20bc8d9b28a7b  uncompressed_by_uncompressInputStream
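For anyone repeating the comparison from Java rather than with md5sum, here is a minimal sketch (the class name Md5Check is hypothetical). It streams each file through java.security.MessageDigest and prints the lowercase hex digest in md5sum's "digest  filename" layout, so the two outputs above can be checked side by side.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {

    // Hex-encodes a digest the way md5sum prints it (lowercase, two chars per byte).
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Feeds the whole stream through an MD5 MessageDigest, block by block.
    static String md5(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return toHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Md5Check uncompressed_by_gzip uncompressed_by_uncompressInputStream
        for (String name : args) {
            try (InputStream in = Files.newInputStream(Paths.get(name))) {
                System.out.println(md5(in) + "  " + name);
            }
        }
    }
}
```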

Okay, so it looks like there is something "wrong". It seems to drop 88 
bytes from the decompressed output.
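An md5 mismatch only says that the files differ. A sketch like the one below (class and method names are hypothetical) reports both lengths and whether the shorter output is an exact prefix of the longer one. That distinguishes bytes lost at the end of the stream, as the forwarded report describes, from corruption in the middle. On Java 9 and later, java.util.Arrays.mismatch(byte[], byte[]) computes the same offset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TruncationCheck {

    // Returns the index of the first byte where a and b differ, the shorter
    // length if one array is a proper prefix of the other, or -1 if identical.
    static int firstMismatch(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TruncationCheck uncompressed_by_gzip uncompressed_by_uncompressInputStream
        byte[] expected = Files.readAllBytes(Paths.get(args[0]));
        byte[] actual = Files.readAllBytes(Paths.get(args[1]));
        int mismatch = firstMismatch(expected, actual);
        System.out.println("expected length:   " + expected.length);
        System.out.println("actual length:     " + actual.length);
        System.out.println("expected - actual: " + (expected.length - actual.length));
        if (mismatch == -1) {
            System.out.println("files are identical");
        } else if (mismatch == Math.min(expected.length, actual.length)) {
            System.out.println("shorter file is a prefix of the longer one (pure truncation)");
        } else {
            System.out.println("first difference at offset " + mismatch);
        }
    }
}
```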

I wonder what happens if we pass this file type through the JDK's 
GZIPInputStream?
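As far as the JDK goes, the likely answer is a ZipException: java.util.zip.GZIPInputStream reads the gzip magic number (1F 8B) in its constructor and rejects anything else, while files written by Unix compress (.Z, LZW) begin with 1F 9D. The gzip command-line tool is a different matter, since gunzip also understands the compress format, which is why it can serve as the reference decompressor here. A minimal sketch under those assumptions (the class name FormatProbe is hypothetical) sniffs the magic bytes and shows the JDK rejecting a .Z header:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

public class FormatProbe {

    // Classifies data by its first two bytes:
    // 1F 9D = Unix compress (LZW, .Z), 1F 8B = gzip (DEFLATE), 'P' 'K' = zip.
    static String sniff(byte[] header) {
        if (header.length < 2) {
            return "unknown";
        }
        int b0 = header[0] & 0xff;
        int b1 = header[1] & 0xff;
        if (b0 == 0x1f && b1 == 0x9d) return "compress/LZW";
        if (b0 == 0x1f && b1 == 0x8b) return "gzip/DEFLATE";
        if (b0 == 'P' && b1 == 'K') return "zip";
        return "unknown";
    }

    // Returns true if GZIPInputStream accepts the header of the given data.
    static boolean gzipAccepts(byte[] data) throws IOException {
        try {
            new GZIPInputStream(new ByteArrayInputStream(data)).close();
            return true;
        } catch (ZipException e) {
            // The constructor throws this when the magic number is not 1F 8B.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Start of a .Z file: magic 1F 9D, then a flags byte; 0x90 is an
        // illustrative value (block mode, 16-bit maximum code size).
        byte[] zHeader = {0x1f, (byte) 0x9d, (byte) 0x90};
        System.out.println(sniff(zHeader));        // compress/LZW
        System.out.println(gzipAccepts(zHeader));  // false
    }
}
```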

Andy Yates wrote:
> I don't think there are standard classes for this compression format in 
> the SDK. There are ones for GZIP & ZIP but not for LZW, which is what this 
> class deals with. Also I'm not sure about using GZIP to unzip a file 
> compressed with LZW, since GZIP uses DEFLATE.
> 
> We need to decompress the file using uncompress (which is missing from 
> my Linux box but is on the Mac ... go figure) and then match that up to 
> the output from UncompressInputStream & see if they agree or not.
> 
> Andy
> 
> Richard Holland wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> I have no idea what it is for. There are generic Java classes provided
>> with the SDK that do the same job. I think we should probably drop it.
>> Let's wait to see if anyone shouts first.
>>
>> mark.schreiber at novartis.com wrote:
>>> Does anyone maintain this class??
>>>
>>> More to the point, does anyone know what it is for??? If I look at the 
>>> Uses link in the javadoc, there are apparently none at the public or 
>>> package level. Additionally, why does BioJava need one? Are there not 
>>> java.io classes that can handle compressed streams??
>>>
>>> Is there a good reason why we cannot just clean it out?
>>>
>>> - Mark
>>>
>>> Mark Schreiber
>>> Research Investigator (Bioinformatics)
>>>
>>> Novartis Institute for Tropical Diseases (NITD)
>>> 10 Biopolis Road
>>> #05-01 Chromos
>>> Singapore 138670
>>> www.nitd.novartis.com
>>>
>>> phone +65 6722 2973
>>> fax  +65 6722 2910
>>>
>>>
>>>
>>>
>>>
>>> Chris Dagdigian <dag at sonsorol.org>
>>> Sent by: biojava-dev-bounces at lists.open-bio.org
>>> 04/07/2007 09:52 AM
>>>
>>>  
>>>         To:     biojava-dev at biojava.org
>>>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>>>         Subject:        [Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>
>>>
>>>
>>> Passing on this email that came to me ...
>>>
>>> Regards,
>>> Chris Dagdigian
>>> OBF
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: "Miguel Duarte" <malduarte at gmail.com>
>>>> Date: April 6, 2007 2:16:52 PM EDT
>>>> To: dag at sonsorol.org
>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>>
>>>> Hi Chris,
>>>>
>>>> From http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598,
>>>> I've learned that you're maintaining the class
>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>>>> case, please forward this mail to the maintainer.
>>>>
>>>> I've discovered a nasty bug: with some read block sizes the algorithm
>>>> truncates a few bytes from the end of the stream. I've verified this
>>>> by comparing the gzip/uncompress output for some files with what
>>>> org/biojava/utils/io/UncompressInputStream.java generates.
>>>>
>>>> Unfortunately I haven't found the cause yet, but I can contribute
>>>> the attached test case. To verify the bug, uncompress
>>>> BH_03834.MCR.Z with gzip and with UncompressInputStream and
>>>> compare the results.
>>>>
>>>> Thanks,
>>>> Miguel Duarte
>>>
>>>
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>> [ Attachment "BH_03834.MCR.Z" removed by Mark Schreiber ]
>>> [ Attachment "UNCOMPRESSED_BY_GZIP" removed by Mark Schreiber ]
>>> [ Attachment "UNCOMPRESSED_BY_UNCOMPRESSINPUTSTREAM" removed by Mark Schreiber ]
>>>
>>>
>>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFGG1Hz4C5LeMEKA/QRAvTuAJ9F1AClFCV4WwBNP170mbC2+6JVDgCfVB17
>> HoCuWrx5k2ONg/9oxIfVVPI=
>> =cGTy
>> -----END PGP SIGNATURE-----
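Miguel's report says the truncation only appears with some read block sizes, so a regression test would drain the same input with a range of read(byte[], int, int) request sizes and compare each result against the reference output. A minimal sketch follows; the class name is hypothetical and the demo uses in-memory stand-in data. For the real check, the Supplier would open BH_03834.MCR.Z wrapped in UncompressInputStream (a constructor taking an InputStream is assumed here, so check the BioJava javadoc) and rethrow any IOException as UncheckedIOException, with the gzip output as the reference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.function.Supplier;

public class BlockSizeCheck {

    // Drains the stream using read(byte[], int, int) with a fixed request size.
    static byte[] drain(InputStream in, int blockSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[blockSize];
        int n;
        while ((n = in.read(buf, 0, blockSize)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Returns the first block size whose output differs from the reference,
    // or -1 if every block size reproduces the reference exactly.
    static int firstFailingBlockSize(Supplier<InputStream> source, byte[] reference,
                                     int[] blockSizes) throws IOException {
        for (int size : blockSizes) {
            try (InputStream in = source.get()) {
                byte[] got = drain(in, size);
                if (!Arrays.equals(got, reference)) {
                    System.out.println("block size " + size + ": got " + got.length
                            + " bytes, expected " + reference.length);
                    return size;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data: a correct stream, so every block size should pass.
        byte[] reference = new byte[10_000];
        for (int i = 0; i < reference.length; i++) {
            reference[i] = (byte) (i * 31);
        }
        Supplier<InputStream> source = () -> new ByteArrayInputStream(reference);
        int[] sizes = {1, 7, 64, 512, 1000, 4096, 8192};
        System.out.println(firstFailingBlockSize(source, reference, sizes)); // -1
    }
}
```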


