[Biojava-l] BioJava-X parsing of RichSequences

Wed May 3 17:54:23 UTC 2006

On Wednesday 03 May 2006 18:50, Francois Pepin wrote:
> How much of the file is generally being read before the guess is made?
> I'm thinking very little is needed, especially compared to how much
> memory Java usually takes.

Generally not much. Jmol uses 16384 bytes.

> It would not be very difficult to save that first part of the stream and
> then play it back once the guess is made.

See how Jmol does it:

http://svn.sourceforge.net/viewcvs.cgi/jmol/trunk/Jmol/src/org/jmol/adapter/smarter/Resolver.java?view=markup
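
The save-and-replay idea can be sketched with the mark/reset support that BufferedInputStream already provides, so no temporary file is needed. This is only an illustration and not the Jmol or BioJava code: the class HeaderPeek, its methods, and the format rules are hypothetical, and the 16384 limit simply mirrors the number quoted above.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of BioJava or Jmol.
final class HeaderPeek {
    // Assumed peek size, mirroring the 16384 bytes mentioned for Jmol.
    static final int PEEK_LIMIT = 16384;

    /** Reads up to PEEK_LIMIT bytes, then rewinds so parsing can start at byte 0. */
    static String peekHeader(BufferedInputStream in) throws IOException {
        byte[] buf = new byte[PEEK_LIMIT];
        in.mark(PEEK_LIMIT);          // remember the current position
        int total = 0;
        while (total < buf.length) {  // a single read() may return fewer bytes
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) break;         // end of stream
            total += n;
        }
        in.reset();                   // "play back" what was consumed
        return new String(buf, 0, total, StandardCharsets.ISO_8859_1);
    }

    /** Toy guess from well-known leading tokens; a real resolver needs more checks. */
    static String guessFormat(String header) {
        if (header.startsWith(">")) return "fasta";
        if (header.startsWith("LOCUS")) return "genbank";
        if (header.startsWith("ID   ")) return "embl";
        return "unknown";
    }
}
```

After peekHeader returns, the same BufferedInputStream can be handed to whichever parser the guess selects, and that parser sees the stream from its first byte.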

> I kind of like the idea of using streams, in cases where you are not
> reading from a file. Having to write everything to a temporary file to
> satisfy the API isn't a very appealing solution, I think.
>
> I could code something up if people are interested.

An additional advantage is that you get .gz support in one go:

      // gzip data starts with the magic bytes 0x1F 0x8B
      byte[] abMagic = new byte[4];
      BufferedInputStream bis = new BufferedInputStream((InputStream)t, 8192);
      InputStream is = bis;
      bis.mark(5);
      int countRead = bis.read(abMagic, 0, 4);
      bis.reset();  // rewind so the next reader starts at byte 0
      if (countRead == 4 &&
          abMagic[0] == (byte)0x1F && abMagic[1] == (byte)0x8B)
        is = new GZIPInputStream(bis);

where "t" is your InputStream, and "is" is the stream to use after the gzip
check/unzip. For the full working code, again see the Jmol source:

http://svn.sourceforge.net/viewcvs.cgi/jmol/trunk/Jmol/src/org/jmol/viewer/FileManager.java?view=markup
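
For anyone who wants to try the gzip check in isolation, here is a minimal self-contained sketch of the same technique. The class GzipSniffer and the method maybeGunzip are hypothetical names chosen for this example and do not come from Jmol or BioJava. It peeks at only the two magic bytes rather than four.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not the Jmol FileManager code.
final class GzipSniffer {
    /** Returns a decompressing stream if raw starts with the gzip magic, else a buffered view of raw. */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream bis = new BufferedInputStream(raw, 8192);
        bis.mark(2);            // we peek at two bytes only
        int b0 = bis.read();    // -1 at end of stream
        int b1 = bis.read();
        bis.reset();            // rewind so no data is lost
        if (b0 == 0x1F && b1 == 0x8B) {
            return new GZIPInputStream(bis);
        }
        return bis;
    }
}
```

Because the result is an ordinary InputStream either way, the format guessing and parsing that follow do not need to know whether the input was compressed.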

Egon

-- 
e.willighagen at science.ru.nl
Cologne University Bioinformatics Center (CUBIC)
Blog: http://chem-bla-ics.blogspot.com/
GPG: 1024D/D6336BA6