[Biojava-l] .sff support

Mon Feb 15 22:32:28 UTC 2010

Hi all,

I've been playing around with the sff file based on the file
format definition at NCBI.
I uploaded the output which includes the common header,
the read header and read data section for the first read
of that file.

http://home.arcor.de/cimbusch/output.txt
> I'm happy to answer questions on how the file format works
> (including the undocumented index block which I had to reverse
> engineer).
>   
Yes, I would like to know how that works. The index block in my
output starts with:

index_magic_number:778921588 .mft
version:1.00

I couldn't find any documentation on ".mft" version 1.
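
For anyone wondering where the ".mft" next to the magic number comes from: it is just the four bytes of the 32-bit value read as ASCII. A minimal sketch of that conversion follows; the class and method names are made up for illustration and are not part of BioJava.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an existing BioJava class.
final class SffMagicDemo {

    /** Interprets a 32-bit value as four big-endian bytes of ASCII text. */
    static String magicToAscii(int magic) {
        // ByteBuffer writes in big-endian order by default.
        byte[] bytes = ByteBuffer.allocate(4).putInt(magic).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 778921588 == 0x2E6D6674, the index_magic_number from my output.
        System.out.println(magicToAscii(778921588)); // prints ".mft"
    }
}
```

The same trick applied to the common header magic number 0x2E736666 gives ".sff".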

At the moment I have two classes: sffParser and sffFile.
My idea is that an sffParser can hold one or more sff files. Each
instance of sffFile has a hashtable with the read identifiers as keys
and the file pointers stored as the values.
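
To make the lookup idea concrete, here is a rough sketch of the identifier-to-file-pointer table. The class name SffFileIndex and its methods are placeholders I made up; the offsets would come from scanning the file (or from the index block once that is understood).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the per-file lookup table, not existing code.
final class SffFileIndex {

    // Read identifier -> byte offset of that read's header in the file.
    private final Map<String, Long> offsets = new HashMap<>();

    /** Records where a read starts; called once per read while scanning. */
    void put(String readName, long offset) {
        offsets.put(readName, offset);
    }

    /** Returns the stored offset, or throws if the identifier is unknown. */
    long offsetOf(String readName) {
        Long offset = offsets.get(readName);
        if (offset == null) {
            throw new IllegalArgumentException("Unknown read: " + readName);
        }
        return offset;
    }

    int size() {
        return offsets.size();
    }
}
```

With the offset in hand, java.io.RandomAccessFile.seek(long) can jump straight to the read without parsing everything before it.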

Now I would like to find a good representation of a single "read"
object, which should be accessible by an identifier like
EV5RTWS02JXUUH.

At the moment I'm using the BigInteger class to store many of the
variables, but that's probably a waste of memory.
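
One possible way around BigInteger, sketched below: the unsigned fields in the file are at most 32 bits wide and stored big-endian, so masking into the next larger primitive is enough. The class and method names are invented for this example.

```java
// Hypothetical helper for decoding unsigned big-endian fields from a byte buffer.
final class UnsignedReads {

    /** Unsigned 8-bit value as an int (0..255). */
    static int uint8(byte[] b, int off) {
        return b[off] & 0xFF;
    }

    /** Unsigned 16-bit big-endian value as an int (0..65535). */
    static int uint16(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    /** Unsigned 32-bit big-endian value as a long (0..4294967295). */
    static long uint32(byte[] b, int off) {
        // The top byte is widened to long before shifting so the sign bit is never set.
        return ((long) (b[off] & 0xFF) << 24)
                | ((b[off + 1] & 0xFF) << 16)
                | ((b[off + 2] & 0xFF) << 8)
                | (b[off + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] magic = {0x2E, 0x6D, 0x66, 0x74}; // ".mft"
        System.out.println(uint32(magic, 0));    // prints 778921588
    }
}
```

When reading from a stream instead of a buffer, java.io.DataInputStream already provides readUnsignedByte() and readUnsignedShort(), and readInt() & 0xFFFFFFFFL covers the 32-bit case.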
The variables for the read object I'm thinking of:

Read Header Section:

read_header_length -> int
name_length -> int
number_of_bases -> int
clip_qual_left -> int
clip_qual_right -> int
clip_adapter_left -> int
clip_adapter_right -> int
name -> string

Read Data Section:

flowgram_values -> float[]
flow_index_per_base -> int[]
bases -> string
quality_scores -> int[]
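
As a starting point for discussion, the fields above could be bundled into a small immutable class like the one below. This is only a sketch: SffRead is a name I made up, and I left out read_header_length, name_length and number_of_bases as stored fields on the assumption that they are only needed while parsing and can be derived from name and bases afterwards.

```java
// Hypothetical value class for one read; not an existing BioJava type.
final class SffRead {
    private final String name;
    private final int clipQualLeft;
    private final int clipQualRight;
    private final int clipAdapterLeft;
    private final int clipAdapterRight;
    private final float[] flowgramValues;
    private final int[] flowIndexPerBase;
    private final String bases;
    private final int[] qualityScores;

    SffRead(String name, int clipQualLeft, int clipQualRight,
            int clipAdapterLeft, int clipAdapterRight,
            float[] flowgramValues, int[] flowIndexPerBase,
            String bases, int[] qualityScores) {
        // Both per-base arrays carry one entry per called base.
        if (flowIndexPerBase.length != bases.length()
                || qualityScores.length != bases.length()) {
            throw new IllegalArgumentException("per-base arrays must match bases length");
        }
        this.name = name;
        this.clipQualLeft = clipQualLeft;
        this.clipQualRight = clipQualRight;
        this.clipAdapterLeft = clipAdapterLeft;
        this.clipAdapterRight = clipAdapterRight;
        // Defensive copies keep the object immutable.
        this.flowgramValues = flowgramValues.clone();
        this.flowIndexPerBase = flowIndexPerBase.clone();
        this.bases = bases;
        this.qualityScores = qualityScores.clone();
    }

    String name() { return name; }
    int numberOfBases() { return bases.length(); }
    int clipQualLeft() { return clipQualLeft; }
    int clipQualRight() { return clipQualRight; }
    int clipAdapterLeft() { return clipAdapterLeft; }
    int clipAdapterRight() { return clipAdapterRight; }
    float[] flowgramValues() { return flowgramValues.clone(); }
    int[] flowIndexPerBase() { return flowIndexPerBase.clone(); }
    String bases() { return bases; }
    int[] qualityScores() { return qualityScores.clone(); }
}
```

If memory matters for large files, the int[] arrays could probably shrink to byte[] or short[], since the per-base values in the file are small unsigned numbers.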

But I'm not very familiar with the existing data structures of
BioJava. Is there maybe already something similar?

Cheers,
 Charles