[Biojava-dev] sff files

Mon Nov 8 14:11:15 UTC 2010

On Mon, Nov 8, 2010 at 1:24 PM, Charles Imbusch <charles at imbusch.net> wrote:
>
> Hi all,
>
> For a project I implemented rudimentary support for SFF files coming
> from 454 sequencing machines. I packed and uploaded the code to:
>
> http://imbusch.net/tmp/sffParser.tar
>
> It can extract read information if the read ID is known. An iterator
> over the reads, and use of the mft index structure (thanks to Peter for
> the information), are certainly still needed.
>
> Example code to extract a sequence:
>
> String sfffile = "/home/charlie/sff/Harmigera/EU97XD416.sff";
> sffParser sffparser = new sffParser(sfffile);
> System.out.println("number of reads: " + sffparser.get_number_of_reads());
> Read read = sffparser.get_Read("EU97XD416JXTCU");
> System.out.println("sequence for read EU97XD416JXTCU");
> System.out.println(read.get_bases());
>
> I would like to extend and integrate the code into BioJava, but I'm a bit
> unsure how to proceed. The Read class in particular was a quick solution
> for me. Maybe something already exists to manage reads and their quality
> scores?
>
> Any feedback is welcome!
>
> Cheers,
>  Charles

Hi Charles,

I can't comment on the internals, but I would suggest looking at
BioJava's FASTQ support for how that deals with read sequences
with quality scores. One example usage you could test is SFF
to FASTQ (with and without applying the trimming points in the
SFF file).
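
As a rough sketch of that test case (plain Java, not BioJava's API; the
class, field, and method names are made up for illustration), the
conversion could look like the code below. It assumes the usual SFF
convention that the four clip values are 1-based and that 0 means "not
set", and it writes Sanger-style FASTQ (Phred score + 33).

```java
/** Hypothetical sketch: SFF read -> FASTQ record, with optional trimming. */
final class SffToFastqSketch {

    /** Made-up holder for the per-read fields found in an SFF file. */
    static final class SffRead {
        final String name;
        final String bases;
        final int[] qualities;      // one Phred score per base
        final int clipQualLeft;     // 1-based, 0 = not set
        final int clipQualRight;    // 1-based, 0 = not set
        final int clipAdapterLeft;  // 1-based, 0 = not set
        final int clipAdapterRight; // 1-based, 0 = not set

        SffRead(String name, String bases, int[] qualities,
                int clipQualLeft, int clipQualRight,
                int clipAdapterLeft, int clipAdapterRight) {
            this.name = name;
            this.bases = bases;
            this.qualities = qualities;
            this.clipQualLeft = clipQualLeft;
            this.clipQualRight = clipQualRight;
            this.clipAdapterLeft = clipAdapterLeft;
            this.clipAdapterRight = clipAdapterRight;
        }
    }

    /** Builds one four-line FASTQ record; applies the clip points if trim is true. */
    static String toFastq(SffRead r, boolean trim) {
        int n = r.bases.length();
        int start = 0; // 0-based, inclusive
        int end = n;   // 0-based, exclusive
        if (trim) {
            // Combine quality and adapter clips; an unset (0) right clip means "to the end".
            int left = Math.max(1, Math.max(r.clipQualLeft, r.clipAdapterLeft));
            int qualRight = r.clipQualRight == 0 ? n : r.clipQualRight;
            int adapterRight = r.clipAdapterRight == 0 ? n : r.clipAdapterRight;
            int right = Math.min(qualRight, adapterRight);
            start = Math.min(n, left - 1);
            end = Math.min(n, Math.max(start, right));
        }
        StringBuilder qual = new StringBuilder();
        for (int i = start; i < end; i++) {
            // Sanger FASTQ encoding: Phred score + 33, capped at 93 to stay printable.
            qual.append((char) (Math.min(r.qualities[i], 93) + 33));
        }
        return "@" + r.name + "\n" + r.bases.substring(start, end) + "\n+\n" + qual + "\n";
    }

    public static void main(String[] args) {
        // Toy read, not real data.
        SffRead read = new SffRead("r1", "ACGTACGT",
                new int[] {0, 10, 20, 30, 40, 40, 30, 20}, 3, 6, 0, 0);
        System.out.print(toFastq(read, false));
        System.out.print(toFastq(read, true));
    }
}
```

Comparing the trimmed and untrimmed output against what another SFF
tool produces for the same file would be a reasonable sanity check.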

Peter

P.S. It sounds like BioPerl is likely to get SFF support too:
http://lists.open-bio.org/pipermail/bioperl-l/2010-November/034223.html