[Biojava-l] Fastq benchmark

Scooter Willis HWillis at scripps.edu
Tue Jan 24 12:08:22 UTC 2012


You can try a FASTA version of the file to measure the performance gain.

File file = new File("filename");
// true = index the ids only and defer reading the sequence data
boolean lazySequenceLoad = true;

LinkedHashMap<String, DNASequence> sequences =
    FastaReaderHelper.readFastaDNASequence(file, lazySequenceLoad);

This goes through the file and indexes the accession ids without loading
any sequence data, so no memory is allocated for the sequences and the
initial pass is fast. You can then look up a DNASequence by name, and when
you need its sequence data it will use the file index to load the data
from the file for that specific sequence. The same approach can be applied
to FASTQ files.
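
As a rough illustration of that idea for FASTQ, here is a minimal sketch
that uses only the JDK. It is not BioJava code: the class name
LazyFastqIndex and both of its methods are hypothetical, and it assumes
every record is exactly four lines (no wrapped sequence lines). The first
pass stores only the read id and the byte offset of its header line; the
sequence is read from disk only when it is asked for.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava.
public class LazyFastqIndex {

    /** Maps each read id to the byte offset of its header line. */
    public static Map<String, Long> buildIndex(File fastq) throws IOException {
        Map<String, Long> index = new LinkedHashMap<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(fastq))) {
            long position = 0;    // offset of the next byte to be read
            long lineStart = 0;   // offset where the current line began
            long lineNumber = 0;  // zero-based; every 4th line is a header
            StringBuilder header = new StringBuilder();
            int b;
            while ((b = in.read()) != -1) {
                position++;
                boolean isHeaderLine = lineNumber % 4 == 0;
                if (b == '\n') {
                    if (isHeaderLine && header.length() > 0) {
                        index.put(readId(header), lineStart);
                    }
                    header.setLength(0);
                    lineStart = position;
                    lineNumber++;
                } else if (isHeaderLine && b != '\r') {
                    // Only header bytes are kept; sequence and quality
                    // lines are skipped without being stored.
                    header.append((char) b);
                }
            }
        }
        return index;
    }

    /** Loads the sequence line of the record starting at the given offset. */
    public static String readSequence(File fastq, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fastq, "r")) {
            raf.seek(offset);
            raf.readLine();         // skip the header line
            return raf.readLine();  // the sequence line
        }
    }

    // Drops the leading '@' and anything after the first space.
    private static String readId(CharSequence header) {
        String text = header.toString();
        int start = text.startsWith("@") ? 1 : 0;
        int end = text.indexOf(' ');
        return text.substring(start, end < 0 ? text.length() : end);
    }
}
```

Headers are found by counting lines rather than by looking for '@',
because a quality line may itself start with '@'. Opening the file on
every readSequence call keeps the sketch short; a real reader would
probably keep one handle open.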

Scooter

On 1/24/12 3:37 AM, "Mic" <mictadlo at gmail.com> wrote:

>Hello,
>I have found the following benchmark
>(http://biostar.stackexchange.com/questions/10376/how-to-efficiently-parse-a-huge-fastq-file/11279#11279)
>and I just wonder whether it is possible to make the Java example even
>faster?
>
>Thank you in advance.




