[Bioperl-l] Announcing Bio::SFF

Fields, Christopher J cjfields at illinois.edu
Tue Dec 20 22:40:31 UTC 2011



On Dec 20, 2011, at 9:26 AM, "Leon Timmermans" <l.m.timmermans at students.uu.nl> wrote:

On Mon, Dec 19, 2011 at 8:44 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
Kinda joining this a little late, but I think if there is a way to have a low-level parser/writer that generically parses the data into simple (possibly hash-tagged) data structures, that would be best.  Barring that, a very simple class for storing data.  We've found BioPerl objects/classes pretty heavy.

(For an example of this, see Heng Li's readfq parser on GitHub, which has some stats for FASTQ/FASTA parsing.)

Any way we can separate the parser from object instantiation would enable us to optimize the object/class layer and parser/writer layers separately, with the possible nice side effect of making the parser more broadly used.

For instance, if someone wanted a faster parser, they could use the low-level one; otherwise they could use the higher-level (possibly BioPerl-specific) API. Lincoln does this to a certain degree with Bio-samtools; I would go further and put the bp- and non-bp code in separate dists.
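
To make the layering concrete, here is a minimal sketch of the split described above. It is written in Python purely for illustration and is not BioPerl or readfq code. It assumes simple four-line FASTQ records; the names iter_fastq, Read, and iter_reads are hypothetical.

```python
import io
from dataclasses import dataclass


def iter_fastq(handle):
    """Low-level layer: yield one plain dict per record, no classes involved.

    Assumes strict four-line FASTQ records (no wrapped sequence lines).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed record: {header!r}")
        parts = header[1:].split(None, 1)
        yield {"id": parts[0] if parts else "", "seq": seq, "qual": qual}


@dataclass
class Read:
    """Higher-level layer: a small class built on top of the raw dicts."""
    id: str
    seq: str
    qual: str

    def phred(self, offset=33):
        # Decode quality characters into integer scores (offset is an assumption).
        return [ord(c) - offset for c in self.qual]


def iter_reads(handle):
    """Object API that reuses the low-level parser unchanged."""
    for rec in iter_fastq(handle):
        yield Read(**rec)


if __name__ == "__main__":
    data = "@r1 first read\nACGT\n+\nIIII\n"
    for read in iter_reads(io.StringIO(data)):
        print(read.id, read.seq, read.phred())
```

Callers who only care about speed can loop over iter_fastq directly and never pay for object construction, while the object layer can be tuned or replaced without touching the parser.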

A good OO system can actually help make things faster. For example, I'm unpacking the flowspace and quality data lazily, which made scanning through an SFF file 2.5-3 times as fast, with only marginal extra cost when you do need them.
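
The lazy-unpacking idea can be sketched roughly as follows. This is an illustration in Python, not the actual Bio::SFF implementation. It assumes a simplified record in which flow values are stored as big-endian 16-bit unsigned integers and quality scores as single bytes; the class name LazyRead and its attributes are hypothetical.

```python
import struct
from functools import cached_property


class LazyRead:
    """Keeps the packed bytes and decodes them only on first access."""

    def __init__(self, name, raw_flows, raw_quals):
        self.name = name
        self._raw_flows = raw_flows  # packed bytes, left untouched while scanning
        self._raw_quals = raw_quals

    @cached_property
    def flowgram(self):
        # Decoded once on first access; the result is then cached on the instance.
        count = len(self._raw_flows) // 2
        return struct.unpack(f">{count}H", self._raw_flows)

    @cached_property
    def quality(self):
        # One unsigned byte per base call.
        return tuple(self._raw_quals)


if __name__ == "__main__":
    read = LazyRead("r1", struct.pack(">3H", 100, 0, 205), bytes([40, 30]))
    print(read.name)      # scanning by name never triggers any unpacking
    print(read.flowgram)  # unpacked here, on demand
```

A scan that only looks at read names pays for a couple of byte-string assignments per record, and the decoding cost is incurred only for the reads whose flowgram or quality is actually requested.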

Leon

Yep, thinking about using the same approach for the Fastq variants.

Chris

Sent from my ancient iPad b/c my laptop's borked



