[Bioperl-l] FASTQ, was Re: BioPerl long-term, was Re: dependencies on perl version
Fields, Christopher J
cjfields at illinois.edu
Thu Feb 7 17:01:07 UTC 2013
Re: thread-safe Perl: so-so at best, from what I understand.
chris
On Feb 7, 2013, at 10:09 AM, Aaron Mackey <amackey at virginia.edu> wrote:
> e.g., a pull-based FASTQ parser that did nothing else at the top level but "chunk" the file into as-yet-unparsed four-line blobs could appear to work very fast, if the user code did nothing but count the number of entries:
>
> while (my $seq = $seqio->next_seq) { $ct++ }
>
> in other words, you defer *everything* except the minimal amount of parsing/logic required to detect object boundaries.
>
> This is, in fact, the exact opposite of the event-based SearchIO "push" parsers, which always perform the most parsing possible, despite the user never accessing most of the material.
>
> Lastly, with respect to performance, if the parsing/object building operation is not simply IO bound, then parallel parser/object-building CPU threads could be considered, which could then dynamically adapt to pre-parse attributes (e.g. quality scores) that the calling code was actually using. What's the state of thread-safe Perl these days?
>
> -Aaron
>
>
> On Thu, Feb 7, 2013 at 10:56 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> This will likely be the approach for a more NGS-friendly Bio::Seq class. Calculation of the PHRED scores could also be deferred until needed.
>
> seqtk has some C-based methods that we could possibly take advantage of, but I will have to look into it.
>
> chris
>
> On Feb 7, 2013, at 9:25 AM, Aaron Mackey <amackey at virginia.edu> wrote:
>
> > You might also want to consider a lazy/pull-based parser to defer parsing/object-building for pieces of the object that don't get used. This also usually provides some error tolerance.
> >
> > -Aaron
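For illustration, the lazy/pull-based idea discussed in this thread could look roughly like the sketch below. It is not BioPerl code: the LazyFastqReader and LazyFastqRecord packages are hypothetical names made up for this example, and the quality decoding assumes a Sanger-style Phred+33 offset. The reader only chunks the input into four-line blobs; splitting the blob and computing PHRED scores happen on first access, so a malformed record only raises an error if something actually looks inside it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical lazy record (not part of BioPerl): keeps the raw four-line
# blob and only splits/decodes it when an accessor is first called.
package LazyFastqRecord;

sub new {
    my ($class, $raw) = @_;
    return bless { raw => $raw }, $class;
}

# Split the blob into its fields once, on first use, and cache the result.
sub _fields {
    my ($self) = @_;
    $self->{fields} //= do {
        my ($head, $seq, $plus, $qual) = split /\n/, $self->{raw};
        die "malformed FASTQ record\n"
            unless defined $qual && $head =~ /^\@/ && $plus =~ /^\+/;
        my ($id) = $head =~ /^\@(\S+)/;
        +{ id => $id, seq => $seq, qual => $qual };
    };
    return $self->{fields};
}

sub id  { $_[0]->_fields->{id} }
sub seq { $_[0]->_fields->{seq} }

# PHRED scores are decoded only on demand (assumes a Phred+33 offset).
sub phred {
    my ($self) = @_;
    $self->{phred} //= [ map { ord($_) - 33 } split //, $self->_fields->{qual} ];
    return $self->{phred};
}

# Hypothetical pull-based reader: detects record boundaries, nothing more.
package LazyFastqReader;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

# Pull the next record: read up to four lines and wrap them unparsed.
sub next_seq {
    my ($self) = @_;
    my $fh = $self->{fh};
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        last unless defined $line;
        push @lines, $line;
    }
    return unless @lines;    # end of input
    return LazyFastqRecord->new(join '', @lines);
}

package main;

# Two made-up records, read from an in-memory filehandle.
my $fastq = "\@r1\nACGT\n+\nIIII\n\@r2\nGGCC\n+\n!!5I\n";
open my $fh, '<', \$fastq or die "cannot open in-memory file: $!";
my $seqio = LazyFastqReader->new($fh);

my $ct = 0;
my @records;
while (my $seq = $seqio->next_seq) {
    $ct++;                   # counting alone never triggers any parsing
    push @records, $seq;
}
print "records: $ct\n";
print join(' ', @{ $records[1]->phred }), "\n";    # decoded only here
```

Counting entries touches nothing but the line reads, which is why such a parser would appear very fast on that benchmark; the cost of splitting and quality decoding is paid only by callers that ask for those attributes.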