[Bioperl-l] BioPerl long-term, was Re: dependencies on perl version

Peter Cock p.j.a.cock at googlemail.com
Wed Feb 6 22:43:13 UTC 2013


On Wed, Feb 6, 2013 at 10:11 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
>
> I see no problem in stating any generic parsing and low-level interfaces
> are just as much a part of what BioPerl encompasses as the higher-level
> Bio::* classes themselves.  Steve and Jason were on to something with
> SearchIO; it's maybe not as performant as we would like, but it certainly
> is more flexible in terms of what can be done, b/c it separates out
> low-level parsing from object creation.  That's the general model we
> should look at.  There is a good reason Biopython is following this
> model with their SearchIO implementation (Peter C, are you reading this?)

Actually I don't think we did end up with that kind of separation in the
Biopython SearchIO - which is not to say it isn't an excellent model
to follow. Rather, the Biopython SearchIO (like the BioPerl one) had
as its first goal a consistent object model across assorted file
formats.

The idea of low-level, minimal-overhead parsers (which are very
format specific), on which a heavier but consistent object model
can be built, might be a good balance - the high-level API has the
convenience, but if you give that up you can have more speed.
That's what I recommend with FASTQ and Biopython, e.g.
http://news.open-bio.org/news/2009/09/biopython-fast-fastq/
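
A minimal sketch of that layering, in plain Python with only the standard
library. This is not Biopython's (or BioPerl's) actual code: the names
iter_fastq_tuples, iter_fastq_reads and Read are hypothetical, and the
parser assumes strict four-line FASTQ records with a Phred+33 quality
encoding by default. The low-level function yields plain string tuples;
the high-level one builds objects on top of it, paying extra cost per
record for the convenience.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO, Tuple


def iter_fastq_tuples(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Low-level layer: yield (title, sequence, quality) as plain strings.

    Assumption: strict four-line records (no wrapped sequence lines).
    """
    while True:
        title_line = handle.readline()
        if not title_line:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus_line = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not title_line.startswith("@") or not plus_line.startswith("+"):
            raise ValueError("Malformed FASTQ record near %r" % title_line)
        if len(seq) != len(qual):
            raise ValueError("Sequence/quality length mismatch in %r" % title_line)
        yield title_line[1:].rstrip("\r\n"), seq, qual


@dataclass
class Read:
    """High-level record (hypothetical object model for illustration)."""
    identifier: str
    description: str
    sequence: str
    phred: List[int]


def iter_fastq_reads(handle: TextIO, offset: int = 33) -> Iterator[Read]:
    """High-level layer: build Read objects on top of the tuple parser."""
    for title, seq, qual in iter_fastq_tuples(handle):
        words = title.split(None, 1)
        identifier = words[0] if words else ""
        # Decoding qualities to integers is part of the per-record overhead
        # that the low-level layer avoids.
        yield Read(identifier, title, seq, [ord(c) - offset for c in qual])


if __name__ == "__main__":
    data = "@r1 first read\nACGT\n+\nIIII\n"
    for title, seq, qual in iter_fastq_tuples(io.StringIO(data)):
        print(title, len(seq))
    for read in iter_fastq_reads(io.StringIO(data)):
        print(read.identifier, read.phred)
```

Code that only needs to filter or count reads can stay on the tuple
layer; code that wants a uniform record object regardless of file
format uses the second layer and accepts the slowdown.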

>
> I have started a wrapper around Heng's FASTQ/FASTA parsing
> code (kseq), it seems to work quite well (~20M FASTQ in 30 sec
> last I recall?).
>

I'd have to dig through my emails, but I think the BioRuby guys
looked at that too - as I recall, while it was fast, the error handling
left something to be desired. Email me directly or on the BioRuby
list if you want to follow up on that.

Regards,

Peter


