[Bioperl-l] Next-gen modules

Chris Fields cjfields at illinois.edu
Thu Jul 23 22:58:01 UTC 2009


On Jul 23, 2009, at 6:31 AM, Peter Cock wrote:

> On Wed, Jul 8, 2009 at 5:24 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>
>> It would be nice to get some regression tests going for this to make
>> sure it does what we expect, so maybe some test data and expected results?
>>
>
> Regression tests for BioPerl's FASTQ support would of course
> be sensible. In terms of sample data and expected results...
>
> I've got some test files put together for Biopython, and I have
> been cross-checking Biopython's FASTQ support against
> EMBOSS 6.1.0, which has turned up a few issues:
> http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000577.html
>
> ------------------------------------------------------------------------------
>
> I'd like to get comparisons against BioPerl's new FASTQ support
> going too. To do this I'd need to know which version (branch?) of
> BioPerl I should install, and I'd also like a trivial sample BioPerl
> script to do piped FASTQ conversion, i.e. read a FASTQ file from
> stdin (say as "fastq-solexa") and output it to stdout (say as
> "fastq", meaning the Sanger standard FASTQ).

You would have to install from svn (bioperl-live) if you want the
refactored FASTQ module (Bio::SeqIO::fastq); it was committed within
the last month.

> i.e. something like this four-line Biopython script would be perfect:
> http://biopython.org/wiki/Reading_from_unix_pipes

We use named parameters so it's a little more verbose.

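use strict;
use warnings;
use Bio::SeqIO;
# Read Solexa-style FASTQ from STDIN, write Sanger FASTQ to STDOUT
my $in  = Bio::SeqIO->new(-fh => \*STDIN,  -format => 'fastq-solexa');
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fastq-sanger');
while (my $seq = $in->next_seq) { $out->write_seq($seq) }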

Don't be surprised if there are still bugs lurking about; just let me
know and I'll fix 'em.
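For anyone cross-checking implementations, the interesting part of the Solexa-to-Sanger step is the per-base quality rescaling. Below is a minimal sketch of that arithmetic, assuming Solexa qualities are log-odds scores stored at ASCII offset 64 and Sanger qualities are PHRED scores at ASCII offset 33. The sub name solexa_to_sanger_qual is hypothetical; this is not BioPerl's actual code, and BioPerl's rounding may differ in edge cases.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): convert one Solexa-encoded
# quality string (ASCII offset 64, log-odds scores) into a Sanger-encoded
# string (ASCII offset 33, PHRED scores).
sub solexa_to_sanger_qual {
    my ($solexa) = @_;
    my $sanger = '';
    for my $char (split //, $solexa) {
        my $q_solexa = ord($char) - 64;
        # PHRED = 10 * log10(10^(Q_solexa/10) + 1); always > 0
        my $q_phred = 10 * log(10 ** ($q_solexa / 10) + 1) / log(10);
        # Round to the nearest integer and encode with offset 33
        $sanger .= chr(int($q_phred + 0.5) + 33);
    }
    return $sanger;
}

# Example: Solexa scores 40, 10, 0 and -5
print solexa_to_sanger_qual('hJ@;'), "\n";   # prints I+$"
```

Note that high Solexa scores map almost unchanged (40 stays 40), while low and negative ones are compressed (-5 becomes 1), so the conversion is not a simple offset shift.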

> ------------------------------------------------------------------------------
>
> Peter Rice and I have also been talking about line wrapping when
> writing FASTQ output, and if this is a good idea or not:
> http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000593.html
>
> Thanks!
>
> Peter C. (@Biopython)

BTW, I think the BioPerl parser does handle line-wrapped FASTQ now.

Anyway, I tend to agree with Aaron on that point: too many exceptions
to the rule make it harder to write parsers for a human-readable format.
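To illustrate why wrapping complicates parsing: once sequence and quality may span several lines, a parser cannot treat a line starting with '@' as the next header, because '@' (and '+') are valid Sanger quality characters. A minimal sketch of the usual workaround follows, using a hypothetical next_fastq_record() sub (not BioPerl's parser): accumulate sequence lines up to the '+' separator, then keep reading quality lines until the quality string is as long as the sequence.

```perl
use strict;
use warnings;

# Hypothetical reader (not BioPerl's implementation) for possibly
# line-wrapped FASTQ. Returns a hashref for the next record, or undef at EOF.
sub next_fastq_record {
    my ($fh) = @_;
    my $title;
    # Skip blank lines, then expect an '@' header line
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next if $line eq '';
        die "Expected '\@' header, got: $line\n" unless $line =~ /^\@(.*)$/;
        $title = $1;
        last;
    }
    return unless defined $title;

    # Sequence may span several lines; it ends at the '+' separator line
    my $seq = '';
    while (1) {
        my $line = <$fh>;
        die "Truncated record '$title'\n" unless defined $line;
        chomp $line;
        last if $line =~ /^\+/;
        $seq .= $line;
    }

    # Quality lines may legally start with '@' or '+', so we cannot scan for
    # the next header; instead read until quality is as long as the sequence
    my $qual = '';
    while (length($qual) < length($seq)) {
        my $line = <$fh>;
        die "Truncated quality for '$title'\n" unless defined $line;
        chomp $line;
        $qual .= $line;
    }
    die "Quality longer than sequence for '$title'\n"
        if length($qual) != length($seq);

    return { id => $title, seq => $seq, qual => $qual };
}

# Usage: a wrapped record whose second quality line starts with '@'
my $data = "\@r1\nACGT\nAC\n+\n\@III\nII\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
while (my $rec = next_fastq_record($fh)) {
    print "$rec->{id}\t$rec->{seq}\t$rec->{qual}\n";
}
```

The length check is what makes wrapped output parseable at all, and it is exactly the kind of extra rule that unwrapped four-line records let a parser avoid.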

chris



