[Bioperl-l] Next-gen modules

Wed Jun 17 13:54:45 UTC 2009

On Wed, Jun 17, 2009 at 1:54 PM, Elia Stupka <e.stupka at ucl.ac.uk> wrote:
>
> Dear Mark,
>
> thanks a lot for the pointers.
>
> With regards to FASTQ parsing:
>
> -my understanding from reading past threads is to work on a single format,
> i.e. FASTQ, and to interpret the quality "flavours" as just quality
> conversions, right?
> -However, I assume we would still want to support a simple way for the user
> to say format => 'fastq-solexa', using the nomenclature adopted in BioPython
> and suggested by Peter, right?

I think you will need a way for the user to say they have a Solexa, or
an Illumina 1.3+, or an original Sanger standard FASTQ file.
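
A minimal sketch of why the user has to name the variant: the three kinds of file store qualities with different ASCII offsets and, for Solexa, a different score scale. The table uses the Biopython-style names `fastq-sanger`, `fastq-solexa` and `fastq-illumina`; the table and the function `decode_qualities` are hypothetical illustrations in Python, not BioPerl's or Biopython's API.

```python
# Hypothetical table (not a BioPerl or Biopython API) mapping
# Biopython-style format names to (ASCII offset, score scale).
FASTQ_VARIANTS = {
    "fastq-sanger": (33, "phred"),    # original Sanger standard
    "fastq-solexa": (64, "solexa"),   # early Solexa/Illumina pipelines
    "fastq-illumina": (64, "phred"),  # Illumina 1.3+ pipeline
}


def decode_qualities(qual_string, variant):
    """Return the integer scores encoded in one FASTQ quality line."""
    offset, scale = FASTQ_VARIANTS[variant]
    scores = [ord(ch) - offset for ch in qual_string]
    # Solexa scores can go down to -5; Phred scores start at 0.
    lowest = -5 if scale == "solexa" else 0
    if any(s < lowest for s in scores):
        raise ValueError(f"quality string is not valid {variant}")
    return scores
```

For example, `decode_qualities("hh", "fastq-illumina")` gives `[40, 40]`, while the same characters read as `fastq-sanger` give `[71, 71]`. A file whose quality characters all sit at ASCII 64 or above is valid under all three readings, so the variant cannot always be detected automatically.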

From reading the http://bioperl.org/wiki/HOWTO:SeqIO wiki page, I
assumed BioPerl's SeqIO just had formats (e.g. the "chadoxml" format
and the variant "flybase_chadoxml" format). Does BioPerl's SeqIO format
system have any concept of flavour that I am not aware of?

> -I also saw Heikki's "long essay", but have not yet compared it to Heng Li's
> code at http://maq.sourceforge.net/fq_all2std.pl. I guess we would hope they
> would produce identical outputs; that will be a good check.

Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl is a useful
guide (although it doesn't yet cope with the new Illumina 1.3+ variant),
but I don't trust it 100%. See e.g.
http://lists.open-bio.org/pipermail/biopython/2009-June/005208.html
http://lists.open-bio.org/pipermail/biopython/2009-June/005209.html
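
For a cross-check against fq_all2std.pl, it may help to be explicit about the Solexa-to-Phred mapping. This Python sketch assumes the usual definitions Solexa = -10 log10(p/(1-p)) and Phred = -10 log10(p), rounds to the nearest integer, and clamps Phred 0 to the Solexa minimum of -5. The rounding and clamping choices are assumptions, and the function names are hypothetical.

```python
import math


def solexa_to_phred(solexa):
    """Map a Solexa score to the nearest integer Phred score.

    Both scales describe the same error probability p, which gives
    Phred = 10*log10(10**(Solexa/10) + 1).
    """
    return round(10 * math.log10(10 ** (solexa / 10) + 1))


def phred_to_solexa(phred):
    """Inverse mapping, Solexa = 10*log10(10**(Phred/10) - 1).

    Phred 0 (p = 1) has no finite Solexa value, so this sketch
    clamps it, and any lower result, to the Solexa minimum of -5.
    """
    if phred <= 0:
        return -5
    return max(-5, round(10 * math.log10(10 ** (phred / 10) - 1)))


def solexa_line_to_sanger(qual_string):
    """Re-encode a fastq-solexa quality line (offset 64) as fastq-sanger (offset 33)."""
    return "".join(chr(solexa_to_phred(ord(ch) - 64) + 33) for ch in qual_string)
```

Both Solexa -5 and -4 map to Phred 1, so converting Solexa to Sanger loses a little information at the low end; converters may also differ in rounding, so edge cases like these are worth including in any comparison of outputs.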

Peter