[Bioperl-l] Next-gen modules

Peter biopython at maubp.freeserve.co.uk
Sat Jun 20 08:46:31 UTC 2009


On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>>> Peter's suggestions are also reasonable, though: does Biopython have a
>>> separate module for each of these variations?  Our version (I believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself, based on the
>>> FASTQ variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try to guess the file format from the data contents).
>> Internally we have three separate FASTQ parsers/writers, although
>> they do share code.
>
> We could easily do the same if others agree.  Actually, if we specified that
> the shorthand for a format variant is -format => 'format-variant', I think
> we could easily hack SeqIO to handle that by splitting on '-' and passing
> the parts to the constructor as (-format => 'format', -variant =>
> 'variant').  There would be very little repeated code in this case, just an
> additional named parameter indicating the format variant (and the SeqIO
> class can do the type checking on that within the constructor).
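
A minimal Python sketch of the split-on-'-' idea, for illustration only: it is not BioPerl's (or Biopython's) code, and the names `split_format` and `KNOWN_VARIANTS`, as well as the variant names in the table, are hypothetical. It splits on the first '-' and then checks the variant against a per-format table, standing in for the constructor's type checking.

```python
# Hypothetical table of the variants each format accepts.
# None means "no variant given" (plain format name).
KNOWN_VARIANTS = {
    "fastq": {None, "sanger", "solexa", "illumina"},
    "fasta": {None},
}


def split_format(name):
    """Split 'format-variant' into (format, variant); variant may be None."""
    # partition() splits on the first '-' only; sep is "" if there is none.
    fmt, sep, variant = name.lower().partition("-")
    variant = variant if sep else None
    if fmt not in KNOWN_VARIANTS:
        raise ValueError(f"unknown format: {fmt!r}")
    # Stand-in for the "type checking within the constructor" step.
    if variant not in KNOWN_VARIANTS[fmt]:
        raise ValueError(f"unknown variant {variant!r} for format {fmt!r}")
    return fmt, variant
```

With this sketch, `split_format("fastq-solexa")` returns `('fastq', 'solexa')` and `split_format("fasta")` returns `('fasta', None)`, so only one extra named value has to travel alongside the format name.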

Yes, when I started using names like "fastq-solexa" I did have a
"main-variant" naming convention in mind, and Biopython may one day
actually use this structure when dispatching a Bio.SeqIO job to the
appropriate parser or writer.

For now, the Biopython list of formats is fairly short (and there are
relatively few of these sub-formats), so to keep things simple we just
have a flat mapping from the format name (e.g. "fasta", "fastq",
"fastq-solexa") to the parser/writer code.
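
A rough Python sketch of such a flat mapping, assuming toy parsers: `parse_fasta`, `parse_fastq`, `PARSERS` and `parse` are hypothetical names, not Biopython's internals, and the third FASTQ name, "fastq-illumina", is an assumption here. The three FASTQ entries share one parser and differ only in the ASCII offset used to decode quality characters (33 for "fastq", 64 for the other two); converting Solexa scores to PHRED scores is deliberately left out.

```python
from functools import partial


def parse_fasta(lines):
    """Toy FASTA parser: yields (id, sequence) tuples."""
    title, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def parse_fastq(lines, offset):
    """Toy FASTQ parser shared by every variant (4 lines per record).

    Only the ASCII offset used to decode the quality string differs.
    For "fastq-solexa" the decoded values are Solexa scores, not PHRED
    scores; converting between the two is not done in this sketch.
    """
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, [ord(c) - offset for c in qual]


# Flat mapping from format name to parser ("fastq-illumina" is assumed).
PARSERS = {
    "fasta": parse_fasta,
    "fastq": partial(parse_fastq, offset=33),           # Sanger, PHRED+33
    "fastq-solexa": partial(parse_fastq, offset=64),    # Solexa scores +64
    "fastq-illumina": partial(parse_fastq, offset=64),  # PHRED+64
}


def parse(lines, fmt):
    """Look up the parser for the (required) format name and run it."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    return parser(lines)
```

For example, `list(parse(["@r1", "ACGT", "+", "IIII"], "fastq"))` gives `[("r1", "ACGT", [40, 40, 40, 40])]`. Because the format name is an explicit key, nothing is guessed from the file contents, and adding a variant is one more dictionary entry.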

Peter



