[Bioperl-l] fastq splitter

Peter Cock p.j.a.cock at googlemail.com
Wed Feb 29 15:32:55 UTC 2012


On Wed, Feb 29, 2012 at 3:27 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Feb 29, 2012, at 4:32 AM, Peter Cock wrote:
>
>> On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J
>> <cjfields at illinois.edu> wrote:
>>> Frankly, there never seemed to be a real fixed standard in the way that FASTQ
>>> headers were written (and just when it seems there is some consensus, Illumina
>>> pulls the rug out from under you), hence the reason I leave it alone.  We could
>>> add some ID munging in there if needed, would just need a qr// with a standard
>>> fallback.
>>>
>>> chris
>>
>> Indeed - just like FASTA, it seems every company/tool/database has its own
>> conventions about the FASTQ ID line and how to stuff as much meta-data
>> into it as possible. This is a major reason why I hope storing unaligned
>> reads in SAM/BAM takes off - places like the Sanger and Broad use this in
>> their pipelines.
>>
>> http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html
>>
>> Peter
>
> Unaligned BAM makes the most sense.  I've also been talking with the
> HDF5 folks here sporadically, they're still keen on promoting BioHDF
> (it is pretty fast), though that has cooled considerably.
>
> Anyone working directly with CRAM in their pipelines?
>
> chris

I understand that Sanger are looking at moving their pipelines from BAM to
CRAM later this year, but CRAM is still quite new and in flux.

Peter
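
For readers following the ID munging idea quoted above (a qr// pattern with a
standard fallback), here is a minimal sketch in Python rather than Perl, with
compiled regular expressions standing in for qr//. It is not BioPerl or
Biopython code: the names ID_PATTERNS and parse_fastq_id are hypothetical, and
the two patterns only assume the commonly documented Illumina title layouts
(older instrument:lane:tile:x:y#index/read, and CASAVA 1.8+ with a
space-separated read:filtered:control:index field). Anything else falls back
to taking the first word of the title line as the ID, untouched.

```python
import re

# Hypothetical pattern table, tried in order. Each entry is (style, compiled regex).
ID_PATTERNS = [
    # Assumed CASAVA 1.8+ layout:
    #   instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    ("casava18", re.compile(
        r"^(?P<name>[^:\s]+(?::[^:\s]+){6})\s+(?P<read>[12]):[YN]:\d+:\S*$")),
    # Assumed older Illumina layout:
    #   instrument:lane:tile:x:y#index/read
    ("illumina_old", re.compile(
        r"^(?P<name>[^:\s]+(?::[^:\s#/]+){4}(?:#[^/\s]+)?)/(?P<read>[12])$")),
]


def parse_fastq_id(header):
    """Return (style, template name, read number or None) for a FASTQ title line."""
    # Drop the leading '@' if present, plus surrounding whitespace.
    line = header[1:] if header.startswith("@") else header
    line = line.strip()
    for style, pattern in ID_PATTERNS:
        match = pattern.match(line)
        if match:
            return style, match.group("name"), int(match.group("read"))
    # Standard fallback: the first whitespace-delimited word is the ID, unmodified.
    parts = line.split(None, 1)
    return "generic", (parts[0] if parts else ""), None
```

New conventions can be supported by adding entries to the table, and an
unrecognised title line still yields a usable ID instead of an error.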



