[Bioperl-l] fastq splitter

Pablo Marin-Garcia harpactocrates at googlemail.com
Thu Mar 1 14:41:47 UTC 2012


On Wed, Feb 29, 2012 at 4:32 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Feb 29, 2012 at 3:27 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Feb 29, 2012, at 4:32 AM, Peter Cock wrote:
>>
>>> On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J
>>> <cjfields at illinois.edu> wrote:
>>>> Frankly, there never seemed to be a real fixed standard in the way that FASTQ
>>>> headers were written (and just when it seems there is some consensus, Illumina
>>>> pulls the rug out from under you), hence the reason I leave it alone.  We could
>>>> add some ID munging in there if needed, would just need a qr// with a standard
>>>> fallback.
>>>>
>>>> chris
>>>
>>> Indeed - just like FASTA, it seems every company/tool/database has its own
>>> conventions about the FASTQ ID line and how to stuff as much meta-data
>>> into it as possible. This is a major reason why I hope unaligned reads in
>>> SAM/BAM takes off - places like the Sanger and Broad use this in their
>>> pipelines.
>>>
>>> http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html
>>>
>>> Peter
>>
>> Unaligned BAM makes the most sense.  I've also been talking with the
>> HDF5 folks here sporadically, they're still keen on promoting BioHDF
>> (it is pretty fast), though that has cooled considerably.
>>
>> Anyone working directly with CRAM in their pipelines?
>>
>> chris
>
> I understand that Sanger are looking at moving their pipelines from BAM to
> CRAM later this year, but CRAM is still quite new and in flux.
>

My concern is that, since CRAM is based on delta compression (comparison
against a reference), I am not sure how much compression it would
achieve with unaligned BAMs. The other thing CRAM does is remove a lot
of extra tags and metadata (even from the header reference info),
whereas the strong point of BAM over FASTQ is the availability of
structured metadata. CRAM is still under development in this area, so
we will see where they go.

> Peter
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



-- 
   - Pablo Marin-Garcia



