[Bioperl-l] fastq splitter

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 1 15:03:02 UTC 2012


On Thu, Mar 1, 2012 at 2:41 PM, Pablo marin-garcia
<harpactocrates at googlemail.com> wrote:
> On Wed, Feb 29, 2012 at 4:32 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> I understand that Sanger are looking at moving their pipelines from BAM to
>> CRAM later this year, but CRAM is still quite new and in flux.
>>
>
> My concern is that, since CRAM is based on delta compression
> (comparison against a reference), I am not sure how much compression
> it would achieve with unaligned BAMs.

Reference-based compression can still be applied to unaligned reads by
using an appropriate dummy reference, for instance one built from a
mini-assembly of the unmapped reads.
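
As a rough illustration of the idea, here is a toy Python sketch. It is
not the actual CRAM encoding; the function names, the dummy reference and
the read are all made up for the example. It only shows why a read that
lines up against some reference, even an artificial one, can be stored
as a position plus a short list of differences.

```python
# Toy illustration only: this is NOT the CRAM format, just the general idea
# of storing a read as (position, length, differences) against a reference.

def encode_read(read, reference, pos):
    """Encode a read placed at offset `pos` as its mismatches to the reference."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read does not fit on the reference at this position")
    mismatches = [
        (i, base)
        for i, (base, ref_base) in enumerate(zip(read, window))
        if base != ref_base
    ]
    return pos, len(read), mismatches


def decode_read(encoded, reference):
    """Rebuild the original read from the reference and the stored mismatches."""
    pos, length, mismatches = encoded
    bases = list(reference[pos:pos + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


# Hypothetical dummy reference, e.g. a contig from a mini-assembly of the
# unmapped reads (sequence invented for this example).
dummy_reference = "ACGTACGTTAGCCGATAGGCTTAACCGGTT"
read = "TAGCCGATCGGCTT"  # placed at offset 8, with one base differing

encoded = encode_read(read, dummy_reference, 8)
print(encoded)
print(decode_read(encoded, dummy_reference) == read)
```

The more reads the dummy reference covers, the fewer bases need to be
stored explicitly. Reads that match nothing in it would still have to be
kept verbatim.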

> The other thing that CRAM does is
> remove a lot of extra tags and metadata (even from the header
> reference info), whereas the strong point of BAM over FASTQ is the
> availability of structured metadata. CRAM is still in development in
> this area, so we will see where they go.

Did you miss Ewan's reply about CRAM 0.7 which is due soon?
http://lists.open-bio.org/pipermail/bioperl-l/2012-March/036295.html

Might this be better continued on the cram-dev list
http://listserver.ebi.ac.uk/mailman/listinfo/cram-dev
or on this SEQanswers thread?
http://seqanswers.com/forums/showthread.php?t=18050

Peter



