[Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores

Wed Mar 10 10:47:01 UTC 2010

Hi Sean,

By unambiguous assembly of reads I mean that one would not squash
bubbles or trim branches, but simply collapse fully overlapping
(embedded) reads by combining the q-scores, or raising the q-scores if
you want, and keeping branching graphs separate.
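
To make the idea concrete, here is a rough sketch in Python rather than
bioperl. It is only an illustration of the collapsing step, not what
ABySS or any existing tool does. The names collapse_embedded, find_all
and MAX_Q are made up for this example. It assumes reads are already
parsed into (sequence, list of Phred scores) pairs, that "combining"
means adding the Phred scores of agreeing bases (which treats errors in
different reads as independent, probably too optimistic), and that
reverse complements are ignored. A read that could be placed in more
than one way is left alone rather than merged.

```python
MAX_Q = 93  # assumed cap: highest Phred score in Sanger-style FASTQ


def find_all(haystack, needle):
    """Return every start offset at which needle occurs in haystack."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def collapse_embedded(reads):
    """Collapse reads fully contained in a longer (or identical) read.

    reads: list of (sequence, [phred scores]) tuples.
    Returns a list of (sequence, [combined phred scores]) tuples.
    """
    kept = []  # each entry is [sequence, mutable list of scores]
    # Longest first, so containers are seen before the reads they contain.
    for seq, quals in sorted(reads, key=lambda r: len(r[0]), reverse=True):
        hits = [(entry, off)
                for entry in kept
                for off in find_all(entry[0], seq)]
        if len(hits) == 1:
            # Exactly one placement: add the scores onto the container.
            entry, off = hits[0]
            for i, q in enumerate(quals):
                entry[1][off + i] = min(MAX_Q, entry[1][off + i] + q)
        else:
            # No container, or an ambiguous placement: keep it separate.
            kept.append([seq, list(quals)])
    return [(s, q) for s, q in kept]
```

Sorting longest-first means each read is compared only against
sequences that could contain it. The all-against-all substring search
is quadratic, so this would only do for small test sets; a real
implementation would need a k-mer index or similar.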

This unambiguous de novo assembly would discard depth information,
which is important if you are doing digital gene expression analysis,
but would produce a collapsed fastq set of sequences that would be
leaner for downstream processing.

I'll have a look at Mosaik. I tried samtools pileup, but it seems a
bit overcomplicated to have to map back the reads if what you want to
do is just have the assembled reads with fastq scores coming out of
the assembler in the first place. That's why I was thinking it would
be good if this unambiguous or "dummy" fastq assembly could fit into
a bioperl script or method.

Cheers

On Wed, Mar 10, 2010 at 10:31 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella <avilella at gmail.com> wrote:
>> Hi all,
>>
>> I would like to know if anyone knows of a script or method in bioperl
>> to do an unambiguous assembly of fastq sequences, combining the q-scores to
>> give assembled fastq sequences as the output.
>>
>> By unambiguous I mean something like what abyss would produce with these options:
>>
>> ABYSS -k$k -b0 -t0 -e0 -c0
>>
>> but giving assembled fastq sequences with combined q-scores as output
>> instead of simple fasta assembled sequences.
>
> Hi, Albert.
>
> I'm not sure exactly what you want here, but have you looked at the
> Mosaik aligner?  Also, look at samtools pileup; you can probably
> produce something similar to what you want from it as well.
>
> I certainly might have misunderstood the problem, though.
>
> Sean
>