[EMBOSS] Conservation of FASTQ scores by the EMBOSS tools.
charles-listes-emboss at plessy.org
Thu Sep 17 06:36:59 UTC 2009
On Wed, Sep 16, 2009 at 10:12:16AM +0100, Mahmut Uludag wrote:
> seqret returns quality scores if the input sequence format is explicitly
> defined on the command line, such as -sformat=fastq-sanger.
> The following patch looks like it fixes the vectorstrip problem.
On Wed, Sep 16, 2009 at 10:31:22AM +0100, Peter wrote:
> You need to use "fastq-sanger" (or the other variants), since in
> EMBOSS, "fastq" currently means FASTQ ignoring the qualities.
Hi Mahmut and Peter, and thank you very much for your answers!
I would also like the qualities to be kept by default. I had actually tried
to force the fastq-sanger format before, by adding its name to the USAs,
as in ‘seqret fastq-sanger::stdin fastq-sanger::stdout’. Unfortunately it did
not work; I do not know whether that is by design or because of the dash in the
format name. Nevertheless, -sformat=fastq-sanger and -osformat=fastq-sanger
worked very well after I applied Mahmut's patch.
I am tempted to apply it to the Debian EMBOSS package as well, but maybe that
is premature. In particular, I get the following warning each time the quality
is encoded by an equal sign:
Warning: Illegal character '='
Warning: Illegal pattern: =
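For reference, here is a minimal sketch of why '=' should be a legal quality
character in fastq-sanger, assuming the usual Sanger encoding (Phred score =
ASCII code minus 33, printable characters 33 to 126). The helper name
phred_sanger is made up for illustration and is not an EMBOSS tool:

```shell
# Hypothetical helper (not part of EMBOSS): print the Phred score of a single
# Sanger-encoded quality character, i.e. its ASCII code minus 33.
phred_sanger() {
    # printf '%d' "'X" prints the numeric code of the character X
    echo $(( $(printf '%d' "'$1") - 33 ))
}

phred_sanger '='   # prints 28
```

Under that assumption the equal sign is simply a base with Phred quality 28,
so the warning looks like a parsing problem rather than bad input.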
By the way, I think I found a bug in revseq: it seems that it does not reverse
the quality scores:
$ echo -e "@toto\nACTG\n+toto\n12/3" | seqret -filter -sformat=fastq-sanger -osformat=fastq-sanger
$ echo -e "@toto\nACTG\n+toto\n12/3" | revseq -filter -sformat=fastq-sanger -osformat=fastq-sanger
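As a point of comparison, here is a small sketch of what one would expect from
reverse-complementing a FASTQ record, assuming the quality string should be
reversed together with the bases. fastq_revcomp is a hypothetical awk-based
helper, not an EMBOSS program, and it only handles simple 4-line records with
the letters A, C, G and T:

```shell
# Hypothetical helper (not part of EMBOSS): reverse-complement 4-line FASTQ
# records read from stdin, reversing the quality line along with the sequence.
fastq_revcomp() {
    awk '
    # Return the string s written backwards.
    function reverse(s,    i, out) {
        out = ""
        for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
        return out
    }
    # Complement A/C/G/T (either case); other characters pass through as-is.
    function complement(s,    i, c, p, out, from, to) {
        from = "ACGTacgt"; to = "TGCAtgca"; out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            p = index(from, c)
            out = out (p ? substr(to, p, 1) : c)
        }
        return out
    }
    NR % 4 == 2 { print complement(reverse($0)); next }  # sequence line
    NR % 4 == 0 { print reverse($0); next }              # quality line
    { print }                                            # header and "+" lines
    '
}

printf '@toto\nACTG\n+toto\n12/3\n' | fastq_revcomp
# prints:
# @toto
# CAGT
# +toto
# 3/21
```

With the test record used above, the qualities 12/3 would come out as 3/21 if
they follow the bases.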
Also, contrary to what the documentation predicts, using the fastq format
for the output does not ignore the quality scores. (Not that it would be
particularly useful, but…)
$ echo -e "@toto\nACTG\n+toto\nACTG" | revseq -filter -sformat=fastq-sanger -osformat=fastq
Have a nice day,
Tsurumi, Kanagawa, Japan