[GSoC] weekly report #2

Peter Cock p.j.a.cock at googlemail.com
Mon May 28 09:48:21 UTC 2012


On Mon, May 28, 2012 at 10:29 AM, Artem Tarasov
<lomereiter at googlemail.com> wrote:

>> The blog mentions you think you found some issues with tags.bam
>> file - could you elaborate (direct email is fine), and tell me about any
>> future issues please?
>>
>
> They are very minor. Specification says (1.4) that 'QNAME' should be
> [!-?A-~], that doesn't include space and '@' sign,

Fair point. I should fix that. The '@' was presumably excluded in
the v1.3 spec to avoid confusion with FASTQ files.


> and that (1.5) printable characters in tags with 'A' type are [!-~],
> i.e. only space is not allowed.
>
> BTW, I looked at your code which generated the file, it uses
> range(32, 127) both for 'Z' and 'A' types of tags, even though
> it's explicitly written in comments right above these lines where
> space should be included, and where it shouldn't :)

Good point, that is a change in the specification I hadn't noticed.
Back in v1.2, both A and Z were just "printable character" and
"printable string", which to me includes the space. It was only
in v1.3 that this was made explicit with a regex, and space
ceased to be allowed in the A tag. I wonder if that was an
accident or deliberate?
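
As a rough sketch of the distinction (this is not the actual script that
generated tags.bam, and the variable names are made up for illustration),
the character sets implied by those regexes could be built like this:

```python
# Hypothetical names; the code points follow the regexes discussed above.
# ASCII 32 is space, 33 is '!', 64 is '@', 126 is '~'.

# 'A' tags: a single printable character, [!-~], so no space.
A_TAG_CHARS = [chr(i) for i in range(33, 127)]

# 'Z' tags: printable string where space is still allowed.
Z_TAG_CHARS = [chr(i) for i in range(32, 127)]

# QNAME: [!-?A-~], i.e. the same as [!-~] but without '@'.
QNAME_CHARS = [c for c in A_TAG_CHARS if c != "@"]
```

The only difference between the 'A' and 'Z' sets is the starting code
point, 33 rather than 32.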

You'll notice that samtools doesn't complain about these
deviations from the specification, but then it doesn't attempt
any validation. I'm not sure if Picard checks this.
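
For what it's worth, a check along these lines is easy to sketch. This is
only an illustration using Python's re module, not what samtools or Picard
do; the function names are hypothetical and the QNAME length limit from
the specification is left out:

```python
import re

# Patterns taken from the regexes quoted in this thread.
QNAME_RE = re.compile(r"[!-?A-~]+")  # no space, no '@'; length limit omitted
A_TAG_RE = re.compile(r"[!-~]")      # exactly one printable char, no space


def is_valid_qname(name):
    """Return True if the whole read name matches the QNAME pattern."""
    return QNAME_RE.fullmatch(name) is not None


def is_valid_a_tag(value):
    """Return True if value is a single character allowed in an 'A' tag."""
    return A_TAG_RE.fullmatch(value) is not None
```

With these, is_valid_qname("read 1") and is_valid_a_tag(" ") are both
rejected, which are the two cases tags.bam currently trips over.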

Thanks,

Peter


