[Biopython] Read Groups for BWA

Peter Cock p.j.a.cock at googlemail.com
Mon Feb 18 09:29:58 UTC 2019


Biopython does not have a SAM/BAM parser (although I did
write the start of one a long time ago, it is such a moving target
that I never finished it).

I would suggest using pysam, which should include parsing the read
group as part of its SAM/BAM header support.
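
As a rough sketch only (this is not Biopython code), the shell logic
quoted below could be mirrored in plain Python. The function names here
are made up for illustration. It assumes Illumina-style FASTQ headers
where the first four colon-separated fields identify the run and lane
and the line ends with the index sequence. The last function assumes
pysam is installed and that the BAM header already contains @RG lines.

```python
import gzip
import re


def read_group_from_header(header):
    """Build a bwa-style @RG string from one Illumina FASTQ header line."""
    header = header.strip()
    # First four colon-separated fields, without the leading "@",
    # joined with underscores (same as the cut/sed steps in the shell).
    rg_id = "_".join(header.lstrip("@").split(":")[:4])
    # Trailing run of bases, i.e. the index sequence (same as the grep step).
    match = re.search(r"[ACGTN]+$", header)
    if match is None:
        raise ValueError("no index sequence at end of header: %r" % header)
    sample = "%s_%s" % (rg_id, match.group(0))
    fields = ["@RG", "ID:" + rg_id, "SM:" + sample,
              "LB:" + sample, "PL:ILLUMINA"]
    # bwa mem -R expects a literal backslash followed by "t", not a real tab.
    return "\\t".join(fields)


def read_group_from_fastq(path):
    """Read the first header line of a gzipped FASTQ file (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        return read_group_from_header(handle.readline())


def read_groups_from_bam(path):
    """Return the @RG header entries of an existing BAM file using pysam."""
    import pysam  # third-party; imported here so the rest works without it

    with pysam.AlignmentFile(path, "rb") as bam:
        # Each entry is a dict with keys such as "ID", "SM", "LB" and "PL".
        return bam.header.to_dict().get("RG", [])
```

The string returned by read_group_from_header() could be passed to
bwa mem via -R. Like the grep in the shell version, it only picks up the
last run of bases, so a dual index written as "AAAA+CCCC" would give
just "CCCC".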

Peter

On Mon, Feb 18, 2019 at 8:23 AM Mic <mictadlo at gmail.com> wrote:
>
> Hi all,
> In order to determine the Read Groups from FASTQ files for BWA I used to do:
>
> #Get read group information:
> #Source: https://www.biostars.org/p/280837/#310132
> header=$(zcat $r1 | head -n 1)
> id=$(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
> sm=$(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")
> echo "Read Group @RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA"
> ...
> bwa mem \
> $2 $r1 $r2 \
> -t 12 \
> -R "$(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA")" |
> samblaster -r |
> samtools view -@ 12 -bSh -f 0x2 -F 2316 - |
> samtools fixmate - - |
> samtools sort -@ 12 - -o ${3}/${output}.sorted.dedup.bam
>
> I just wonder whether Biopython has a function to determine the Read Groups?
>
> Thank you in advance,
>
> Best wishes,
>
> Michal
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython

