[Biopython-dev] [Biopython] when is a SeqRecord not a SeqRecord

Brad Chapman chapmanb at 50mail.com
Wed Jul 18 18:29:36 UTC 2012


Dilara;
Apologies, I missed that the second mail had updated code.

> This works as you pointed out because filtered_rec is explicitly defined.
>  Now if I want to do this
>
> from Bio import SeqIO
>     mod = (check_meanQ(rec, q_thresholdd) for rec in
> SeqIO.parse("hiseq_pe_test.fastq", "fastq"))
>     count = SeqIO.write(mod, "filtered_hiseq_pe_test.fastq", "fastq")
>     print "Modified %i records" %count
>
> It doesn't work because of some of the records are None.  So I tried doing
> this

The approach I'd take it to clean up check_meanQ to be explicit about
the return values:

> def check_meanQ(rec, q_threshold):
>     seqlen=len(rec)
>     quality_scores=array(rec.letter_annotations["phred_quality"])
>     if round(quality_scores.mean()) <= q_threshold:
>         print "Discarded ", rec.id, "because mean Q was",
> round(quality_scores.mean())
>         badrec = None
>     if round(quality_scores.mean()) > q_threshold:
>         goodrec = rec
>
>     return goodrec

def check_meanQ(rec, q_threshold):
    quality_scores=array(rec.letter_annotations["phred_quality"])
    if round(quality_scores.mean()) <= q_threshold:
        print "Discarded ", rec.id, "because mean Q was", round(quality_scores.mean())
        return None
    else:
        return rec

Then explicitly check for None values and remove them when writing:

>     count = SeqIO.write(mod, "filtered_hiseq_pe_test.fastq", "fastq")

count = SeqIO.write((x for x in mod if x is not None),
                    "filtered_hiseq_pe_test.fastq", "fastq")

Hope this helps,
Brad





More information about the Biopython-dev mailing list