[Biopython-dev] [Biopython] when is a SeqRecord not a SeqRecord

Peter Cock p.j.a.cock at googlemail.com
Thu Jul 19 09:17:17 UTC 2012


On Wed, Jul 18, 2012 at 7:29 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Dilara;
> Apologies, I missed that the second mail had updated code.
>
>> This works as you pointed out because filtered_rec is explicitly defined.
>>  Now if I want to do this
>>
>> from Bio import SeqIO
>>     mod = (check_meanQ(rec, q_thresholdd) for rec in
>> SeqIO.parse("hiseq_pe_test.fastq", "fastq"))
>>     count = SeqIO.write(mod, "filtered_hiseq_pe_test.fastq", "fastq")
>>     print "Modified %i records" %count
>>
>> It doesn't work because of some of the records are None.  So I tried doing
>> this
>
> The approach I'd take it to clean up check_meanQ to be explicit about
> the return values:
>
>> def check_meanQ(rec, q_threshold):
>>     seqlen=len(rec)
>>     quality_scores=array(rec.letter_annotations["phred_quality"])
>>     if round(quality_scores.mean()) <= q_threshold:
>>         print "Discarded ", rec.id, "because mean Q was",
>> round(quality_scores.mean())
>>         badrec = None
>>     if round(quality_scores.mean()) > q_threshold:
>>         goodrec = rec
>>
>>     return goodrec
>
> def check_meanQ(rec, q_threshold):
>     quality_scores=array(rec.letter_annotations["phred_quality"])
>     if round(quality_scores.mean()) <= q_threshold:
>         print "Discarded ", rec.id, "because mean Q was", round(quality_scores.mean())
>         return None
>     else:
>         return rec

That should work - although since you are not actually modifying
the record at all I'd suggest a check function returning a boolean
(True for False). Then you could use this in a generator expression
like this:

def check_meanQ(rec, q_threshold):
    quality_scores=array(rec.letter_annotations["phred_quality"])
    return round(quality_scores.mean()) > q_threshold

records = SeqIO.parse("hiseq_pe_test.fastq", "fastq"))
count = SeqIO.write((x for x in records if check_meanQ(x)),
                         "filtered_hiseq_pe_test.fastq", "fastq")

(Untested - there could be a typo in there)

Peter.

P.S. Since this isn't directly about new development work
on Biopython itself, the main mailing list would be more
appropriate for this kind of question in future. Thanks



More information about the Biopython-dev mailing list