[Biopython] FASTQ to qual+fasta

Iddo Friedberg idoerg at gmail.com
Sun Jan 16 17:35:35 EST 2011


On 01/16/2011 02:25 PM, Peter Cock wrote:
> On Sun, Jan 16, 2011 at 6:48 PM, Iddo Friedberg <idoerg at gmail.com> wrote:
>> Question regarding the use of SeqIO.convert: how do I convert a FASTQ file
>> to qual and fasta files? Currently it seems that I have to run SeqIO.convert
>> twice e.g.:
>>
>>   SeqIO.convert(open("infile.fastq"),"fastq",open("outfile.qual","w"),"qual")
>>   SeqIO.convert(open("infile.fastq"),"fastq",open("outfile.fasta","w"),"fasta")
>>
>> Or am I missing something?
>>
>> Thanks,
>>
>> ./I
> Hi Iddo,
>
> That is almost the simplest solution, yes. You can use filenames directly:
>
> SeqIO.convert("infile.fastq", "fastq", "outfile.qual", "qual")
> SeqIO.convert("infile.fastq", "fastq", "outfile.fasta", "fasta")
>
> Is it a bit slow for you?
>

Well, although elegant, this makes two passes over the FASTQ file where one
should suffice.

> Using SeqIO.convert(...) in this case does use optimised code for FASTQ
> to FASTA, but currently we don't have a similar fast FASTQ to QUAL
> function. See Bio/SeqIO/_convert.py if you want to know how this is
> implemented. I can see several tricks for FASTQ to QUAL which should
> work... do you fancy trying this yourself?

I wish I had the time.... :(

> Alternatively, you could try combining a single call to SeqIO.parse(...) to
> iterate over the records as SeqRecord objects with itertools.tee to split
> this iterator in two to give it to two copies of SeqIO.write(...) to write
> FASTA and QUAL. I don't know how well that would work with memory
> consumption, but it would make only a single pass through the FASTQ file.


That's actually what I ended up doing.
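
For readers of the archive, the itertools.tee approach described above might look roughly like the sketch below. It is not the code from this thread; the function name tee_fastq_to_fasta_and_qual is hypothetical, and it assumes a Biopython version where SeqIO.write accepts filenames. Because the first SeqIO.write call consumes its iterator completely before the second one starts, tee has to buffer every SeqRecord in memory, so this reads the file once at the cost of holding all records at the same time.

```python
from itertools import tee

from Bio import SeqIO


def tee_fastq_to_fasta_and_qual(fastq_path, fasta_path, qual_path):
    """Parse a FASTQ file once and write both FASTA and QUAL output.

    Hypothetical helper for illustration only. Returns the number of
    records written to the FASTA file.
    """
    # One parser, split into two independent iterators over the same records.
    for_fasta, for_qual = tee(SeqIO.parse(fastq_path, "fastq"))

    # The first write drains the parser; tee keeps a copy of each record
    # so the second write can replay them without re-reading the file.
    count = SeqIO.write(for_fasta, fasta_path, "fasta")
    SeqIO.write(for_qual, qual_path, "qual")
    return count
```

Usage would be along the lines of tee_fastq_to_fasta_and_qual("infile.fastq", "outfile.fasta", "outfile.qual"). For very large files the buffering may matter more than the second pass it avoids.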

> If speed really matters here, first we should add FASTQ to QUAL
> to Bio/SeqIO/_convert.py and if that isn't enough, do a special case for
> FASTQ to FASTA and QUAL (to live in Bio.SeqIO.QualityIO I guess).
>
> Peter

I think a single FASTQ to FASTA & QUAL converter would be best. I'll look into the
QualityIO module and see if my code can be massaged in there.
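
To make the idea of a combined converter concrete, here is a minimal pure-Python sketch of a single-pass FASTQ to FASTA and QUAL split. It is not Biopython code and not what ended up in QualityIO; the function name split_fastq_single_pass and its offset parameter are hypothetical. It assumes plain four-line FASTQ records (no line wrapping), Sanger-style quality encoding with an ASCII offset of 33, and it does not wrap long output lines.

```python
def split_fastq_single_pass(fastq_handle, fasta_handle, qual_handle, offset=33):
    """Write FASTA and QUAL output from a FASTQ handle in one loop.

    Hypothetical helper for illustration only. Assumes each record is
    exactly four lines. Returns the number of records written.
    """
    count = 0
    lines = (line.rstrip("\r\n") for line in fastq_handle)
    # Zipping the same generator four times yields one record per step.
    # An incomplete record at the end of the file is silently ignored.
    for title, seq, plus, qual in zip(lines, lines, lines, lines):
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("Not a four-line FASTQ record: %r" % title)
        if len(seq) != len(qual):
            raise ValueError("Sequence and quality lengths differ: %r" % title)
        name = title[1:]
        fasta_handle.write(">%s\n%s\n" % (name, seq))
        # Decode each quality character to its integer PHRED score.
        scores = " ".join(str(ord(char) - offset) for char in qual)
        qual_handle.write(">%s\n%s\n" % (name, scores))
        count += 1
    return count
```

Each record is written to both outputs as soon as it is read, so memory use stays constant regardless of file size, which is the main difference from the tee approach.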

Thanks,

Iddo

-- 
Iddo Friedberg, Ph.D.
http://iddo-friedberg.org/contact.html


