[Biopython] replace header
Dilara Ally
dilara.ally at gmail.com
Wed May 30 03:30:27 UTC 2012
Hi Guys,
I'm interested in replacing just one part of the header for every read in a 40 GB FASTQ file. Because the files are so large, I don't want to read the entire file into memory; I'd rather read a single read at a time and write it straight to a new file. The problem as it stands is that I'm creating a new SeqRecord object for every read and appending it to a list called newsolid. Only once that list holds all the records do I write it to a new file.
Preferably, I'd like to write each new SeqRecord to the output file as soon as it is created. Sorry if I've missed this lesson in the Biopython Tutorial and Cookbook! Any help would be greatly appreciated!
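A minimal sketch of that streaming approach follows. It assumes the same header rewrite as the code below and SOLiD-style read IDs such as 853_7_1304_F3 (a hypothetical example). SeqIO.parse already returns records one at a time, and SeqIO.write accepts any iterable of SeqRecord objects, not only a list. Handing it a generator therefore keeps only the current read in memory. The helper names renamed_records and rename_fastq are illustrative and are not part of Biopython.

```python
import re

from Bio import SeqIO

# Assumption: read IDs like "853_7_1304_F3"; the field after the last "_"
# is the part to replace (same pattern and replacement as the original loop).
ID_PATTERN = re.compile(r"(\w+)_(\w+)")


def renamed_records(records):
    """Hypothetical helper: yield each record with its ID rewritten, one read at a time."""
    for record in records:
        record.id = ID_PATTERN.sub(r"\1_1", record.id)
        # Clear the description; otherwise the FASTQ writer would append the
        # old header (e.g. "853_7_1304_F3") after the new ID on the "@" line.
        record.description = ""
        yield record


def rename_fastq(in_path, out_path):
    """Hypothetical helper: stream in_path to out_path, return the number of records written."""
    records = SeqIO.parse(in_path, "fastq")
    # SeqIO.write consumes the generator lazily, writing each record as it arrives.
    return SeqIO.write(renamed_records(records), out_path, "fastq")
```

Called as rename_fastq("solid_1.fastq", "newsolid_1.fastq"), this reads and writes one record at a time and returns the record count. Rewriting the parsed record in place, rather than building a new SeqRecord, keeps its per-letter quality scores attached without any extra bookkeeping.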
Here is the code.
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
import re

# Group 1: everything up to the last "_"; group 2: the final field
subfind = r"(\w+)_(\w+)"
subreplace = r"\1_1"

newsolid = []
for seq_record in SeqIO.parse("solid_1.fastq", "fastq"):
    print(seq_record.id)
    original_header = seq_record.id
    result = re.search(subfind, original_header)
    print(result.groups())
    new_header = re.sub(subfind, subreplace, original_header)
    print(new_header)
    # description="" stops the writer adding "<unknown description>" to the header line
    newfastqrecord = SeqRecord(seq_record.seq, id=new_header, description="",
                               letter_annotations=seq_record.letter_annotations)
    # Every record is held in this list until the loop finishes
    newsolid.append(newfastqrecord)

output = "newsolid_1.fastq"
SeqIO.write(newsolid, output, "fastq")
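To make clear which part of the header the regex in the code above actually replaces, here is a small standard-library sketch. It reuses the same pattern and replacement string. The function name split_and_rename and the example IDs are hypothetical. Because \w matches letters, digits and "_", the greedy first group backtracks only as far as the last underscore. As a result, only the final field is replaced.

```python
import re

# Same pattern as the original loop, compiled once instead of per read.
# \w also matches "_", so group 1 extends up to the LAST underscore.
ID_PATTERN = re.compile(r"(\w+)_(\w+)")


def split_and_rename(old_id):
    """Hypothetical helper: return (groups, new_id) for one read ID.

    groups is None when the ID has no "<word>_<word>" part. In that case
    re.search(...).groups() in the original loop would raise AttributeError,
    whereas the substitution simply leaves the ID unchanged.
    """
    match = ID_PATTERN.search(old_id)
    groups = match.groups() if match else None
    new_id = ID_PATTERN.sub(r"\1_1", old_id)
    return groups, new_id
```

For example, split_and_rename("853_7_1304_F3") gives (("853_7_1304", "F3"), "853_7_1304_1"), while an ID without an underscore, such as "read42", gives (None, "read42").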
Cheers, Dilara