[BioPython] fasta retrieval
Brad Chapman
chapmanb at uga.edu
Fri Aug 8 18:29:43 EDT 2003
Hi Marty;
> I need to extract 702 sequences from a database of about 25000 fasta sequences.
> Does anyone have a script for that?
It should be easy enough to write one. With Biopython, and assuming
the 702 sequence titles you want are in a file each on there own
line, it would look something like:
from Bio import Fasta
input_handle = open("your_seqs.txt")
interest_titles = []
for line in input_handle.xreadlines():
interest_titles.append(line)
input_handle.close()
fasta_handle = open("fasta_database.fasta")
parser = Fasta.RecordParser()
iterator = Fasta.Iterator(fasta_handle, parser)
output_handle = open("your_seqs.fasta", "w")
while 1:
rec = iterator.next()
if not rec:
break
if rec.title in interest_titles:
output_handle.write(str(rec) + "\n")
fasta_handle.close()
output_handle.close()
That's off the top of my head in a mail client so no promises that
it works exactly, but it should give the idea. Hopefully this helps.
Brad
More information about the BioPython
mailing list