[BioPython] alignment processing
cgw501 at york.ac.uk
Tue May 17 15:07:03 EDT 2005
Hi,
I have a file processing task I'm trying to do with biopython. I have to
take a bunch of clustal alignment files that cover one arm of a whole
chromosome, strip off the lowercase letters at the end of each sequence,
and produce a file containing all the stripped sequences together in FASTA
format. This is what I have so far:

import Bio.Clustalw
from Bio.Alphabet import IUPAC
import string
from Bio.Seq import Seq
from Bio.SeqIO import FASTA
from Bio.SeqRecord import SeqRecord
from sys import *
import sys

inputs = sys.argv[1:-2]
output = open(sys.argv[-1], 'w')

for f in inputs:
    align = Bio.Clustalw.parse_file(f, alphabet=IUPAC.ambiguous_dna)
    lines = align.get_all_seqs()
    strippedAlignRecord = []
    for line in lines:
        lineSeq = line.seq
        lineString = lineSeq.tostring()
        strippedSeq = lineString.rstrip('atcg-')
        strippedSeqObj = Seq(strippedSeq, IUPAC.ambiguous_dna)
        strippedRecObj = SeqRecord(strippedSeqObj, id = line.description)
        out = FASTA.FastaWriter(output)
        out.write(strippedRecObj)

When I run this from the command line I don't get any errors, but the
outfile is not created. I'm a bit flummoxed. Any ideas?
Thanks,
Chris
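
The argument handling and stripping steps described above can be sketched in plain Python, without any Biopython calls. This is only a minimal illustration under stated assumptions: the helper names (split_args, strip_trailing_lowercase, format_fasta) and the example file names are hypothetical and not part of the original script. One thing the sketch makes visible is that, with the output file as the last argument, the inputs are sys.argv[1:-1]; the slice sys.argv[1:-2] also leaves out the last input file.

```python
def split_args(argv):
    """Split a command line into (input_paths, output_path).

    argv[0] is the script name and the last item is the output file,
    so the inputs are argv[1:-1]. A slice of argv[1:-2] would also
    drop the last input file.
    """
    return argv[1:-1], argv[-1]


def strip_trailing_lowercase(seq, chars="atcg-"):
    """Remove lowercase bases and gap characters from the right end only."""
    return seq.rstrip(chars)


def format_fasta(name, seq, width=60):
    """Return one FASTA record as text, wrapping the sequence at `width`."""
    lines = [">" + name]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical command line: two alignment files, then the output file.
    inputs, output = split_args(["strip.py", "a.aln", "b.aln", "out.fasta"])
    print(inputs, output)

    # Only the trailing run of lowercase letters and gaps is removed.
    stripped = strip_trailing_lowercase("ACGTNNACgt--atcg")
    print(format_fasta("seq1", stripped), end="")
```

In a real script the sequences would come from the alignment parser rather than string literals, and the output file should be closed (or opened in a with block) once all records are written so that buffered data reaches the disk.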