[Biopython] general scripting help
João Rodrigues
anaryin at gmail.com
Thu Oct 10 22:54:26 UTC 2013
Hey David,
If you put all your input files in one directory, all you need is a for
loop and the os module (import os) and its listdir method. Then you can
save the output to a file instead of printing to screen.
import os
directory = '/home/dave/xmlfiles/'
for each_file in os.listdir(directory):
output_file = each_file + '_output'
fhandle = open(output_file, 'w')
record = SeqIO.read(each_file, "fasta")
query_length = len(record)
# Print to file, use string formatting to make life easier
fhandle.write("query length: {0}\n".format(query_length))
etc etc
You might want to have a look here: http://pythonforbiologists.com/
Cheers and good luck for tomorrow,
João
2013/10/11 David Shin <davidsshin at lbl.gov>
> Hi -
>
> I am trying to write a script to parse through 50 or so deltablast .xml
> files.
>
> File names are:
> xaa.xml
> xab.xml
> xac.xml
> ...
>
>
> I'm new (2 days) to python, biopython, and just trying to have something to
> show for a meeting tomorrow. I have my script working well enough for one
> file, I was wondering if there was a way to go thru each file separately
> and output according to file name.
>
> ie. I'm trying to replace "xaa" in lines 3 and 12 with a wildcard like x??
> or even x*
>
>
> # This part gets the length of the query and stores to a variable
> from Bio import SeqIO
> record = SeqIO.read("xaa", "fasta")
> query_length = len(record)
> #print "query length:", query_length
>
> #This part gets the user's high and low percent identity cutoffs
> high_percent_cutoff = float(input("Enter high percent cutoff: "))
> low_percent_cutoff = float(input("Enter low percent cutoff: "))
>
> # This part does the comparison to all the hits if
> result_handle = open("xaa.xml")
> from Bio.Blast import NCBIXML
> blast_record = NCBIXML.read(result_handle)
>
> for alignment in blast_record.alignments:
> for hsp in alignment.hsps:
> alignment_length = alignment.length
> identical_residues = hsp.identities
> percent_identity = float(identical_residues) / float(query_length)
> if alignment_length > query_length * 0.9 and alignment_length <
> query_length * 1.1 and percent_identity > low_percent_cutoff and
> percent_identity <= high_percent_cutoff:
> print "****Alignment****"
> print "sequence:", alignment.title
> print "query length:", query_length
> print "alighment length:", alignment.length
> print "identical residues:", identical_residues
> print "percent identity:", percent_identity
> print
> print
>
> "12345678901234567890123456789012345678901234567890123456789012345678901234567890"
> print hsp.query[:80]
> print hsp.match[:80]
> print hsp.sbjct[:80]
>
>
> Thanks for any help.
> Dave
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
More information about the Biopython
mailing list