[Biopython] general scripting help

David Shin davidsshin at lbl.gov
Thu Oct 10 22:36:57 UTC 2013


Hi -

I am trying to write a script to parse through 50 or so deltablast .xml
files.

File names are:
xaa.xml
xab.xml
xac.xml
...


I'm new (2 days) to python, biopython, and just trying to have something to
show for a meeting tomorrow. I have my script working well enough for one
file, I was wondering if there was a way to go thru each file separately
and output according to file name.

ie. I'm trying to replace "xaa" in lines 3 and 12 with a wildcard like x??
or even x*


# This part gets the length of the query and stores to a variable
from Bio import SeqIO
record = SeqIO.read("xaa", "fasta")
query_length = len(record)
#print "query length:", query_length

#This part gets the user's high and low percent identity cutoffs
high_percent_cutoff = float(input("Enter high percent cutoff: "))
low_percent_cutoff = float(input("Enter low percent cutoff: "))

# This part does the comparison to all the hits if
result_handle = open("xaa.xml")
from Bio.Blast import NCBIXML
blast_record = NCBIXML.read(result_handle)

for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
         alignment_length = alignment.length
         identical_residues = hsp.identities
         percent_identity = float(identical_residues) / float(query_length)
         if alignment_length > query_length * 0.9 and alignment_length <
query_length * 1.1 and percent_identity > low_percent_cutoff and
percent_identity <= high_percent_cutoff:
              print "****Alignment****"
              print "sequence:", alignment.title
              print "query length:", query_length
              print "alighment length:", alignment.length
              print "identical residues:", identical_residues
              print "percent identity:", percent_identity
              print
              print
"12345678901234567890123456789012345678901234567890123456789012345678901234567890"
              print hsp.query[:80]
              print hsp.match[:80]
              print hsp.sbjct[:80]


Thanks for any help.
Dave



More information about the Biopython mailing list