[Biopython-dev] Contribution -- NMR xpk files

Robert G. Bussell rgb2003 at med.cornell.edu
Tue Dec 16 22:59:34 EST 2003


Hello,

I would like to contribute some tools that I have developed to the
biopython project.  Among them are programs for analyzing NMR data as well
as modules suited for more general problems of handling resonance
assignment data.


The tool set that I would like to contribue deals with with structural NMR
resonance assignment data in read from a standard, easily generated file
format (xpk or peaklist files) recognized by nmrview, a freely distributed,
open source program with a large user base.  The code that I have written
facilitates the process of extracting scientifically relevant information
from the data. Of additional utility is the module upon which the main
program depends which can be used as a building block for constructing code
to deal with nmrview peaklists and other resonance assignment data read in
from some other file format.

I'd be thrilled to introduce these tools into the public domain through
biopython and to provide more, related tools as feedback warrants them and
as I understand more about where they fit into the biopython project.

Thanks to everybody who supports the biopython and other open source
projects and I look forward to learning from your comments, testing and
suggestions.

Sincerely,

Robert Bussell, Jr.
rgb2003 at med.cornell.edu

-------------- Program and instructions -------------

I would appreciate your feedback and testing.  Here are some
instructions for setting things up on your computer.  Please let me know if
you use these tools even if you don't have a specific comment.

NOTE: At the moment the program and modules do not depend on biopython in
any way.  It should only be necessary to have python installed to run them.


(1) Create or locate an existing directory for the modules I provide
    below.  Copy all three of the modules (readtools, writetools and
    xpktools) into that directory.

(2) Edit the line in predictnoe2.py that reads

	 sys.path=[sys.path,"/usr/people/robert/bin/python/lib"]

    so that it points your local directory that contains the modules
    listed in step 1.

(3) The first line of the program 

	#! /usr/bin/env python

    may have to be modified to point to the python interpreter on 
    your computer.

(4) Cut and paste the input peaklist (noed.xpk) from below (or use
    your own data).  You will need to specify this file location as
    an input parameter to predict


(5) Make predictnoe2.py executable and invoke it with the command:
 
    predictnoe2.py --inxpk noed.xpk --outxpk noex.xpk 
    --increment 1 --detectatom H1 --fromatom 15N2 --relayatom N15

    NOTE: This mode sets up an i->i+1 and i->i-1 prediction where
    the directly detected proton is attached to the N15 nitrogen and
    the 15N2 atom is attacted to the proton from which the coherence
    originates.
    
(6) Inspect the output in the --outxpk file (noex.xpk if following
    step 4 literally) and/or load it into nmrview and overlay it on	 a
    spectrum.
    
CAVEAT: Be careful in cutting and pasting the code, especially if
    you are unfamiliar with the xpk data format.  It could easily be
    scrambled if it acquires false line feeds.

-------------BEGIN PROGRAM: predicnoe2.py------------------
#! /usr/bin/env python
# predictnoe.py: A python script that predicts neighbor NOE locations
# Generates a peaklist of predicted i->i+n and i->i-n NOE crosspeaks from
#  peaklist of self peaks.
# **Input arguments:** 
#	--inxpk 	input peaklist 
#	--outxpk	output peaklist
#	--fromatom	atom in xpk head corresponding to i of i->i+n noe 
#	--relayatom	label for the relay atom 
#	--detectatom	label for detected atom
#	--increment	n, where noes are between i and i+n
#	Example input:
#         predictnoe2.py --inxpk dpeak.xpk --outxpk xpeak.xpk
#         --increment 1 --detectatom H1 --fromatom 15N2 --relayatom N15
#
# *******************
# Known input file assumptions: Header should be six lines long and contain

# the list of data labels as the last line
#
# ****TO DO LIST****
# Add automatic prediction of forward and reverse noes
# Test the endpoint predictions (are all possible crosspeaks predicted?)


# ***** LOAD MODULES *****
import types
import getopt
import string
import sys
sys.path=[sys.path,"/usr/people/robert/bin/python/lib"]
import xpktools

# ***** FUNCTION DEFINITIONS *****


def get_res(infn,match,dict,headerlen):

	n=0
	i=0
	line=infile.readline()
	while (line):	# Read past the header
		i=i+1
		if (i>=headerlen):
			res=string.split(string.split(line)[1],".")[0]
			if (res==match):
				n=n+1
				key=res+" "+str(n)
				dict[key]=line
		line=infile.readline()

def find_label_cols(labels, labelswanted):
	# Find the column number for to, from and relay atoms
	#	using the xpk header and user inputs fromlabel and tolabel
	# Input -- the label line from the xpk
	#	(like this for ex. H1.L  H1.P  H1.W...more stuff...int 
stat)
	# Return values: col number for fromlabel and tolabel

	#** LOCAL INITS **
	datamap={}

	fromlabel	=labelswanted[0] 
	relaylabel	=labelswanted[1]
	detectlabel	=labelswanted[2]
	
	labellist=string.splitfields(string.split(labels,"\012")[0])

	# Make a data map of the label and ppm values of atoms of interest
	for i in range(len(labellist)):
		if (fromlabel+".P"==labellist[i]):
			datamap["fromppm"]=i
		if (detectlabel+".P"==labellist[i]):
			datamap["detectppm"]=i
		if (relaylabel+".P"==labellist[i]):
			datamap["relayppm"]=i
		if (fromlabel+".L"==labellist[i]):
			datamap["fromassign"]=i
		if (detectlabel+".L"==labellist[i]):
			datamap["detectassign"]=i
		if (relaylabel+".L"==labellist[i]):
			datamap["relayassign"]=i

	return datamap

def get_ave_cs(list,col):
	# Get the average of the chemical shift

	sum=0
	n=0
	for element in list:
		sum=sum+string.atof(string.split(element)[col+1])
		n=n+1
	return sum/n

def read_xpk(infile,dict,headerlen):
	# Read xpk files into a dictionary of lists
	# The dictionary entries are indexed by the "to" atom of the noe
	#	(i.e. the detected amide proton for a nh-nh noe experiment)

	#
	# Each list contains lines from xpk file with a common first
	# dimension residue assinment.
	# The peaklist header is also returned but not included in the dict

	#
	# Special dictionary elements:
	#	"maxres" the maximum residue number
	#	"minres" the minimum residue number


	header=[]	# This will hold the header lines 
	i=0		# line counter
	maxres=-1	# maximum residue number
	minres=-1	# minimum residue number

	line=infile.readline()
	while (line):	# Read past the header
		if (i<headerlen):
			header.append(line)
		i=i+1
		if (i>=headerlen+1):
			res=string.split(string.split(line)[1],".")[0]
			
			# Check min and max and update values as necessary
			[maxres,minres]=update_min_max(res,minres,maxres)

			if dict.has_key(str(res)):
				# Append the additional data about this
residue
				#	to a list 
				templst=dict[str(res)]
				templst.append(line)
				dict[str(res)]=templst
			else:
				# This is a new residue, start a new list 
				dict[str(res)]=[line]  # Use [] for list
type
		line=infile.readline()

	# Add the max and min statistics to the dictionary
	dict["maxres"]=maxres
	dict["minres"]=minres

	return header


def predict_xpeaks(dict,fromres,tores,datamap,count):
	# Predict the position of the fromres->tores NOE crosspeak
	# Nomenclature:
	#	"fromres" --> "tores" == "i --> i+inc"

	# *LOCAL INITS*
	predict=[]

	fromreslist=dict[str(fromres)]	# Residue 1 data line
	toreslist=dict[str(tores)]	# Residue 2 data line

	
	#** Get averages of ppm values for coordinates of each residue**
	# Only the "relay" and "detect" atom data need to be calculated 
	#	since the from characteristics will be left in place
	avefromppm	= get_ave_cs(fromreslist,datamap["fromppm"])

	
	#** Change the hn and n assignments and chem shifts to other res**
	# Base the new line on the "to" residue data line, substituting the
from data
	# Also change the first element (line count)
	for line in toreslist:
		fromlabel	= str(fromres) + ".n"	# ABSTRACT THIS
		line	= 
xpktools.replace_entry(line,datamap["fromppm"]+2,avefromppm)
		line	=
xpktools.replace_entry(line,datamap["fromassign"]+2,fromlabel)
		line	= xpktools.replace_entry(line,1,count)
		predict.append(line)
	return predict

def parse_args():
       
opts=getopt.getopt(sys.argv[1:],'',['inxpk=','outxpk=','fromatom=','detecta
tom=','increment=','relayatom='])

	for elem in opts[0]:
		if (elem[0]=="--inxpk"):
			inxpk=elem[1]
		if (elem[0]=="--outxpk"):
			outxpk=elem[1]
		if (elem[0]=="--fromatom"):
			fromatom=elem[1]
		if (elem[0]=="--detectatom"):
			detectatom=elem[1]
		if (elem[0]=="--increment"):
			increment=elem[1]
		if (elem[0]=="--relayatom"):
			relayatom=elem[1]

	if (inxpk=='' or outxpk=='' or increment=='' or detectatom=='' or
fromatom=='' or relayatom==''):
		input_args_needed_error()
		exit(0)
		
	return string.atoi(increment), inxpk, outxpk, detectatom,
relayatom, fromatom

def input_args_needed_error():
	print "These input arguments needed for program execution:"
	print "--inxpk"
	print "--outxpk"
	print "--increment"
	print "--detectatom"
	print "--fromatom"
	print "--atom"


def input_args_warning(progname):
	print progname, "Error -- please check your input arguments."
	print progname, "Quitting."

def cols_not_found_warning(progname):
	print progname, "Error -- One or more data columns not found."
	print progname, "	  Try checking your from, to and relay
atoms."
	print progname, "Quitting."


def update_min_max(res,minres,maxres):
	# This function takes care of updating the values of maxres and
	#	minres so that they reflect the global max and min residue
	#	values in the peaklist

	res=string.atoi(res)
	if (res>0):
		if (minres<0):	# takes care of initialization where
minres=-1
			minres=res
		if (maxres<0):	# takes care of initial value maxres=-1
			maxres=res
		if (minres>res): # found a smaller min, replace
			minres=res
		if (maxres<res): # found a larger value than maxres,
replace
			maxres=res

	return maxres, minres # * * * * * * * * * * MAIN * * * * * * * * *
*

def file_open_warning(file):
	print "predictnoe.py: Error opening", file, " for writing."
	print "predictnoe.py: Could be a file permission problem."
	

# ***** INITS *****
headerlen=6				# Num lines in xpk header
dict={} 				# The input dictionary
datamap={}
PROGNAME="predictnoe.py:"		# The name of this program
fromcol=-1			#column number of from atom label
detectcol=-1			#column number of detected atom label
relaycol=-1			#column number of relay atom

# ***** READ INPUTS ******
try:
	[inc,infn,outfn,detectatom,relayatom,fromatom]=parse_args()# Parse
input
except NameError, e:
	input_args_warning(PROGNAME)
	sys.exit(0)
atomswanted=[fromatom,relayatom,detectatom]



# **Read the input file into a dictionary**
try:
	infile=open(infn)
except IOError, e:
	file_open_warning("input file")
	sys.exit(0)

header=read_xpk(infile,dict,headerlen)
infile.close()



# **Find the appropriate column numbers for accessing data in this xpk file

datamap=find_label_cols(header[headerlen-1],atomswanted)

MAXRES=dict["maxres"]
MINRES=dict["minres"]


#****** CALCULATE AND WRITE *****
try:
	outfile=open(outfn,'w') 		# Open the output file
except IOError, e:
	file_open_warning("output file")
	sys.exit()

xpktools.write_list(outfile,header)   # Write the header

# Predict the i->i+inc and i->i-inc noe positions if possible
# Write each one to the output file as they are calculated
count=0 	# A counter for number the output data lines
res=MINRES	# minimum should be the lowest i value
while (res<=MAXRES):
	if ( dict.has_key(str(res)) and dict.has_key(str(res+inc)) ):
	       
xpktools.write_list(outfile,predict_xpeaks(dict,res,res+inc,datamap,count))

	if ( dict.has_key(str(res)) and dict.has_key(str(res-inc)) ):
	       
xpktools.write_list(outfile,predict_xpeaks(dict,res,res-inc,datamap,count))

		count=count+1
	res=res+1

outfile.close()
------------- END PROGRAM: predictnoe2.py --------------

------------- BEGIN MODULE: xpktools.py -----------------

# xpktools.py: A python module containing function definitions and classes
#	       useful for manipulating data from nmrview .xpk peaklist
files.
#
# ********** INDEX of functions and classes **********
#
#	xpkentry: Handles xpk data one line at a time generating
#		  attributes that are often sought after such as
#		  the chemical shifts and assignments of the 1st
#		  three dimesions.
#		  Using this function to define a variable
#		  requires both the data line from the xpk file
#		  itself as well as the label line (seventh line)
#		  of the xpk header file.  To get that line use
#		  the function get_header_line(infile) in this
#		  module.

import string

class xpkentry:
    # Usage: xpkentry(xpkentry,xpkheadline) where xpkentry is the line
    #	     from an nmrview .xpk file and xpkheadline is the line from
    #	     the header file that gives the names of the entries
    #	     which is typcially the sixth line of the header (counting fm
1)
    # Variables are accessed by either their name in the header line as in
    #	self.field["H1.P] will return the H1.P entry for example.
    #	self.field["linenum"] returns the line number (1st field of line)
    #	self.d1l=first dimension atom label
    #	self.d1p=first dimension chemical shift
    #	    --------  same for dim up to 4  -------

    def __init__(self,entry,headline):
       self.field={}	# Holds all fields from input line in a dictionary
			# keyed to the label line in the xpk file
       datlist	= string.split(entry)
       headlist = string.split(headline)

       # Parse the entry into a field dictionary
       self.field["linenum"]=datlist[0]
       i=1
       while i<len(datlist):
	 self.field[str(headlist[i-1])]=datlist[i]
	 i=i+1

       # Assign the chem shifts in all dimensions to special variables
       self.d1l=datlist[1]
       self.d1p=datlist[2]

       if (len(datlist)>=9):
	 self.d2l=datlist[7]
	 self.d2p=datlist[8]

       if (len(datlist)>=15):
	 self.d3l=datlist[13]
	 self.d3p=datlist[14]

       # Assign the general peak properties to special variables
       self.stat = datlist[len(datlist)-1]
       self.int  = datlist[len(datlist)-2]
       self.vol  = datlist[len(datlist)-3]

def get_header_line(infile):
	i=1
	while (i<7):
	  line=infile.readline()
	  i=i+1
	return line 

def replace_entry(line,fieldn,newentry):
	# Replace an entry in a string by the field number
	# No padding is implemented currently.	Spacing will change if
	#  the original field entry and the new field entry are of
	#  different lengths.
	# This method depends on xpktools.find_start_entry

	start=find_start_entry(line,fieldn)
	leng=len(string.splitfields(line[start:])[0])
	newline=line[:start]+str(newentry)+line[(start+leng):]
	return newline

def write_list(outfile,list):
	for line in list:
		outfile.write(line)

def find_start_entry(line,n):
	# find the starting point character for the n'th entry in
	# a space delimited line.  n is counted starting with 1
	# The n=1 field by definition begins at the first character

	infield=0	# A flag that indicates that the counter is in a
field

	if (n==1):
		return 0	# Special case

	# Count the number of fields by counting spaces
	c=1
	leng=len(line)

	# Initialize variables according to whether the first character
	#  is a space or a character
	if (line[0]==" "):
		infield=0
		field=0
	else:
		infield=1
		field=1


	while (c<leng and field<n):
		if (infield):
			if (line[c]==" " and not (line[c-1]==" ")):
				infield=0
		else:
			if (not line[c]==" "):
				infield=1
				field=field+1

		c=c+1

	return c-1
------------- END MODULE: xpktools.py -------------------

-------------- BEGIN DATASET: noed.xpk -------------------------
label dataset sw sf 
H1 15N2 N15 
test.nv
{1571.86 } {1460.01 } {1460.00 }
{599.8230 } { 60.7860 } { 60.7860 }
 H1.L  H1.P  H1.W  H1.B  H1.E  H1.J  15N2.L  15N2.P  15N2.W  15N2.B  15N2.E
 15N2.J  N15.L	N15.P  N15.W  N15.B  N
15.E  N15.J  vol  int  stat 
0  3.hn   7.753   0.021   0.010   ++   0.000   3.n   118.104   0.344  
0.010	PP   0.000   3.n   118.117   0.344 
  0.010   PP   0.000  1.18200 1.18200 0
1  4.hn   7.675   0.030   0.010   ++   0.000   4.n   119.262   1.852  
0.010	E+   0.000   4.n   119.260   1.852 
  0.010   E+   0.000  0.99960 0.99960 0
2  5.hn   7.878   0.026   0.010   ++   0.000   5.n   117.892   0.449  
0.010	EE   0.000   5.n   117.894   0.449 
  0.010   EE   0.000  0.93720 0.93720 0
3  6.hn   7.978   0.010   0.010   ++   0.000   6.n   120.085   0.343  
0.010	PP   0.000   6.n   120.101   0.343 
  0.010   PP   0.000  0.57500 0.57500 0
4  7.hn   8.137   0.084   0.010   ++   0.000   7.n   107.673   0.346  
0.010	PP   0.000   7.n   107.641   0.346 
  0.010   PP   0.000  0.60770 0.60770 0
5  8.hn   8.140   0.026   0.010   ++   0.000   8.n   121.050   0.355  
0.010	PP   0.000   8.n   121.050   0.355 
  0.010   PP   0.000  0.56960 0.56960 0
6  9.hn   7.924   0.028   0.010   ++   0.000   9.n   113.323   0.319  
0.010	+P   0.000   9.n   113.322   0.319 
  0.010   +P   0.000  0.46630 0.46630 0
7  10.hn   7.663   0.021   0.010   ++	0.000	10.n   121.341	 0.324	
0.010	+E   0.000   10.n   121.476   0.3
24   0.010   +E   0.000  0.49840 0.49840 0
------------------ END DATASET: noed.xpk




More information about the Biopython-dev mailing list