[Bioperl-l] DNA Smith-Waterman code

Yee Man ymc@paxil.stanford.edu
Sat, 11 Jan 2003 10:34:43 -0800 (PST)


Hi folks,

	I wrote a module that does DNA Smith-Waterman alignment with an
affine gap penalty of the form
gap_cost = gap_penalty + extension_penalty * gap_length.
I am wondering whether someone here would allow me to add the code to the
main tree.
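
For illustration, the affine gap cost described above can be written as a small function. This is only a sketch in Python, not code from the attached package: the function name and the default penalty values are assumptions, and the gap count is taken to mean the number of gapped positions in one contiguous gap.

```python
def affine_gap_cost(gap_length, gap_penalty=10, extension_penalty=2):
    """Cost of one contiguous gap spanning `gap_length` positions.

    gap_penalty and extension_penalty are illustrative defaults,
    not values taken from the module discussed in this message.
    """
    if gap_length <= 0:
        return 0  # no gap, no cost
    return gap_penalty + extension_penalty * gap_length


if __name__ == "__main__":
    for length in (1, 2, 5):
        print(length, affine_gap_cost(length))
```

Under this model one long gap is cheaper than several short gaps covering the same number of positions, because the opening penalty is paid only once.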

	It currently implements Gotoh's improvement only halfway. Instead
of keeping the scoring matrix in an auxiliary array and calculating it
twice, I store the whole thing so that I only need to calculate it once. I
also have a module that implements Gotoh's improvement in full. The current
one is good for small alignments. I can add the full Gotoh module as a
fallback for when we don't have the memory to run the version that is twice
as fast.
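
As a rough illustration of the store-everything approach described above, here is a minimal Python sketch of local alignment with affine gaps that keeps all three Gotoh matrices in memory, so the traceback needs no recomputation. It is not the code from the attached package: the function name, the scoring values, and the tie-breaking in the traceback are assumptions. Memory use is proportional to len(a) * len(b), which is why this style only suits small alignments.

```python
NEG_INF = float("-inf")


def smith_waterman_affine(a, b, match=2, mismatch=-1,
                          gap_penalty=3, extension_penalty=1):
    """Local alignment; a gap of length k costs gap_penalty + extension_penalty * k."""
    n, m = len(a), len(b)
    # H: best local score ending at (i, j)
    # E: best score ending with a gap in `a` (consumes b[j-1])
    # F: best score ending with a gap in `b` (consumes a[i-1])
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    first = gap_penalty + extension_penalty  # cost of the first gap position
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - extension_penalty, H[i][j - 1] - first)
            F[i][j] = max(F[i - 1][j] - extension_penalty, H[i - 1][j] - first)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback over the stored matrices, tracking which matrix we are in.
    i, j = best_pos
    state = "H"
    out_a, out_b = [], []
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break  # start of the local alignment
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            out_a.append("-")
            out_b.append(b[j - 1])
            if E[i][j] != E[i][j - 1] - extension_penalty:
                state = "H"  # this position opened the gap
            j -= 1
        else:  # state == "F"
            out_a.append(a[i - 1])
            out_b.append("-")
            if F[i][j] != F[i - 1][j] - extension_penalty:
                state = "H"  # this position opened the gap
            i -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman_affine("ACGTACGT", "ACGTCGT")
    print(score)
    print(top)
    print(bottom)
```

With the assumed scores, aligning ACGTACGT against ACGTCGT gives seven matches and one single-position gap, for a score of 7 * 2 - (3 + 1) = 10.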

	The code works basically the same way as pSW and returns a
SimpleAlign object at the end. You can check align.pl in the
attached package for details. It shouldn't take me too long to incorporate
the code into the bioperl tree.

Regards,
Yee Man

PS Ewan: I read the comments in pSW.pm, and they say it is going to use
linear space to store the directional matrix. Is that really true? As I
remember from Ron Shamir's lecture notes, Hirschberg's divide and conquer
only works for global alignment, not for the local alignment that SW is
supposed to solve. Did I miss something?

PPS What is the name of Phil Green's paper on optimizing SW?

[Attachment: dsw.tar.gz (application/octet-stream)]