[Biojava-l] parse blast results

Howard Ungar howard_ungar@yahoo.com
Mon, 5 Nov 2001 06:44:18 -0800 (PST)


--- Jennifer Pan <jpan@incellico.com> wrote:
> Hello Howard and Hello all, 
> 
> I was trying to parse an NCBI BLAST results file,
> using the BlastReport Howard provided.
> For a hit sequence that has multiple MSPs, I could only retrieve the
> start and end positions of one MSP within this hit sequence.
> --------------------
> for example, I would like to see: 
> Query NM_00000
> >hit1 [1, 300]
>  hit1 [400, 600]
>  hit1 [700, 1000]
> >hit2 [1, 700]
> >hit3 [200, 500]
> 
> and I've gotten 
> Query NM_00000
> >hit1 [400, 600]
> >hit2 [1, 700]
> >hit3 [200, 500]
> -----------------------------------------
> 
> Any hint and help here?
> 
> many thanks
> 
> -Jennifer 
> 
Per Jennifer's request I have revised my BlastHandler to support
multiple alignments per sequence/query.  I also changed it to support
the newly checked-in BlastSAXParser (thanks to Keith James) which now
supports the QueryID.  I am attaching the revised code.  See the CVS
archives for the latest parser (or send me an email and I can forward
the biojava.jar which I used).
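
For anyone reading the archive without the attachment, the general idea
can be sketched roughly as below. This is only an illustration in plain
Java: it is not the attached BlastHandler and it does not use any BioJava
API. The class HitRanges and its method names are made up for the example.
The point is simply that each hit ID maps to a list of ranges, so a second
MSP for the same hit is appended rather than overwriting the first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of BioJava or the attached code.
final class HitRanges {
    // Hit ID -> every [start, end] range seen for that hit, in arrival order.
    private final Map<String, List<int[]>> rangesByHit = new LinkedHashMap<>();

    // Call once per alignment (MSP). Appends to the hit's list, so earlier
    // ranges for the same hit are kept instead of being replaced.
    void addAlignment(String hitId, int start, int end) {
        rangesByHit.computeIfAbsent(hitId, k -> new ArrayList<>())
                   .add(new int[] {start, end});
    }

    // Number of alignments recorded for a hit (0 if the hit is unknown).
    int alignmentCount(String hitId) {
        List<int[]> ranges = rangesByHit.get(hitId);
        return ranges == null ? 0 : ranges.size();
    }

    // Renders the layout from the quoted example: the first range of each
    // hit is prefixed with '>', further ranges of that hit with a space.
    String format(String queryId) {
        StringBuilder sb = new StringBuilder("Query ").append(queryId).append('\n');
        for (Map.Entry<String, List<int[]>> entry : rangesByHit.entrySet()) {
            boolean first = true;
            for (int[] range : entry.getValue()) {
                sb.append(first ? ">" : " ").append(entry.getKey())
                  .append(" [").append(range[0]).append(", ")
                  .append(range[1]).append("]\n");
                first = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HitRanges report = new HitRanges();
        report.addAlignment("hit1", 1, 300);
        report.addAlignment("hit1", 400, 600);
        report.addAlignment("hit1", 700, 1000);
        report.addAlignment("hit2", 1, 700);
        report.addAlignment("hit3", 200, 500);
        System.out.print(report.format("NM_00000"));
    }
}
```

In a SAX-style handler, addAlignment would be called from whatever
callback fires for each alignment, using the hit ID that is current at
that point. How that maps onto the real parser events is left open here.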

Howard

[Attachment: newParser.zip (application/x-zip-compressed), containing
BlastReport.java, BlastHandler.java, Alignment.java and Match.java]