[Biojava-l] dazed&confused: problems with DAS queries against ensembl326

Frank.R.Visser@gsk.com Frank.R.Visser@gsk.com
Fri, 12 Apr 2002 15:03:43 +0100


--=_mixed 004D3E8A80256B99_=
Content-Type: multipart/alternative; boundary="=_alternative 004D3EB380256B99_="


--=_alternative 004D3EB380256B99_=
Content-Type: text/plain; charset="us-ascii"

Hi everybody,

I have been writing a small web-based DAS client for some of our users,
built on top of biojava. We are still on ensembl326 in-house, so
everything here applies to the ensembl326 release. I have not run any
tests against ensembl428 because the sanger servlet site on port 8080 is
blocked by our firewall (any chance of getting it on port 80?).

Everything seemed to work fine, but testing my results against the
in-house ensembl326 website gives the right results in one area of the
genome and wrong results in others.

To track down where the problem occurs, I used the TestDASG class in
biojava. Attached is my modified version of TestDASG.java. I am running it
against a standard ensembl326 DAS reference server. An example:

Running TestDASG against 1 1 50000 gives the same result as ensembl: it
maps to clone AL589746, and the transcripts show up in the same place.
However, running against 1 92950487 92970487 gives the wrong result.
According to ensembl this region maps to AL049597, but TestDASG says it
maps to AL353627. Ensembl gives the coordinates of AL353627 as around
93.81 Mb, a completely different location...
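
A quick sanity check of the mismatch described above, as a minimal sketch.
It only uses the numbers quoted in this message; the single position
93810000 is an assumption standing in for "around 93.81 Mb", and the class
and method names are made up for illustration.

```java
public class RegionOverlapCheck {

    /** True when the closed ranges [min1, max1] and [min2, max2] share at least one base. */
    public static boolean overlaps(int min1, int max1, int min2, int max2) {
        return min1 <= max2 && min2 <= max1;
    }

    public static void main(String[] args) {
        // Query range passed to TestDASG on chromosome 1.
        int queryMin = 92950487;
        int queryMax = 92970487;

        // Assumption: a single representative position for "around 93.81 Mb".
        int reportedPos = 93810000;

        System.out.println("query length (bp): " + (queryMax - queryMin + 1));
        System.out.println("overlaps query:    "
                + overlaps(queryMin, queryMax, reportedPos, reportedPos));
        System.out.println("distance (bp):     " + (reportedPos - queryMax));
    }
}
```

With these values the reported clone position lies well outside the
queried range, so an overlap filter on chromosome coordinates should not
have returned it.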

I have run it against several areas, and the problem recurs. I use the
standard biojava APIs and have looked at the raw XML result files, but I
cannot see where the problem occurs. I suspect there is a problem with
the orientation, but I am not really sure what is happening.
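
To make the orientation suspicion concrete, here is a minimal sketch of how
a position on a clone maps onto the assembly for forward and reverse
placements. This is generic coordinate arithmetic, not the biojava or
ensembl implementation; the class name, method name and the placement
numbers in main are all hypothetical.

```java
public class ComponentMapping {

    /**
     * Maps a 1-based position on a component (clone) to a 1-based position
     * on the assembly.
     *
     * @param cmpPos   position on the component
     * @param cmpStart first component base used in the assembly
     * @param asmStart assembly position of the first placed base
     * @param asmEnd   assembly position of the last placed base
     * @param forward  true if the component is placed in forward orientation
     */
    public static int toAssembly(int cmpPos, int cmpStart,
                                 int asmStart, int asmEnd, boolean forward) {
        int offset = cmpPos - cmpStart;
        // Forward: count up from the start. Reverse: count down from the end.
        return forward ? asmStart + offset : asmEnd - offset;
    }

    public static void main(String[] args) {
        // Hypothetical placement: clone bases 1..1000 sit at assembly 1001..2000.
        int cmpStart = 1, asmStart = 1001, asmEnd = 2000;

        System.out.println(toAssembly(1, cmpStart, asmStart, asmEnd, true));     // 1001
        System.out.println(toAssembly(1000, cmpStart, asmStart, asmEnd, true));  // 2000
        System.out.println(toAssembly(1, cmpStart, asmStart, asmEnd, false));    // 2000
        System.out.println(toAssembly(1000, cmpStart, asmStart, asmEnd, false)); // 1001
    }
}
```

If a client ignores the orientation of a reverse-placed clone, features end
up mirrored within that clone's slice of the assembly, which is one way
results could be right in some regions and wrong in others.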

Can anybody give me some clues as to what I am doing wrong and how I can
fix it? I'll probably spend some time this weekend testing against public
servers to see if the problem occurs there too.
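
For comparing servers by hand, a small sketch that builds the same DAS
features request for any data source URL, so the raw XML from an in-house
and a public server can be diffed. It assumes the DAS 1.x request form
features?segment=REF:start,stop and the category=component filter for
assembly components; the host names below are placeholders, not real
servers.

```java
public class DasFeaturesUrl {

    /**
     * Builds a DAS 1.x "features" request for one segment, restricted to
     * assembly components. The dataSourceUrl is expected to look like
     * http://host/das/dsn (hypothetical layout).
     */
    public static String featuresUrl(String dataSourceUrl, String ref,
                                     int start, int stop) {
        String base = dataSourceUrl.endsWith("/") ? dataSourceUrl : dataSourceUrl + "/";
        return base + "features?segment=" + ref + ":" + start + "," + stop
                + ";category=component";
    }

    public static void main(String[] args) {
        // Placeholder servers; substitute the in-house and public data sources.
        String[] servers = {
            "http://das.example.org/das/ensembl",
            "http://inhouse.example.com/das/ensembl/"
        };
        for (String server : servers) {
            System.out.println(featuresUrl(server, "1", 92950487, 92970487));
        }
    }
}
```

Fetching both URLs and comparing the component entries for the region
would show whether the difference is in the server data or in the client.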

Frank




--=_alternative 004D3EB380256B99_=--
--=_mixed 004D3E8A80256B99_=
Content-Type: application/octet-stream; name="TestDASG.java"
Content-Disposition: attachment; filename="TestDASG.java"
Content-Transfer-Encoding: 7bit

import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.*;
import org.biojava.bio.symbol.*;
import org.biojava.bio.program.das.*;

import java.net.*;
import java.io.*;
import java.util.*;

public class TestDASG {
    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            throw new Exception("java das.TestDAS <url> <seq> <min> <max> [ann*]");
        }
        String dbURLString = args[0];
        String seqName = args[1];
        int min = Integer.parseInt(args[2]);
        int max = Integer.parseInt(args[3]);

        URL dbURL = new URL(dbURLString);

        DASSequenceDB dasDB = new DASSequenceDB(dbURL);

        DASSequence dasSeq = (DASSequence) dasDB.getSequence(seqName);
        for(int i = 4; i < args.length; i++) {
            dasSeq.addAnnotationSource(new URL(args[i]));
        }
        // dasSeq.addAnnotationSource(annoURL);
        // dasSeq.addAnnotationSource(miscURL);
        System.out.println("Length: " + dasSeq.length());
        System.out.println("1st 10 bases: " + dasSeq.subStr(1, 10));

        printFeatures(dasSeq, new FeatureFilter.OverlapsLocation(new RangeLocation(min, max)), System.out, "");
        // printFeatures(dasSeq, System.out, "");
    }

    public static void printFeatures(FeatureHolder fh,
                                     FeatureFilter ff,
                                     PrintStream pw,
                                     String prefix)
        throws Exception
    {

        for (Iterator i = fh.filter(ff, true).features(); i.hasNext(); ) {
            Feature f = (Feature) i.next();
            pw.print(prefix);
            if (f instanceof ComponentFeature) {
                pw.print(f.getType() + " : " + f.getSource() + ":" + ((ComponentFeature)f).getStrand());
            } else if (f instanceof StrandedFeature) {
                pw.print(f.getType() + " : " + f.getSource() + ":" +  ((StrandedFeature)f).getStrand());
            } else {
                pw.print(f.getType() + " : " + f.getSource());
            }

            pw.print(" at ");
            pw.print(f.getLocation().toString());
            try {
                String id=null;
                if (f instanceof ComponentFeature) {
                    id="components/" + ((ComponentFeature)f).getComponentSequence().getName();
                } else {
                    id = (String) f.getAnnotation().getProperty(DASSequence.PROPERTY_FEATUREID);
                }
                pw.print(" (" + id + ')');
            } catch (NoSuchElementException ex) {
            }
            pw.println();
            printFeatures(f, ff, pw, prefix + "    ");
        }
    }
}
--=_mixed 004D3E8A80256B99_=--