[DAS] DAS Client
Thomas Down
td2@sanger.ac.uk
Mon, 6 Jan 2003 15:24:39 +0000
On Mon, Jan 06, 2003 at 03:49:22PM +0530, Saju Joseph wrote:
> Hi Gurus,
> I want to get the sequence for a chromosome region. Below is the piece of code I tried, and I am stuck partway through. Can any of you help me? Other suggestions are welcome.
I noticed a few problems with your script:
- You're connecting to the `hg8' datasource at UCSC. This no longer
appears to exist -- it's not in the DSN list for that server,
and when I try accessing it from the command line with wget
(a great DAS debugging tool, by the way), I get a blank HTTP
response (not even a DAS error, which is arguably a problem
with the server). If I try a similar request to `hg13' it works
as expected.
- You set a content-type property for the request. This only
makes sense if you're POSTing your request rather than GETting,
since a GET request has no body for the type to describe.
And even if you *were* POSTing, the correct content type
would be application/x-www-form-urlencoded. In practice, this
probably isn't what's causing any problem, but it's definitely
wrong (and could, for example, cause problems with some servers
which accept alternative query formats).
- You try to read data twice from the InputStream returned by
the URLConnection. First, you read it into a string, then
(having already reached the end of the stream), you pass the
same stream to a BioJava parsing function. This would only work
if the stream supports mark/reset and you used it.
- At the end of the script, you use the BioJava EMBL parser.
DAS data is in a special DAS XML format. If the EMBL parser
were actually to receive any of this (it won't, see above),
it would return an error.
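On the InputStream point, here is a minimal sketch of one way to look at the response and still hand it to a parser afterwards: read the stream once into a byte array, then wrap those bytes in a fresh stream for each consumer. The class name StreamReuse and the helper readFully are made up for illustration, and the ByteArrayInputStream in main is only a stand-in for the stream you would get from URLConnection.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamReuse {
    // Drain the stream once and keep the bytes so they can be re-read later.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for URLConnection.getInputStream(); the content is made up.
        InputStream response = new ByteArrayInputStream(
            "<DASDNA>...</DASDNA>".getBytes(StandardCharsets.UTF_8));

        byte[] data = readFully(response);

        // The original stream is exhausted now, so a further read() gives -1.
        System.out.println(response.read());

        // First consumer: print the text for debugging.
        System.out.println(new String(data, StandardCharsets.UTF_8));

        // Second consumer: a fresh stream over the same bytes, e.g. for a parser.
        InputStream forParser = new ByteArrayInputStream(data);
        System.out.println(forParser.available());
    }
}
```

Buffering the whole response is fine for short regions; for very large ones you would probably prefer to give the stream straight to the parser and skip the debugging copy.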
Since you're using BioJava, the simplest way to get the data
would be something like:
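    import java.net.*;
    import org.biojava.bio.seq.*;
    import org.biojava.bio.seq.db.*;
    import org.biojava.bio.program.das.*;

    // ...

    // Connect to the hg13 datasource (can throw BioException)
    SequenceDB das = new DASSequenceDB(
        new URL("http://genome.cse.ucsc.edu/cgi-bin/das/hg13/")
    );

    // Look up the chromosome by ID, then print a region of it.
    // BioJava coordinates are 1-based and inclusive.
    Sequence dasSeq = das.getSequence("chr2");
    System.out.println(dasSeq.subStr(100, 300));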
If you want to see how BioJava is *actually* fetching sequence
data, see:
http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biojava-live/src/org/biojava/bio/program/das/DASRawSymbolList.java?rev=1.6&cvsroot=biojava&content-type=text/vnd.viewcvs-markup
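For a rough idea of what a raw request involves, here is a sketch using only the standard Java XML parser. It is not taken from the BioJava source. It assumes the DAS 1.x "dna" command layout (datasource URL followed by dna?segment=id:start,stop) and a DASDNA response with the residues inside a DNA element; the class name DasDnaSketch, its two helpers, and the sample residues are made up for illustration. No network access is done here: in real use you would open the URL with a plain GET (no content type) and pass the response text to extractDna.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DasDnaSketch {
    // Build a DAS "dna" request URL (assumed layout: <base>/dna?segment=id:start,stop).
    static String dnaUrl(String dsnBase, String segment, int start, int stop) {
        String base = dsnBase.endsWith("/") ? dsnBase : dsnBase + "/";
        return base + "dna?segment=" + segment + ":" + start + "," + stop;
    }

    // Pull the residues out of a DASDNA document, dropping all whitespace.
    static String extractDna(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Do not fetch any external DTD named in a DOCTYPE line.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element dna = (Element) doc.getElementsByTagName("DNA").item(0);
        if (dna == null) {
            throw new IllegalArgumentException("no <DNA> element in response");
        }
        return dna.getTextContent().replaceAll("\\s+", "");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            dnaUrl("http://genome.cse.ucsc.edu/cgi-bin/das/hg13/", "chr2", 100, 300));

        // Made-up sample response, shaped like the assumed DASDNA format.
        String sample =
            "<?xml version=\"1.0\" standalone=\"no\"?>\n"
            + "<!DOCTYPE DASDNA SYSTEM \"http://www.biodas.org/dtd/dasdna.dtd\">\n"
            + "<DASDNA>\n"
            + " <SEQUENCE id=\"chr2\" start=\"1\" stop=\"8\" version=\"1.0\">\n"
            + "  <DNA length=\"8\">\n"
            + "   acgt\n"
            + "   ttga\n"
            + "  </DNA>\n"
            + " </SEQUENCE>\n"
            + "</DASDNA>\n";
        System.out.println(extractDna(sample));
    }
}
```

The DASSequenceDB route above is still the easier one if you are using BioJava anyway; this is only meant to show why an EMBL parser is the wrong tool for the response.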
Thomas.