[DAS] DAS Client

Thomas Down td2@sanger.ac.uk
Mon, 6 Jan 2003 15:24:39 +0000


On Mon, Jan 06, 2003 at 03:49:22PM +0530, Saju Joseph wrote:
> Hi Gurus, 
> I want to get the sequence for a chromosome region. Below is the piece of code I tried, and I am stuck partway through. Can any of you help me? Other suggestions are welcome.


I noticed a few problems with your script:

  - You're connecting to the `hg8' datasource at UCSC.  This no longer
    appears to exist -- it's not in the DSN list for that server,
    and when I try accessing it from the command line with wget
    (a great DAS debugging tool, by the way), I get a blank HTTP 
    response (not even a DAS error, which is arguably a problem
    with the server).  If I try a similar request to `hg13' it works
    as expected.

  - You set a content-type property for the request.  This only
    makes sense if you're POSTing your request rather than GETting it.
    And even if you *were* POSTing, the correct content type
    would be application/x-www-form-urlencoded.  In practice, this
    probably isn't what's causing any problem, but it's definitely
    wrong (and could, for example, cause problems with some servers
    which accept alternative query formats).

  - You try to read data twice from the InputStream returned by
    the URLConnection.  First, you read into a string, then 
    (having already reached the end of the stream), you pass it
    to a BioJava parsing function.  This would only be valid if you
    did a mark/reset on the stream.

  - At the end of the script, you use the BioJava EMBL parser.
    DAS data is in a special DAS XML format.  If the EMBL parser
    were actually to receive any of this (it won't, see above),
    it would return an error.
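
To illustrate the double-read problem, here is a minimal sketch of one
way around it: drain the response into a byte array once, then hand
each consumer its own stream over those bytes.  The class name, the
readFully helper and the sample payload are all made up for the
example, and the ByteArrayInputStream merely stands in for whatever
URLConnection.getInputStream() would return.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamReuseSketch {

    /** Drain a stream into memory so the bytes can be consumed more than once. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the server response; the payload is invented.
        byte[] payload = "<DASDNA/>".getBytes(StandardCharsets.UTF_8);
        InputStream response = new ByteArrayInputStream(payload);

        // Read the response exactly once.
        byte[] body = readFully(response);

        // The original stream is now exhausted: read() reports end of stream.
        System.out.println("next byte from original stream: " + response.read());

        // First use: print the raw text for debugging.
        System.out.println(new String(body, StandardCharsets.UTF_8));

        // Second use: a fresh stream over the same bytes, ready for a parser.
        InputStream forParser = new ByteArrayInputStream(body);
        System.out.println("bytes available to parser: " + forParser.available());
    }
}
```

Buffering the whole response is only sensible for small replies; for
anything large, wrapping the stream in a BufferedInputStream and using
mark/reset, or simply not printing the data first, is the better
choice.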

Since you're using BioJava, the simplest way to get the data
would be something like:

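  import java.net.*;
  import org.biojava.bio.seq.*;
  import org.biojava.bio.seq.db.*;
  import org.biojava.bio.program.das.*;

  // ...

  // Point BioJava at the DAS datasource (hg13 rather than hg8).
  SequenceDB das = new DASSequenceDB(
      new URL("http://genome.cse.ucsc.edu/cgi-bin/das/hg13/")
  );
  // Look up the chromosome by its DAS segment ID.
  Sequence dasSeq = das.getSequence("chr2");
  // subStr takes 1-based, inclusive coordinates.
  System.out.println(dasSeq.subStr(100, 300));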


If you want to see how BioJava is *actually* fetching sequence
data, see:

   http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biojava-live/src/org/biojava/bio/program/das/DASRawSymbolList.java?rev=1.6&cvsroot=biojava&content-type=text/vnd.viewcvs-markup
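
For a rough idea of what talking to a DAS server by hand involves,
here is a small sketch that is not taken from that class.  It assumes
the DAS 1.x `dna' command, where the region is passed as
segment=ID:start,stop and the reply is a DASDNA document with the
bases inside a DNA element.  The class and method names are invented
for the example, and the sample reply is hand-written rather than
captured from a real server.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DasDnaSketch {

    /** Build a `dna' request URL for a 1-based, inclusive region (assumed DAS 1.x syntax). */
    static String dnaUrl(String dataSource, String segment, int start, int stop) {
        String base = dataSource.endsWith("/") ? dataSource : dataSource + "/";
        return base + "dna?segment=" + segment + ":" + start + "," + stop;
    }

    /** Pull the bases out of the first DNA element, dropping layout whitespace. */
    static String extractDna(byte[] xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        Element dna = (Element) doc.getElementsByTagName("DNA").item(0);
        if (dna == null) {
            throw new IllegalArgumentException("no DNA element in response");
        }
        return dna.getTextContent().replaceAll("\\s+", "");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            dnaUrl("http://genome.cse.ucsc.edu/cgi-bin/das/hg13", "chr2", 100, 300));

        // Hand-written sample reply.  A real server may also send a DOCTYPE,
        // which a default parser could try to resolve over the network.
        String sample =
            "<?xml version=\"1.0\"?>\n"
            + "<DASDNA>\n"
            + "  <SEQUENCE id=\"chr2\" start=\"100\" stop=\"110\" version=\"1.0\">\n"
            + "    <DNA length=\"11\">\n"
            + "      acgtacgtacg\n"
            + "    </DNA>\n"
            + "  </SEQUENCE>\n"
            + "</DASDNA>\n";
        System.out.println(extractDna(sample.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The URL printed by the first line is also what you would hand to wget
when checking a server by hand.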



Thomas.