[Biojava-l] BioJava and DAS

Thomas Down td2 at sanger.ac.uk
Fri Mar 18 05:57:32 EST 2005


On 18 Mar 2005, at 09:20, Joel Björkman wrote:

> Hello!
>
> I'm new to org.biojava.bio.program.das and I have a couple of
> questions regarding fetching features and sequences from dazzle
> servers...
>
> At the moment I'm only interested in getting the sequence and
> annotations from Ensembl's database, which should make the problem
> easier.
>
> It's quite obvious when you're browsing
> http://servlet.sanger.ac.uk:8080/das/ that Ensembl has a lot of data.
>
> What I'm concerned about is when I look at the different segment sizes
> <SEGMENT id="21" size="46944323" subparts="yes"/>
> Some of them are really big.
>
> Taking a peek at the demo that ships with biojava-live suggests that
> you are supposed to download the entire sequence and then perform
> operations on it (e.g. substring).
>
> DASSequenceDB dasDB = new DASSequenceDB(dbURL);
> DASSequence dasSeq = (DASSequence) dasDB.getSequence(seqName);
> System.out.println("1st 10 bases: " + dasSeq.subStr(1, 10));
>
> Isn't there any way that's more efficient and nicer to the server? Is
> there any way to throw requests like this:
> http://servlet.sanger.ac.uk:8080/das/ensembl_Homo_sapiens_core_28_35a/dna?segment=21:1,10

Hi,

The "getSequence" call in that program doesn't actually retrieve all
the sequence data from the DAS server; it just creates an object which
can make calls to the DAS server as necessary.

Currently, the DAS client code hidden behind the Sequence object you
get back fetches the sequence data from the DAS server in chunks
(50kb chunks, if I remember correctly), so you certainly won't end up
pulling down the full sequence of chr21 just to look at a few bases.
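
To make the chunking idea concrete, here is a rough sketch of how a
client could map a requested range onto fixed-size chunks and build one
DAS "dna" request per chunk. This is not the actual BioJava client
code: the class and method names are hypothetical, and the 50,000-base
chunk size is only the figure recalled above.

```java
import java.util.ArrayList;
import java.util.List;

class DasChunkSketch {
    // Assumed chunk size; the real client's value may differ.
    static final int CHUNK_SIZE = 50_000;

    /** Builds a DAS "dna" request URL for a 1-based, inclusive range. */
    static String dnaUrl(String dataSourceUrl, String segment, int start, int end) {
        String base = dataSourceUrl.endsWith("/") ? dataSourceUrl : dataSourceUrl + "/";
        return base + "dna?segment=" + segment + ":" + start + "," + end;
    }

    /**
     * Returns one request URL for every chunk overlapping [start, end].
     * Chunk boundaries are aligned to CHUNK_SIZE and the last chunk is
     * clipped to the segment length.
     */
    static List<String> chunkUrls(String dataSourceUrl, String segment,
                                  int start, int end, int segmentLength) {
        if (start < 1 || end < start || end > segmentLength) {
            throw new IllegalArgumentException("bad range " + start + "," + end);
        }
        List<String> urls = new ArrayList<>();
        int firstChunk = (start - 1) / CHUNK_SIZE;
        int lastChunk = (end - 1) / CHUNK_SIZE;
        for (int c = firstChunk; c <= lastChunk; c++) {
            int chunkStart = c * CHUNK_SIZE + 1;
            int chunkEnd = Math.min((c + 1) * CHUNK_SIZE, segmentLength);
            urls.add(dnaUrl(dataSourceUrl, segment, chunkStart, chunkEnd));
        }
        return urls;
    }

    public static void main(String[] args) {
        String src = "http://servlet.sanger.ac.uk:8080/das/ensembl_Homo_sapiens_core_28_35a";
        // The first 10 bases of chr21 fall inside a single chunk.
        for (String url : chunkUrls(src, "21", 1, 10, 46944323)) {
            System.out.println(url);
        }
    }
}
```

Under these assumptions, asking for bases 1 to 10 of segment 21 touches
only the chunk covering 1 to 50000, and a client could cache that chunk
to serve later requests in the same region.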

Another issue with DAS is that although the XML documents (especially
the feature tables) can be pretty huge, they compress down well.
Given a suitable server implementation (Dazzle can certainly do this,
not sure how many others can), the BioJava DAS client can use the HTTP
"Accept-Encoding" mechanism to negotiate GZIP compression of all the
DAS XML, which saves a lot of bandwidth.
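
For illustration, the negotiation looks roughly like this with plain
java.net classes. It is a sketch of the general HTTP mechanism, not the
BioJava client's own code, and the class and method names are
hypothetical. The client advertises gzip support, then checks the
response's Content-Encoding header before deciding whether to
decompress, since a server is free to ignore the request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

class GzipDasFetchSketch {
    /** Wraps the body in a GZIP decoder only if the server says it compressed it. */
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null
                && contentEncoding.toLowerCase(Locale.ROOT).contains("gzip")) {
            return new GZIPInputStream(body);
        }
        return body; // server sent the document uncompressed
    }

    /** Fetches a DAS document, asking the server for GZIP compression. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = decode(conn.getInputStream(), conn.getContentEncoding())) {
            // Assumes the XML is UTF-8; a real client would let the XML parser decide.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GzipDasFetchSketch <das-url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Keeping the decode step separate from the network call means the same
code path handles servers that honour the header and those that do not.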

          Thomas.



