[DAS] DAS for protein structures

Andreas Prlic ap3 at sanger.ac.uk
Sun Jul 25 13:49:47 EDT 2004


Hi Andrew!

Thanks for your detailed feedback. Let me go through the most important issues 
of your mail:

> Why do you define your own XML format for 3D structure?  What about
> basing it on, say, CML?  Or why not just feed a PDB file back, perhaps
> embedded inside of XML?  

DAS responses are XML documents that provide a simple format for exchanging 
data. PDB files contain several different types of data: biological data about 
the protein, literature references, a description of the experiment and, 
finally, the coordinates. So I would not want to mix DAS XML and (traditional) 
PDB files. As you mentioned, there are already several XML formats intended to 
replace PDB files, and it does not make sense to invent yet another one that 
deals with *all* the PDB data. The idea here is to reduce the PDB file to the 
minimal data needed for visualization, i.e. the coordinates of the atoms and 
their connections. The biological data that a client projects onto the 3D 
structure is retrieved via the DAS Feature and Alignment services.
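
A minimal sketch of what "reducing a PDB file to the data needed for 
visualization" could look like in Java. It keeps only ATOM/HETATM records and 
reads the fixed columns of the PDB format. The class MinimalPdb and its Atom 
type are hypothetical names for illustration; they are not part of Biojava or 
of the DAS structure format, and connections (bonds) are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Biojava or the DAS spec.
class MinimalPdb {

    /** One atom reduced to what a viewer needs: names, chain and position. */
    static final class Atom {
        final String name;
        final String residue;
        final String chain;
        final int residueNumber;
        final double x, y, z;

        Atom(String name, String residue, String chain, int residueNumber,
             double x, double y, double z) {
            this.name = name;
            this.residue = residue;
            this.chain = chain;
            this.residueNumber = residueNumber;
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    /** Keeps only ATOM/HETATM lines and reads the fixed-width PDB columns. */
    static List<Atom> parse(List<String> pdbLines) {
        List<Atom> atoms = new ArrayList<>();
        for (String line : pdbLines) {
            boolean isAtom = line.startsWith("ATOM  ") || line.startsWith("HETATM");
            if (!isAtom || line.length() < 54) {
                continue; // header, literature, experiment records etc. are dropped
            }
            atoms.add(new Atom(
                    line.substring(12, 16).trim(),                    // atom name, cols 13-16
                    line.substring(17, 20).trim(),                    // residue name, cols 18-20
                    line.substring(21, 22),                           // chain id, col 22
                    Integer.parseInt(line.substring(22, 26).trim()),  // residue number, cols 23-26
                    Double.parseDouble(line.substring(30, 38).trim()),  // x, cols 31-38
                    Double.parseDouble(line.substring(38, 46).trim()),  // y, cols 39-46
                    Double.parseDouble(line.substring(46, 54).trim())));// z, cols 47-54
        }
        return atoms;
    }

    public static void main(String[] args) {
        List<Atom> atoms = parse(Arrays.asList(
                "HEADER    EXAMPLE",
                "ATOM      1  N   MET A   1      27.340  24.430   2.614"));
        for (Atom a : atoms) {
            System.out.println(a.chain + " " + a.residue + a.residueNumber
                    + " " + a.name + " " + a.x + " " + a.y + " " + a.z);
        }
    }
}
```

Everything that is not an atom record is simply skipped, which mirrors the 
split described above: coordinates travel in the structure response, the rest 
comes from other DAS services.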


> After all, no structure program is going to
> handle your XML format. 

I guess no structure program is capable of any DAS communication at the 
moment. That is what we are trying to provide: the missing services needed to 
apply DAS in the structure world. If you are developing a Java program (I know 
you are a Python guy, but still ;-), making it DAS-enabled is quite simple. 
There is support for the new DAS commands in Biojava, e.g.:

To get a Biojava structure object via DAS:

// pdbcode holds the PDB identifier of the structure to fetch
String server = "http://das.sanger.ac.uk/das/structure/structure?query=";
DASStructureClient dasc = new DASStructureClient(server);
Structure struc = dasc.getStructure(pdbcode);

> You use the "cigar" string because it provides an "efficient way to
> encode an alignment" but then you don't provide an efficient way to
> encode the rotation matrix.  

Yes, but the matrix does not take much space, so it is not really an issue. An 
alignment, in contrast, can be quite big, so the cigar encoding saves a lot of 
space.
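
To illustrate why a cigar string is compact, here is a minimal Java sketch 
that run-length encodes one gapped alignment row. The class name CigarSketch, 
the letters used (M for an aligned residue, D for a gap) and the 
count-before-letter order are assumptions for illustration; the exact 
convention in the DAS alignment format may differ.

```java
// Hypothetical helper for illustration; the letter convention is an assumption.
class CigarSketch {

    /** Run-length encodes a gapped row: residues become M, '-' becomes D. */
    static String encode(String alignedRow) {
        StringBuilder out = new StringBuilder();
        char current = 0; // operation of the run being counted
        int run = 0;      // length of that run
        for (int i = 0; i < alignedRow.length(); i++) {
            char op = alignedRow.charAt(i) == '-' ? 'D' : 'M';
            if (op != current && run > 0) {
                out.append(run).append(current); // close the previous run
                run = 0;
            }
            current = op;
            run++;
        }
        if (run > 0) {
            out.append(run).append(current); // close the final run
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("MKV--LTA")); // prints 3M2D3M
    }
}
```

The encoded length depends on the number of gap/non-gap runs rather than on 
the length of the row, which is where the saving for long alignments comes 
from.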

> Why is CRC64 suggested?  (md5 is better.) 

This is the checksum provided by Swissprot. 
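
For readers who want to compute such a checksum themselves, a minimal Java 
sketch follows. It assumes the parameterization usually described for the 
Swissprot CRC64 (the ISO 3309 polynomial x^64 + x^4 + x^3 + x + 1 in 
bit-reflected form, start value zero, no final XOR); that parameterization and 
the class name Crc64Sketch are assumptions here and should be checked against 
the checksum of a known entry before relying on the result.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; the CRC parameters are an assumption.
class Crc64Sketch {

    // Bit-reflected form of x^64 + x^4 + x^3 + x + 1.
    private static final long POLY = 0xD800000000000000L;
    private static final long[] TABLE = new long[256];

    static {
        // Precompute the effect of each possible input byte.
        for (int i = 0; i < 256; i++) {
            long part = i;
            for (int bit = 0; bit < 8; bit++) {
                part = (part & 1L) != 0 ? (part >>> 1) ^ POLY : part >>> 1;
            }
            TABLE[i] = part;
        }
    }

    /** Returns the checksum of the sequence as 16 upper-case hex digits. */
    static String crc64(String sequence) {
        long crc = 0L; // assumed start value
        for (byte b : sequence.getBytes(StandardCharsets.US_ASCII)) {
            crc = TABLE[(int) ((crc ^ b) & 0xFF)] ^ (crc >>> 8);
        }
        return String.format("%016X", crc);
    }

    public static void main(String[] args) {
        System.out.println(crc64("MKVLTA"));
    }
}
```

The checksum lets a client test whether two services refer to the same 
sequence without transferring the sequence itself.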

> The entry_points optional attribute "href" 
>> echoes the URL query that was used to fetch  the current document.

>I don't understand the need for this.

Same here. It is in the DAS spec, so I kept it. There are a couple of issues 
with entry_points and proteins anyway. E.g. Swissprot has >150,000 "entry 
points" ;-)

I will address several of your other issues by improving the documentation 
over the next few days.

Regards,
Andreas

-- 

Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK


