[DAS] DAS for protein structures
Andreas Prlic
ap3 at sanger.ac.uk
Sun Jul 25 13:49:47 EDT 2004
Hi Andrew!
Thanks for your detailed feedback. Let me go through the most important issues
of your mail:
> Why do you define your own XML format for 3D structure? What about
> basing it on, say, CML? Or why not just feed a PDB file back, perhaps
> embedded inside of XML?
* DAS responses are XML documents that provide a simple format for exchanging
data. PDB files contain several types of data: biological data about the
protein, literature references, a description of the experiment and finally
the coordinates. So I would not want to mix DAS-XML and (traditional) PDB
files. As you mentioned, there are already several XML formats meant to
replace PDB files. It does not make sense to invent yet another one to deal
with *all* the PDB data. The idea here is to reduce the PDB file to the
minimal data needed for visualization, i.e. the coordinates of the atoms and
their connections. The biological data that a client projects onto the 3D
structure is retrieved via the DAS feature and alignment services.
> After all, no structure program is going to
> handle your XML format.
I guess no structure program is capable of any DAS communication at the
moment. That is what we are trying to provide: the missing services needed to
apply DAS in the structure world. If you are developing a Java program (I know
you are a Python guy, but still ;-), making it DAS-enabled is quite simple.
There is support for the new DAS commands in Biojava, e.g.:
To get a Biojava structure object via DAS:

    // pdbcode is a String holding the PDB identifier of the structure
    String server = "http://das.sanger.ac.uk/das/structure/structure?query=";
    DASStructureClient dasc = new DASStructureClient(server);
    Structure struc = dasc.getStructure(pdbcode);
> You use the "cigar" string because it provides an "efficient way to
> encode an alignment" but then you don't provide an efficient way to
> encode the rotation matrix.
Yes, but the matrix does not take much space, so it is not really an issue. An
alignment, in contrast, can be quite big, so the cigar encoding saves a lot of
space.
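
To illustrate why a cigar-style string is compact, here is a minimal sketch
that run-length encodes a pairwise alignment given as two gapped rows. The
class name CigarSketch, the '-' gap character and the operation letters M
(both rows have a residue), D (gap in the second row) and I (gap in the first
row) are assumptions made for this example; they are not taken from the DAS
alignment documentation, which may define the string differently.

```java
// Hypothetical helper, not part of Biojava or the DAS spec.
class CigarSketch {

    // Run-length encodes an alignment given as two equally long gapped rows.
    // Assumed convention: 'M' = residue in both rows, 'D' = gap in rowB,
    // 'I' = gap in rowA. Columns that are gaps in both rows are skipped.
    static String encode(String rowA, String rowB) {
        if (rowA.length() != rowB.length()) {
            throw new IllegalArgumentException("rows must have equal length");
        }
        StringBuilder cigar = new StringBuilder();
        char runOp = 0;   // operation of the current run (0 = no run yet)
        int runLen = 0;   // number of columns in the current run
        for (int i = 0; i < rowA.length(); i++) {
            boolean gapA = rowA.charAt(i) == '-';
            boolean gapB = rowB.charAt(i) == '-';
            if (gapA && gapB) {
                continue; // carries no information for a pairwise alignment
            }
            char op = gapA ? 'I' : (gapB ? 'D' : 'M');
            if (op == runOp) {
                runLen++;
            } else {
                // Flush the finished run, then start a new one.
                if (runLen > 0) {
                    cigar.append(runLen).append(runOp);
                }
                runOp = op;
                runLen = 1;
            }
        }
        if (runLen > 0) {
            cigar.append(runLen).append(runOp);
        }
        return cigar.toString();
    }

    public static void main(String[] args) {
        // 12 alignment columns collapse into five (length, operation) pairs.
        System.out.println(encode("MKTAYIAK--QR", "MKT---AKLLQR")); // 3M3D2M2I2M
    }
}
```

The encoded string grows with the number of gap openings rather than with the
alignment length, which is where the saving comes from for long alignments.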
> Why is CRC64 suggested? (md5 is better.)
This is the checksum provided by Swissprot.
> The entry_points optional attribute "href"
> > echoes the URL query that was used to fetch the current document.
> I don't understand the need for this.
Same here. It is in the DAS spec, so I kept it. There are a couple of issues
with entry_points and proteins anyway. E.g. Swissprot has more than 150,000
"entry points" ;-)
I will address several of your other issues by improving the documentation
over the next few days.
Regards,
Andreas
--
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK