[Biojava-dev] fetching obsolete/superseding files

Amr AL-Hossary amr_alhossary at hotmail.com
Tue Apr 26 03:03:14 UTC 2011


Thanks Spencer,
This explains a lot.
So the implementation you provided is correct, and so is the recursion flag.

No, I don't have write access yet, but Dr. Andreas promised to grant me write access after my second contribution.

>the list of status messages come from looking at the internals of the PDB website
Do you have access to the Webservice implementation?

Amr


  From: Spencer Bliven 
  Sent: Tuesday, April 26, 2011 1:53 AM
  To: Andreas Prlic 
  Cc: Amr AL-Hossary ; biojava-dev at lists.open-bio.org 
  Subject: Re: [Biojava-dev] fetching obsolete/superseding files


  Hey all,

  I think we are converging on a consistent model of PDB precedence. This was obscured previously by the bug in how the idStatus page listed only a single 'replacedBy' entry. Andreas has fixed this and it should go live tomorrow. I'll write some unit tests and update biojava at the same time. Here is how things will work:

  PDB supersessions form a directed acyclic graph, where edges point from an obsolete ID to the entry that directly superseded it. Each record returned by idStatus has a "replaces" attribute, which is a space-delimited list of incoming edges, and a "replacedBy" attribute, which is a space-delimited list of outgoing edges. Either attribute is omitted when the list is empty. Two examples:

  <idStatus>
  <record structureId="1CAT" status="OBSOLETE" replacedBy="3CAT"/>
  <record structureId="3CAT" status="OBSOLETE" replaces="1CAT" replacedBy="8CAT 7CAT"/>
  <record structureId="7CAT" status="CURRENT" replaces="3CAT"/>
  <record structureId="8CAT" status="CURRENT" replaces="3CAT"/>

  <record structureId="1KSA" status="OBSOLETE" replacedBy="3ENI"/>
  <record structureId="3ENI" status="CURRENT" replaces="1M50 1KSA"/>
  <record structureId="1M50" status="OBSOLETE" replacedBy="3ENI"/>
  </idStatus>
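
  As a rough illustration of reading these records (a sketch only, not the actual biojava code), the outgoing edges can be pulled out with the JDK's DOM parser. The class name IdStatusParser and both method names are hypothetical; only the element and attribute names come from the examples above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
class IdStatusParser {
    /** Splits a space-delimited attribute value; a missing attribute yields an empty list. */
    static List<String> splitIds(String attr) {
        List<String> ids = new ArrayList<>();
        for (String id : attr.trim().split("\\s+")) {
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Maps each structureId to its outgoing edges (the "replacedBy" list). */
    static Map<String, List<String>> parseReplacedBy(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList records = doc.getElementsByTagName("record");
        Map<String, List<String>> edges = new LinkedHashMap<>();
        for (int i = 0; i < records.getLength(); i++) {
            Element rec = (Element) records.item(i);
            // getAttribute returns "" when the attribute is absent.
            edges.put(rec.getAttribute("structureId"),
                      splitIds(rec.getAttribute("replacedBy")));
        }
        return edges;
    }
}
```

  For the first example above, this would map 3CAT to [8CAT, 7CAT] and 7CAT to an empty list. The "replaces" attribute could be read the same way to get the incoming edges.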

  The non-recursive versions of getReplaces/getReplacement just get the incoming/outgoing edges for a single node and require only a single REST query. The recursive versions will do a depth-first search up/down the graph and return a list of all nodes reached.
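
  A minimal sketch of the two flavours, assuming the edges have already been loaded into an in-memory map instead of being fetched by one REST query per node. The class SupersessionSearch and its signatures are made up for this example and are not the biojava API; leaving the start node out of the recursive result is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the map stands in for the REST lookups.
class SupersessionSearch {
    /** Non-recursive: the direct outgoing edges of a single node. */
    static List<String> getReplacement(Map<String, List<String>> replacedBy, String id) {
        List<String> direct = replacedBy.get(id);
        return direct == null ? new ArrayList<String>() : direct;
    }

    /** Recursive: depth-first search returning every node reachable from id. */
    static List<String> getReplacementRecursive(Map<String, List<String>> replacedBy, String id) {
        Set<String> visited = new LinkedHashSet<>();
        visit(replacedBy, id, visited);
        visited.remove(id); // assumption: report only successors, not the start node
        return new ArrayList<>(visited);
    }

    private static void visit(Map<String, List<String>> replacedBy, String id,
                              Set<String> visited) {
        if (!visited.add(id)) {
            return; // already reached through another path in the graph
        }
        for (String next : getReplacement(replacedBy, id)) {
            visit(replacedBy, next, visited);
        }
    }
}
```

  With the CAT example, the non-recursive call on 1CAT gives [3CAT], while the recursive call gives [3CAT, 8CAT, 7CAT]. The same search over the "replaces" edges would walk in the other direction.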

  Finally, the getCurrent() method should consistently return a single PDB ID from among the results of recursive-getReplacement. To be consistent with the old REST implementation, this will be the PDB ID that occurs last alphabetically. Thus getCurrent(1HHB) will give 4HHB rather than 2HHB or 3HHB, getCurrent(1CAT) will give 8CAT, and getCurrent(7CAT) will give 7CAT.
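
  One way this rule could be sketched, again over an in-memory edge map and not the real REST calls. CurrentIdChooser is a hypothetical name, and treating "has no replacedBy entries" as equivalent to status CURRENT is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the "last alphabetically" rule.
class CurrentIdChooser {
    /**
     * Walks the replacedBy edges from id and returns the alphabetically last
     * node that has no outgoing edges. An ID that was never replaced is
     * returned unchanged. Assumes the graph is acyclic.
     */
    static String getCurrent(Map<String, List<String>> replacedBy, String id) {
        TreeSet<String> unreplaced = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(id);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) {
                continue; // reached before through another path
            }
            List<String> next = replacedBy.get(node);
            if (next == null || next.isEmpty()) {
                unreplaced.add(node); // assumption: no successors means CURRENT
            } else {
                for (String n : next) {
                    stack.push(n);
                }
            }
        }
        return unreplaced.last();
    }
}
```

  On the CAT example this returns 8CAT for 1CAT and 7CAT for 7CAT, matching the behaviour described above.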

  Amr, I understand what you were thinking with the getNewestCurrent method. It is appealing to think of 4HHB as the representative for all four structures. However, there is a good reason that 2HHB and 3HHB are still marked as current, and I think it is misleading to include a method that favors 4HHB over other current IDs because it is alphabetically higher. We should probably leave this method out of biojava.


  Does anything seem wrong about this model of supersession? In particular, does this address your question about the need for the recursion flag, Amr? My plan is to commit the biojava changes shortly. Amr, do you mind if I merge in your patch with the caching and PDBFileReader updates? (Do you have write access to SVN?) Great code there!

  Finally, the list of status messages comes from looking at the internals of the PDB website. I haven't come across any examples of them myself to test with. Many seem to be temporary statuses, for publication holds and the like. I'm content to ignore them until someone requests something specific.

  -Spencer



  On Mon, Apr 25, 2011 at 2:22 PM, Andreas Prlic <andreas at sdsc.edu> wrote:

    Hi Amr,


    > And any way, the webservice returns only ONE PDB ID max per record (please
    > inspect the result returned by this query
    > http://www.rcsb.org/pdb/rest/idStatus?structureId=1HHB,2HHB,3HHB,4HHB ).


    I believe that is a bug. I just fixed it, and the fix should become
    available with tomorrow's web site update (around 00:00 UTC).


    > This way, I believe the best way to get the most recent ID is getting the
    > isReplacedBy attribute of the record of superseded record (e.g. from 3HHB to
    > 1HHB and then from 1HHB to 4HHB).


    I hope this will be simpler with the updated URL response ...


    Andreas





