[Biojava-dev] fetching obsolete/superseding files

Andreas Prlic andreas at sdsc.edu
Wed Apr 27 02:50:54 UTC 2011


Great job, Spencer and Amr,

Amr, I'll set up your SVN access later on. Can you mail me your
desired username (off list)?

Andreas


On Tue, Apr 26, 2011 at 6:58 PM, Spencer Bliven <sbliven at ucsd.edu> wrote:
> Amr,
>
> Try checking idStatus again now. The latest PDB website version just went
> into production this afternoon. I currently see
> <record structureId="3CAT" status="OBSOLETE" replaces="1CAT"
> replacedBy="8CAT 7CAT"/>
>
> I merged in the code you sent me a few days ago for PDBFileReader and for
> the caching in PDBStatus. I didn't switch PDBStatus from SAX to DOM because
> I had already fixed that bug in another way by the time I got your code
> (thanks for pointing it out). I also added methods to AtomCache to match the
> setFetch* methods in PDBFileReader. I wrote some tests in TestAtomCache and
> it seems to be working great.
>
> Thanks for your contributions!
>
> -Spencer
>
> On Tue, Apr 26, 2011 at 2:55 AM, Amr AL-Hossary <amr_alhossary at hotmail.com>
> wrote:
>>
>> The bug is fixed for "replaces", but "replacedBy" is not fixed yet.
>> Here is the current result:
>>
>> <idStatus>
>> <record structureId="1HHB" status="OBSOLETE" replacedBy="4HHB"/>
>> <record structureId="2HHB" status="CURRENT" replaces="1HHB"/>
>> <record structureId="3HHB" status="CURRENT" replaces="1HHB"/>
>> <record structureId="4HHB" status="CURRENT" replaces="1HHB"/>
>> <record structureId="1CAT" status="OBSOLETE" replacedBy="8CAT"/>
>> <record structureId="3CAT" status="OBSOLETE" replaces="1CAT"
>> replacedBy="8CAT"/>
>> <record structureId="7CAT" status="CURRENT" replaces="3CAT"/>
>> <record structureId="8CAT" status="CURRENT" replaces="3CAT"/>
>> <record structureId="1KSA" status="OBSOLETE" replacedBy="3ENI"/>
>> <record structureId="3ENI" status="CURRENT" replaces="1M50 1KSA"/>
>> <record structureId="1M50" status="OBSOLETE" replacedBy="3ENI"/>
>> </idStatus>
>>
>> Did you receive my previous mail, Dr. Andreas?
>>
>> Amr
>>
>> --------------------------------------------------
>> From: "Amr AL-Hossary" <amr_alhossary at hotmail.com>
>> Sent: Tuesday, April 26, 2011 5:03 AM
>> To: "Spencer Bliven" <sbliven at ucsd.edu>; "Andreas Prlic"
>> <andreas at sdsc.edu>
>> Cc: <biojava-dev at lists.open-bio.org>
>> Subject: Re: [Biojava-dev] fetching obsolete/superseding files
>>
>>> Thanks Spencer,
>>> This explains a lot.
>>> This way, the current implementation you provided is right and the
>>> recursion flag is totally right.
>>>
>>> No, I don't have write access yet, but Dr. Andreas promised to
>>> grant me write access after my second contribution.
>>>
>>>> the list of status messages comes from looking at the internals of the
>>>> PDB website
>>>
>>> Do you have access to the Webservice implementation?
>>>
>>> Amr
>>>
>>>
>>>  From: Spencer Bliven
>>>  Sent: Tuesday, April 26, 2011 1:53 AM
>>>  To: Andreas Prlic
>>>  Cc: Amr AL-Hossary ; biojava-dev at lists.open-bio.org
>>>  Subject: Re: [Biojava-dev] fetching obsolete/superseding files
>>>
>>>
>>>  Hey all,
>>>
>>>  I think we are converging on a consistent model of PDB precedence. This
>>> was obscured previously by the bug in how the idStatus page listed only a
>>> single 'replacedBy' entry. Andreas has fixed this and it should go live
>>> tomorrow. I'll write some unit tests and update biojava at the same
>>> time. Here is how things will work:
>>>
>>>  PDB supersessions form a directed acyclic graph, where edges point from
>>> an obsolete ID to the entry that directly superseded it. Each record
>>> returned by idStatus has a "replaces" attribute, which is a
>>> space-delimited list of incoming edges, and a "replacedBy" attribute, which
>>> is a space-delimited list of outgoing edges. Two examples:
>>>
>>>  <idStatus>
>>>  <record structureId="1CAT" status="OBSOLETE" replacedBy="3CAT"/>
>>>  <record structureId="3CAT" status="OBSOLETE" replaces="1CAT"
>>> replacedBy="8CAT 7CAT"/>
>>>  <record structureId="7CAT" status="CURRENT" replaces="3CAT"/>
>>>  <record structureId="8CAT" status="CURRENT" replaces="3CAT"/>
>>>
>>>  <record structureId="1KSA" status="OBSOLETE" replacedBy="3ENI"/>
>>>  <record structureId="3ENI" status="CURRENT" replaces="1M50 1KSA"/>
>>>  <record structureId="1M50" status="OBSOLETE" replacedBy="3ENI"/>
>>>  </idStatus>
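A minimal sketch of how the edge lists could be read out of an idStatus document like the one above. The class and method names (IdStatusSketch, parseEdges, splitIds) are hypothetical and not part of BioJava. It uses the JDK's DOM parser for brevity, whereas PDBStatus is described elsewhere in this thread as SAX-based.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not the BioJava implementation.
class IdStatusSketch {

    // Maps each structureId to the IDs listed in the given attribute
    // ("replaces" for incoming edges, "replacedBy" for outgoing edges).
    static Map<String, List<String>> parseEdges(String xml, String attribute) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed idStatus document", e);
        }
        Map<String, List<String>> edges = new LinkedHashMap<>();
        NodeList records = doc.getElementsByTagName("record");
        for (int i = 0; i < records.getLength(); i++) {
            Element record = (Element) records.item(i);
            // getAttribute returns "" when the attribute is absent
            edges.put(record.getAttribute("structureId"),
                      splitIds(record.getAttribute(attribute)));
        }
        return edges;
    }

    // Splits a space-delimited ID list; an empty value gives an empty list.
    static List<String> splitIds(String value) {
        List<String> ids = new ArrayList<>();
        for (String id : value.trim().split("\\s+")) {
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<idStatus>"
                + "<record structureId=\"3CAT\" status=\"OBSOLETE\" replaces=\"1CAT\" replacedBy=\"8CAT 7CAT\"/>"
                + "<record structureId=\"7CAT\" status=\"CURRENT\" replaces=\"3CAT\"/>"
                + "<record structureId=\"8CAT\" status=\"CURRENT\" replaces=\"3CAT\"/>"
                + "</idStatus>";
        System.out.println(parseEdges(xml, "replacedBy"));
        // prints {3CAT=[8CAT, 7CAT], 7CAT=[], 8CAT=[]}
    }
}
```

Records without the requested attribute map to an empty list, so current entries such as 7CAT simply have no outgoing edges.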
>>>
>>>  The non-recursive versions of getReplaces/getReplacement just get the
>>> incoming/outgoing edges for a single node and require only a single REST
>>> query. The recursive versions will do a depth-first search up/down the graph
>>> and return a list of all nodes reached.
>>>
>>>  Finally, the getCurrent() method should consistently return a single PDB
>>> ID from among the results of recursive-getReplacement. To be consistent with
>>> the old REST implementation, this will be the PDB ID that occurs last
>>> alphabetically. Thus getCurrent(1HHB) will give 4HHB rather than 2HHB or
>>> 3HHB, getCurrent(1CAT) will give 8CAT, and getCurrent(7CAT) will give 7CAT.
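The traversal and the getCurrent rule described above can be sketched as follows. This is an illustration under stated assumptions, not the PDBStatus code: the class name SupersessionSketch and the method name getReplacementRecursive are hypothetical, and the edges are hard-coded from the examples in this thread (the 1HHB edges are derived from the "replaces" attributes of 2HHB, 3HHB and 4HHB) instead of being fetched via REST. Restricting getCurrent to nodes with no outgoing edges is also my assumption, so that an obsolete intermediate ID is never returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the methods discussed in this thread.
class SupersessionSketch {

    // Outgoing edges: obsolete ID -> entries that directly superseded it.
    static final Map<String, List<String>> REPLACED_BY = new HashMap<>();
    static {
        REPLACED_BY.put("1HHB", Arrays.asList("2HHB", "3HHB", "4HHB"));
        REPLACED_BY.put("1CAT", Arrays.asList("3CAT"));
        REPLACED_BY.put("3CAT", Arrays.asList("8CAT", "7CAT"));
        REPLACED_BY.put("1KSA", Arrays.asList("3ENI"));
        REPLACED_BY.put("1M50", Arrays.asList("3ENI"));
    }

    // Non-recursive: the direct successors of a single node.
    static List<String> getReplacement(String id) {
        return REPLACED_BY.getOrDefault(id, Collections.<String>emptyList());
    }

    // Recursive: depth-first search along outgoing edges, returning every
    // node reached (the starting ID itself is not included).
    static List<String> getReplacementRecursive(String id) {
        Set<String> seen = new LinkedHashSet<>();
        visit(id, seen);
        return new ArrayList<>(seen);
    }

    private static void visit(String id, Set<String> seen) {
        for (String next : getReplacement(id)) {
            if (seen.add(next)) { // skip nodes already reached by another path
                visit(next, seen);
            }
        }
    }

    // Picks the alphabetically last reachable ID that has no successor;
    // an ID with no replacement is returned unchanged.
    static String getCurrent(String id) {
        String best = null;
        for (String candidate : getReplacementRecursive(id)) {
            boolean hasNoSuccessor = getReplacement(candidate).isEmpty();
            if (hasNoSuccessor && (best == null || candidate.compareTo(best) > 0)) {
                best = candidate;
            }
        }
        return best == null ? id : best;
    }

    public static void main(String[] args) {
        System.out.println(getReplacementRecursive("1CAT")); // [3CAT, 8CAT, 7CAT]
        System.out.println(getCurrent("1HHB")); // 4HHB
        System.out.println(getCurrent("1CAT")); // 8CAT
        System.out.println(getCurrent("7CAT")); // 7CAT
    }
}
```

The recursive getReplaces direction would be the same search over the incoming-edge map.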
>>>
>>>  Amr, I understand what you were thinking with the getNewestCurrent
>>> method. It is appealing to think of 4HHB as the representative for all four
>>> structures. However, there is a good reason that 2HHB and 3HHB are still
>>> marked as current, and I think it is misleading to include a method that
>>> favors 4HHB over other current IDs because it is alphabetically higher. We
>>> should probably leave this method out of biojava.
>>>
>>>
>>>  Does anything seem wrong about this model of supersession? In
>>> particular, does this address your question about the need for the recursion
>>> flag, Amr? My plan is to commit the biojava changes shortly. Amr, do you
>>> mind if I merge in your patch with the caching and PDBFileReader updates? (Do
>>> you have write access to SVN?) Great code there!
>>>
>>>  Finally, the list of status messages comes from looking at the internals
>>> of the PDB website. I haven't come across any examples of them myself to
>>> test with. Many seem to be temporary statuses, for publication holds and the
>>> like. I'm content to ignore them until someone requests something specific.
>>>
>>>  -Spencer
>>>
>>>
>>>
>>>  On Mon, Apr 25, 2011 at 2:22 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>>
>>>   Hi Amr,
>>>
>>>
>>>   > And any way, the webservice returns only ONE PDB ID max per record
>>>   > (please inspect the result returned by this query
>>>   > http://www.rcsb.org/pdb/rest/idStatus?structureId=1HHB,2HHB,3HHB,4HHB ).
>>>
>>>
>>>   I believe that is a bug. I just fixed it, and the fix should become
>>>   available with tomorrow's web site update (around 00:00 UTC).
>>>
>>>
>>>   > This way, I believe the best way to get the most recent ID is getting
>>>   > the isReplacedBy attribute of the record of superseded record (e.g. from
>>>   > 3HHB to 1HHB and then from 1HHB to 4HHB).
>>>
>>>
>>>   hope this will be simpler with the updated URL response ...
>>>
>>>
>>>   Andreas
>>>
>>>
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>
>
>



