[Biojava-dev] fetching obsolete/superseding files

Spencer Bliven sbliven at ucsd.edu
Wed Apr 27 01:58:03 UTC 2011


Amr,

Try checking idStatus again now. The latest PDB website version just went
into production this afternoon. I currently see
<record structureId="3CAT" status="OBSOLETE" replaces="1CAT" replacedBy="8CAT 7CAT"/>

I merged in the code you sent me a few days ago for PDBFileReader and for
the caching in PDBStatus. I didn't switch PDBStatus from SAX to DOM because
I had already fixed that bug in another way by the time I got your code
(thanks for pointing it out). I also added methods to AtomCache to match the
setFetch* methods in PDBFileReader. I wrote some tests in TestAtomCache and
it seems to be working great.

Thanks for your contributions!

-Spencer

On Tue, Apr 26, 2011 at 2:55 AM, Amr AL-Hossary
<amr_alhossary at hotmail.com> wrote:

> The bug was fixed for "replaces", but "replacedBy" is not fixed yet.
> Here is the current result:
>
>
> <idStatus>
> <record structureId="1HHB" status="OBSOLETE" replacedBy="4HHB"/>
> <record structureId="2HHB" status="CURRENT" replaces="1HHB"/>
> <record structureId="3HHB" status="CURRENT" replaces="1HHB"/>
> <record structureId="4HHB" status="CURRENT" replaces="1HHB"/>
> <record structureId="1CAT" status="OBSOLETE" replacedBy="8CAT"/>
> <record structureId="3CAT" status="OBSOLETE" replaces="1CAT" replacedBy="8CAT"/>
> <record structureId="7CAT" status="CURRENT" replaces="3CAT"/>
> <record structureId="8CAT" status="CURRENT" replaces="3CAT"/>
> <record structureId="1KSA" status="OBSOLETE" replacedBy="3ENI"/>
> <record structureId="3ENI" status="CURRENT" replaces="1M50 1KSA"/>
> <record structureId="1M50" status="OBSOLETE" replacedBy="3ENI"/>
> </idStatus>
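The inconsistency reported here can be detected mechanically. The following is a minimal sketch that assumes the records have already been read into memory. Every supersession edge OLD -> NEW should appear twice: once in OLD's replacedBy and once in NEW's replaces. Any edge that is visible from only one side indicates the kind of bug described above. The class SupersessionCheck and its Rec holder are hypothetical illustrations, not part of the BioJava API.

```java
import java.util.*;

// Hypothetical helper, not part of BioJava: cross-checks the "replaces" and
// "replacedBy" attributes of idStatus records against each other.
class SupersessionCheck {

    /** One idStatus <record>, reduced to its two edge lists. */
    static final class Rec {
        final List<String> replaces;   // incoming edges: IDs this entry superseded
        final List<String> replacedBy; // outgoing edges: IDs that superseded this entry

        Rec(String replaces, String replacedBy) {
            this.replaces = split(replaces);
            this.replacedBy = split(replacedBy);
        }
    }

    // Attributes are space-delimited lists; absent or blank means "no edges".
    static List<String> split(String attr) {
        if (attr == null || attr.isBlank()) return List.of();
        return Arrays.asList(attr.trim().split("\\s+"));
    }

    /**
     * Reports every edge OLD -> NEW that is listed on only one side.
     * Only IDs present in the map (i.e. part of the query) are cross-checked.
     */
    static List<String> findMismatches(Map<String, Rec> records) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Rec> e : records.entrySet()) {
            String id = e.getKey();
            // Edge older -> id declared via id's "replaces".
            for (String older : e.getValue().replaces) {
                Rec o = records.get(older);
                if (o != null && !o.replacedBy.contains(id)) {
                    problems.add(older + " -> " + id + " not listed in replacedBy of " + older);
                }
            }
            // Edge id -> newer declared via id's "replacedBy".
            for (String newer : e.getValue().replacedBy) {
                Rec n = records.get(newer);
                if (n != null && !n.replaces.contains(id)) {
                    problems.add(id + " -> " + newer + " not listed in replaces of " + newer);
                }
            }
        }
        return problems;
    }
}
```

When run on the eleven records above, this sketch reports five one-sided edges. For example, 2HHB and 3HHB are both missing from the replacedBy of 1HHB. When run on the corrected 1CAT/1KSA example later in this thread, it reports none.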
>
> Did you receive my previous mail, Dr. Andreas?
>
> Amr
>
> --------------------------------------------------
> From: "Amr AL-Hossary" <amr_alhossary at hotmail.com>
> Sent: Tuesday, April 26, 2011 5:03 AM
> To: "Spencer Bliven" <sbliven at ucsd.edu>; "Andreas Prlic" <andreas at sdsc.edu>
> Cc: <biojava-dev at lists.open-bio.org>
> Subject: Re: [Biojava-dev] fetching obsolete/superseding files
>
>> Thanks Spencer,
>> This explains a lot.
>> In that case, the current implementation you provided is correct, and the
>> recursion flag is fully justified.
>>
>> No, I don't have write access yet, but Dr. Andreas promised to grant me
>> access after my second contribution.
>>
>>> the list of status messages comes from looking at the internals of the PDB
>>> website
>>>
>> Do you have access to the web service implementation?
>>
>> Amr
>>
>>
>>  From: Spencer Bliven
>>  Sent: Tuesday, April 26, 2011 1:53 AM
>>  To: Andreas Prlic
>>  Cc: Amr AL-Hossary ; biojava-dev at lists.open-bio.org
>>  Subject: Re: [Biojava-dev] fetching obsolete/superseding files
>>
>>
>>  Hey all,
>>
>>  I think we are converging on a consistent model of PDB precedence. This
>> was previously obscured by a bug that made the idStatus page list only a
>> single 'replacedBy' entry. Andreas has fixed this, and the fix should go
>> live tomorrow. I'll write some unit tests and update biojava at the same
>> time. Here is how things will work:
>>
>>  PDB supersessions form a directed acyclic graph, where edges point from
>> an obsolete ID to the entry that directly superseded it. Each record
>> returned by idStatus can carry a "replaces" attribute, a space-delimited
>> list of incoming edges (the IDs it superseded), and a "replacedBy"
>> attribute, a space-delimited list of outgoing edges (the IDs that
>> superseded it). Two examples:
>>
>>  <idStatus>
>>  <record structureId="1CAT" status="OBSOLETE" replacedBy="3CAT"/>
>>  <record structureId="3CAT" status="OBSOLETE" replaces="1CAT" replacedBy="8CAT 7CAT"/>
>>  <record structureId="7CAT" status="CURRENT" replaces="3CAT"/>
>>  <record structureId="8CAT" status="CURRENT" replaces="3CAT"/>
>>
>>  <record structureId="1KSA" status="OBSOLETE" replacedBy="3ENI"/>
>>  <record structureId="3ENI" status="CURRENT" replaces="1M50 1KSA"/>
>>  <record structureId="1M50" status="OBSOLETE" replacedBy="3ENI"/>
>>  </idStatus>
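The following sketch reads a response shaped like the one above into per-ID status and edge lists, using the JDK's built-in DOM parser. It assumes the response has exactly this shape: an idStatus root element containing record elements with structureId, status, replaces and replacedBy attributes. The class name IdStatusGraph is hypothetical, and this is not BioJava's PDBStatus code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not BioJava's PDBStatus: reads an idStatus response
// into per-ID status plus incoming ("replaces") and outgoing ("replacedBy") edges.
class IdStatusGraph {
    final Map<String, String> status = new LinkedHashMap<>();
    final Map<String, List<String>> replaces = new LinkedHashMap<>();   // incoming edges
    final Map<String, List<String>> replacedBy = new LinkedHashMap<>(); // outgoing edges

    static IdStatusGraph parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            IdStatusGraph g = new IdStatusGraph();
            NodeList records = doc.getElementsByTagName("record");
            for (int i = 0; i < records.getLength(); i++) {
                Element r = (Element) records.item(i);
                String id = r.getAttribute("structureId");
                g.status.put(id, r.getAttribute("status"));
                g.replaces.put(id, split(r.getAttribute("replaces")));
                g.replacedBy.put(id, split(r.getAttribute("replacedBy")));
            }
            return g;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse idStatus XML", e);
        }
    }

    // DOM's getAttribute returns "" for a missing attribute, meaning no edges.
    static List<String> split(String attr) {
        if (attr.isBlank()) return List.of();
        return Arrays.asList(attr.trim().split("\\s+"));
    }
}
```

Because a missing attribute comes back as an empty string, records that lack "replaces" or "replacedBy" (such as 1CAT or 7CAT above) simply end up with an empty edge list.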
>>
>>  The non-recursive versions of getReplaces/getReplacement just get the
>> incoming/outgoing edges for a single node and require only a single REST
>> query. The recursive versions will do a depth-first search up/down the graph
>> and return a list of all nodes reached.
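The two traversal modes can be sketched offline. This sketch assumes the edges are already in memory as maps from an ID to its neighbours, for instance parsed from an idStatus response. The real methods issue REST queries instead. The helper names traverse and dfs, and the map-taking signatures, are hypothetical. A visited set prevents a node that is reachable along two paths (possible in a DAG) from being listed twice.

```java
import java.util.*;

// Hypothetical offline sketch of the traversal described above; the real
// BioJava methods query the REST service instead of taking a map.
class SupersessionTraversal {

    /**
     * Follows one kind of edge from id. Non-recursive: direct neighbours only.
     * Recursive: every node reached by a depth-first search, in preorder,
     * each listed once even when paths merge.
     */
    static List<String> traverse(Map<String, List<String>> edges, String id, boolean recurse) {
        if (!recurse) {
            return new ArrayList<>(edges.getOrDefault(id, List.of()));
        }
        List<String> reached = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(id);
        dfs(edges, id, visited, reached);
        return reached;
    }

    private static void dfs(Map<String, List<String>> edges, String node,
                            Set<String> visited, List<String> reached) {
        for (String next : edges.getOrDefault(node, List.of())) {
            if (visited.add(next)) { // skip nodes already reached via another path
                reached.add(next);
                dfs(edges, next, visited, reached);
            }
        }
    }

    /** Walks "down" toward newer entries using replacedBy (outgoing) edges. */
    static List<String> getReplacement(Map<String, List<String>> replacedBy, String id, boolean recurse) {
        return traverse(replacedBy, id, recurse);
    }

    /** Walks "up" toward older entries using replaces (incoming) edges. */
    static List<String> getReplaces(Map<String, List<String>> replaces, String id, boolean recurse) {
        return traverse(replaces, id, recurse);
    }
}
```

With the 1CAT example above, the recursive getReplacement of 1CAT yields 3CAT, 8CAT, 7CAT, while the non-recursive call yields only 3CAT. The recursive getReplaces of 7CAT yields 3CAT, 1CAT.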
>>
>>  Finally, the getCurrent() method should consistently return a single PDB
>> ID from among the results of recursive-getReplacement. To be consistent with
>> the old REST implementation, this will be the PDB ID that occurs last
>> alphabetically. Thus getCurrent(1HHB) will give 4HHB rather than 2HHB or
>> 3HHB, getCurrent(1CAT) will give 8CAT, and getCurrent(7CAT) will give 7CAT.
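The selection rule for getCurrent() can be sketched in the same offline style. This sketch assumes the replacedBy edges and each entry's status are already in memory, and its signature is hypothetical. Returning an ID that is already CURRENT unchanged is an assumption chosen to match the getCurrent(7CAT) example. Returning null when no current entry is reachable is likewise an illustrative choice.

```java
import java.util.*;

// Hypothetical offline sketch of the getCurrent() rule: among CURRENT entries
// reachable via replacedBy edges, pick the one that sorts last.
class CurrentId {

    static String getCurrent(String id, Map<String, List<String>> replacedBy,
                             Map<String, String> status) {
        // Assumption: a CURRENT entry is its own answer (cf. getCurrent(7CAT)).
        if ("CURRENT".equals(status.get(id))) return id;
        String best = null;
        // Iterative depth-first walk over every entry reachable from id.
        Deque<String> stack = new ArrayDeque<>(List.of(id));
        Set<String> seen = new HashSet<>(List.of(id));
        while (!stack.isEmpty()) {
            String node = stack.pop();
            for (String next : replacedBy.getOrDefault(node, List.of())) {
                if (!seen.add(next)) continue; // already visited via another path
                stack.push(next);
                if ("CURRENT".equals(status.get(next))
                        && (best == null || next.compareTo(best) > 0)) {
                    best = next; // keep the alphabetically last current ID
                }
            }
        }
        return best; // null if no current entry is reachable
    }
}
```

Suppose 1HHB's outgoing edges are taken from the replaces attributes of 2HHB, 3HHB and 4HHB. Then this sketch returns 4HHB for 1HHB, 8CAT for 1CAT and 7CAT for 7CAT. String.compareTo orders strings by character code, which matches alphabetical order for the digits and upper-case letters used in PDB IDs.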
>>
>>  Amr, I understand what you were thinking with the getNewestCurrent
>> method. It is appealing to think of 4HHB as the representative for all four
>> structures. However, there is a good reason that 2HHB and 3HHB are still
>> marked as current, and I think it is misleading to include a method that
>> favors 4HHB over other current IDs because it is alphabetically higher. We
>> should probably leave this method out of biojava.
>>
>>
>>  Does anything seem wrong with this model of supersession? In
>> particular, does this address your question about the need for the recursion
>> flag, Amr? My plan is to commit the biojava changes shortly. Amr, do you
>> mind if I merge in your patch with the caching and PDBFileReader updates? (Do
>> you have write access to SVN?) Great code there!
>>
>>  Finally, the list of status messages comes from looking at the internals
>> of the PDB website. I haven't come across any examples of them myself to
>> test with. Many seem to be temporary statuses, for publication holds and the
>> like. I'm content to ignore them until someone requests something specific.
>>
>>  -Spencer
>>
>>
>>
>>  On Mon, Apr 25, 2011 at 2:22 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>
>>   Hi Amr,
>>
>>
>>   > Anyway, the web service returns at most ONE PDB ID per record (please
>>   > inspect the result returned by this query
>>   > http://www.rcsb.org/pdb/rest/idStatus?structureId=1HHB,2HHB,3HHB,4HHB).
>>
>>
>>   I believe that is a bug. I just fixed it, and the fix should become
>>   available with tomorrow's web site update (around 00 UTC).
>>
>>
>>   > Given that, I believe the best way to get the most recent ID is to read
>>   > the isReplacedBy attribute of the superseded record (e.g. from 3HHB to
>>   > 1HHB and then from 1HHB to 4HHB).
>>
>>
>>   Hopefully this will be simpler with the updated URL response.
>>
>>
>>   Andreas
>>
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>>
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>>


