[Biojava-dev] fetching obsolete/superseding files

Spencer Bliven sbliven at ucsd.edu
Fri Apr 22 17:38:31 UTC 2011


Amr-

I made a start on the problem of obsolete records. There's still no way to
download them from biojava, but I added some code to check the status of a
PDB ID and to get the current PDB ID for obsolete versions. Hopefully this
complements whatever code you've been working on. See
org.biojava.bio.structure.PDBStatus in the biojava3-structure module. Let me
know if any of the documentation is unclear.

-Spencer


On Mon, Mar 21, 2011 at 8:24 PM, Spencer Bliven <sbliven at ucsd.edu> wrote:

> Amr-
>
> Thanks for volunteering to fix this! I ran across the same problem a while
> ago, and ended up manually downloading obsolete records whenever my script
> broke. Clearly you have the right solution.
>
> I would consider 2HHB, 3HHB, and 4HHB to all be valid IDs since they are all
> 'current'. 1HHB is obsolete because it is a poor interpretation of the data,
> not because it is redundant with the other three.
>
> -Spencer
>
>
> On Mon, Feb 28, 2011 at 9:59 AM, Amr AL-Hossary <amr_alhossary at hotmail.com
> > wrote:
>
>> Hi Dr. Andreas,
>>
>> I was using a set of PDB files mentioned in an old paper published in 1994.
>> The paper is called
>> Enlarged representative set of protein structures
>> by
>> UWE HOBOHM AND CHRIS SANDER
>> European Molecular Biology Laboratory, 69012 Heidelberg, Germany
>> (Received September 16, 1993; Accepted December 23, 1993)
>> published in
>> Protein Science (1994), 3:522-524. Cambridge University Press. Printed in
>> the USA.
>>
>> It describes a representative standard set of protein structures that
>> doesn't have any redundancy.
>> This set was cited by a paper that talks about Cation-pi interactions as
>> their representative set; and I was revisiting the same set to use it as my
>> positive control in my research.
>>
>> Your idea (the webservice) is perfect.
>> I can write it this weekend. Till then, let's list all the additional
>> features that should be there too.
>> I am thinking of
>> static String[] updateIDs(String[] IdsToUpdate)
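
A minimal sketch of how the ID-updating helper proposed above might behave, assuming the obsolete-to-replacement pairs have already been collected into a map (for example from the idStatus service discussed in this thread). The class name IdUpdater, the extra map parameter, and the chain-following behaviour are illustrative assumptions, not existing BioJava API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IdUpdater {
    /**
     * Returns a copy of idsToUpdate in which every obsolete ID is replaced by
     * its current successor. replacedBy maps an obsolete ID to the ID that
     * superseded it; IDs absent from the map are treated as current.
     */
    public static String[] updateIDs(String[] idsToUpdate, Map<String, String> replacedBy) {
        String[] updated = new String[idsToUpdate.length];
        for (int i = 0; i < idsToUpdate.length; i++) {
            String id = idsToUpdate[i].toUpperCase();
            Set<String> seen = new HashSet<>();
            // Follow the chain, since a replacement may itself be obsolete.
            // The 'seen' set stops the loop if bad data contains a cycle.
            while (replacedBy.containsKey(id) && seen.add(id)) {
                id = replacedBy.get(id);
            }
            updated[i] = id;
        }
        return updated;
    }
}
```

Keeping the lookup table as a parameter means the helper itself needs no network access, so it can be tested offline and the caller decides how fresh the status data must be.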
>>
>> Generally, I agree with you about not letting the parser be aware of
>> versions, but I believe it should at least be aware of the file's revisions
>> up to the point the local copy was created, and should notify the user
>> that the data is current only as of that date and could be
>> outdated, in addition to stating this explicitly in the documentation.
>>
>> Well, another point to think about:
>> how do we fight redundancy among several files?
>> Suppose we consider 1HHB, 2HHB, 3HHB, and 4HHB to represent the same
>> structure.
>> If we send this request
>> http://www.rcsb.org/pdb/rest/idStatus?structureId=1HHB,2HHB,3HHB,4HHB
>> this is the response we get:
>> <?xml version='1.0' standalone='no' ?>
>> <idStatus>
>>  <record structureId="1HHB" status="OBSOLETE" replacedBy="4HHB" />
>>  <record structureId="2HHB" status="CURRENT" replaces="1HHB" />
>>  <record structureId="3HHB" status="CURRENT" replaces="1HHB" />
>>  <record structureId="4HHB" status="CURRENT" replaces="1HHB" />
>> </idStatus>
>>
>> How do we counteract the redundancy in 2HHB and 3HHB, given that 4HHB is
>> already there?
>> This could be the next question. :-)
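
A response like the one quoted above could be reduced to an ID-to-status table with the JDK's built-in DOM parser. This is only a sketch: the class and method names are hypothetical, the XML is passed in as a string rather than fetched, and nothing here is taken from BioJava.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class IdStatusParser {
    /** Maps each structureId in an idStatus document to its status attribute. */
    public static Map<String, String> parseStatuses(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList records = doc.getElementsByTagName("record");
            // LinkedHashMap keeps the IDs in the order the service returned them.
            Map<String, String> statuses = new LinkedHashMap<>();
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                statuses.put(record.getAttribute("structureId"),
                             record.getAttribute("status"));
            }
            return statuses;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse idStatus XML", e);
        }
    }
}
```

A caller could then keep only the entries whose status is CURRENT; whether several current entries that replace the same obsolete one count as redundant is a separate decision that the status table alone does not settle.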
>>
>> Sincerely,
>> Amr
>>
>> --------------------------------------------------
>> From: "Andreas Prlic" <andreas at sdsc.edu>
>> Sent: Monday, February 28, 2011 8:15 AM
>> To: "Amr AL-Hossary" <amr_alhossary at hotmail.com>
>> Cc: <biojava-dev at lists.open-bio.org>
>> Subject: Re: fetching obsolete/superseding files
>>
>>  Hi Amr,
>>>
>>>  During my research, I met some difficulty in automatically fetching some
>>>> old
>>>> obsolete files.
>>>>
>>>
>>> ok. May I ask, how did you come across them?
>>>
>>>
>>>  And that inspired an idea.
>>>> I am thinking of adding 2 new features to the Biojava "structure"
>>>> module:
>>>>
>>>
>>> Interesting idea. In terms of software design I would not rely on the
>>> parser for this. The local file that is parsed might be already out of
>>> date as well. I would try to keep the parser agnostic of particular
>>> versions or IDs. Instead I would provide a utility class that can give
>>> information on the status of a file. There is a little XML service at
>>> http://www.rcsb.org/pdb/software/rest.do#releaseStatus that provides
>>> the latest status information. That one could be used to fetch the
>>> information and then download any newer (or obsoleted) files...
>>>
>>> What do you think?
>>>
>>> Andreas
>>>
>>>  Suppose there are 2 new boolean parameters of the PDB file
>>>> reader/parser, which are
>>>> <fetchObsolete> and <fetchSuperseding>
>>>> The first one enables the reader to download a file from the "Obsolete"
>>>> archive if it wasn't found in the main repository;
>>>> while the latter searches the header of a file (not necessarily the same
>>>> one) for its newest revision or a superseding new file, fetches it, and
>>>> switches to that new file automatically.
>>>>
>>>> Adding these parameters will require
>>>> 1) Manipulating the URL a little, to enable connecting
>>>> to ftp://ftp.wwpdb.org/pub/pdb/data/structures/obsolete
>>>>
>>>> 2) Parsing the OBSLTE, REVDAT, SPRSDE records; as well as REMARK 4 and
>>>> REMARK 5
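
The record-parsing step in item 2 could start from something like the sketch below, which pulls the related ID codes out of a single OBSLTE or SPRSDE line. The column positions (entry ID in columns 22-25, related IDs in 4-character fields from column 32 up to column 75) reflect my reading of the PDB file format and should be checked against the official format description; the class and method names are hypothetical, and REVDAT and the REMARK records are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class SupersedeRecords {
    /**
     * Extracts the related ID codes from one OBSLTE or SPRSDE line.
     * Assumed layout: the entry's own ID in columns 22-25, then related IDs
     * in 4-character fields starting at column 32, one space apart.
     */
    public static List<String> relatedIds(String line) {
        List<String> ids = new ArrayList<>();
        if (!line.startsWith("OBSLTE") && !line.startsWith("SPRSDE")) {
            return ids; // not a record type this sketch handles
        }
        // Column 32 is index 31; stop at column 75 (assumed end of the ID fields).
        int end = Math.min(line.length(), 75);
        for (int start = 31; start + 4 <= end; start += 5) {
            String id = line.substring(start, start + 4).trim();
            if (!id.isEmpty()) {
                ids.add(id); // blank padding fields are skipped
            }
        }
        return ids;
    }
}
```

For an OBSLTE line the returned IDs would be the replacing entries, and for a SPRSDE line the entries that were superseded, so the caller needs to check the record name before deciding which file to fetch.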
>>>>
>>>> If these features are approved, I can do them.
>>>>
>>>> Any ideas or comments?
>>>>
>>>>
>>>>
>>>> Amr
>>>>
>>>
>>>
>>>
>>> --
>>> -----------------------------------------------------------------------
>>> Dr. Andreas Prlic
>>> Senior Scientist, RCSB PDB Protein Data Bank
>>> University of California, San Diego
>>> (+1) 858.246.0526
>>> -----------------------------------------------------------------------
>>>
>>>  _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>
>


