[Biojava-dev] fetching obsolete/superseding files

Tue Mar 22 03:24:00 UTC 2011

Amr-

Thanks for volunteering to fix this! I ran across the same problem a while
ago, and ended up manually downloading obsolete records whenever my script
broke. Clearly you have the right solution.
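
For anyone who wants to script that manual step in the meantime, here is a
rough sketch of building a fallback URL into the obsolete archive. The base
path is the one Amr quotes below. The sub-directory layout (the middle two
characters of the ID, then pdbXXXX.ent.gz) is my assumption about how the
archive is organised and should be checked against the server; the class and
method names are made up for illustration and are not part of BioJava.

```java
import java.util.Locale;

class ObsoleteUrlSketch {

    // Base path taken from the thread; everything appended to it is an assumption.
    static final String OBSOLETE_BASE =
            "ftp://ftp.wwpdb.org/pub/pdb/data/structures/obsolete";

    /** Hypothetical helper: builds a candidate URL for an obsolete entry. */
    static String obsoleteUrl(String pdbId) {
        if (pdbId == null || pdbId.length() != 4) {
            throw new IllegalArgumentException("expected a 4-character PDB ID: " + pdbId);
        }
        String id = pdbId.toLowerCase(Locale.ROOT);
        // Assumed layout: entries are grouped by the middle two characters of the ID.
        String middle = id.substring(1, 3);
        return OBSOLETE_BASE + "/pdb/" + middle + "/pdb" + id + ".ent.gz";
    }

    public static void main(String[] args) {
        System.out.println(obsoleteUrl("1HHB"));
    }
}
```

A script would try the main repository first and only fall back to this URL
when the entry is missing there.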

I would consider 2HHB, 3HHB, and 4HHB to all be valid IDs since they are all
'current'. 1HHB is obsolete because it is a poor interpretation of the data,
not because it is redundant with the other three.
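
To make that concrete, here is a minimal sketch of an ID-updating utility in
the spirit of the String[]-to-String[] method Amr proposes below. It works on
the idStatus response quoted further down (embedded as a string, so no network
access is needed) and only swaps out IDs marked OBSOLETE, leaving every
CURRENT ID alone. The class name, method names, and the choice to drop
duplicates are my assumptions, not an existing BioJava API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IdStatusSketch {

    // The response quoted in the thread, reproduced for the example.
    static final String SAMPLE =
            "<?xml version='1.0' standalone='no' ?>"
          + "<idStatus>"
          + "<record structureId=\"1HHB\" status=\"OBSOLETE\" replacedBy=\"4HHB\" />"
          + "<record structureId=\"2HHB\" status=\"CURRENT\" replaces=\"1HHB\" />"
          + "<record structureId=\"3HHB\" status=\"CURRENT\" replaces=\"1HHB\" />"
          + "<record structureId=\"4HHB\" status=\"CURRENT\" replaces=\"1HHB\" />"
          + "</idStatus>";

    /** Maps each OBSOLETE ID to its replacedBy value; CURRENT IDs are left out. */
    static Map<String, String> parseReplacements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> replacedBy = new LinkedHashMap<>();
            NodeList records = doc.getElementsByTagName("record");
            for (int i = 0; i < records.getLength(); i++) {
                Element rec = (Element) records.item(i);
                if ("OBSOLETE".equals(rec.getAttribute("status"))
                        && rec.hasAttribute("replacedBy")) {
                    replacedBy.put(rec.getAttribute("structureId"),
                                   rec.getAttribute("replacedBy"));
                }
            }
            return replacedBy;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse idStatus response", e);
        }
    }

    /** Hypothetical updater: replaces obsolete IDs, keeps current ones, drops repeats. */
    static String[] updateIDs(String[] idsToUpdate, Map<String, String> replacedBy) {
        List<String> out = new ArrayList<>();
        for (String id : idsToUpdate) {
            String current = replacedBy.getOrDefault(id, id);
            if (!out.contains(current)) {
                out.add(current);
            }
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] ids = {"1HHB", "2HHB", "3HHB", "4HHB"};
        String[] updated = updateIDs(ids, parseReplacements(SAMPLE));
        System.out.println(String.join(",", updated)); // prints 4HHB,2HHB,3HHB
    }
}
```

Note that 2HHB and 3HHB survive: the sketch treats "replaces" as history only
and never uses it to collapse current entries into one another.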

-Spencer

On Mon, Feb 28, 2011 at 9:59 AM, Amr AL-Hossary
<amr_alhossary at hotmail.com> wrote:

> Hi Dr. Andreas,
>
> I was using a set of PDB files mentioned in an old paper, published in 1994.
> The paper is called
> "Enlarged representative set of protein structures"
> by
> Uwe Hobohm and Chris Sander
> European Molecular Biology Laboratory, 69012 Heidelberg, Germany
> (Received September 16, 1993; Accepted December 23, 1993)
> published in
> Protein Science (1994), 3:522-524. Cambridge University Press. Printed in
> the USA.
>
> It describes a representative standard set of protein structures that
> doesn't have any redundancy.
> This set was cited by a paper that talks about Cation-pi interactions as
> their representative set; and I was revisiting the same set to use it as my
> positive control in my research.
>
> Your idea (the web service) is perfect.
> I can write it this weekend. Until then, let's list all the additional
> features that should be there too.
> I am thinking of
> static String[] updateIDs(String[] IdsToUpdate)
>
> Generally, I agree with you about keeping the parser unaware of versions,
> but I believe it should at least be aware of the file's revisions up to the
> point the local copy was created, and it should notify the user that the
> data is current only as of that date and could be outdated, in addition to
> stating this explicitly in the documentation.
>
> Well, another point to think about:
> how do we fight redundancy among several files?
> Suppose we consider 1HHB, 2HHB, 3HHB, and 4HHB to represent the same
> structure.
> If we send this request:
> http://www.rcsb.org/pdb/rest/idStatus?structureId=1HHB,2HHB,3HHB,4HHB
> this is the response we get:
> <?xml version='1.0' standalone='no' ?>
> <idStatus>
>  <record structureId="1HHB" status="OBSOLETE" replacedBy="4HHB" />
>  <record structureId="2HHB" status="CURRENT" replaces="1HHB" />
>  <record structureId="3HHB" status="CURRENT" replaces="1HHB" />
>  <record structureId="4HHB" status="CURRENT" replaces="1HHB" />
> </idStatus>
>
> How do we counteract the redundancy in 2HHB and 3HHB, given that 4HHB is
> already there?
> This could be the next question. :-)
>
> Sincerely,
> Amr
>
> --------------------------------------------------
> From: "Andreas Prlic" <andreas at sdsc.edu>
> Sent: Monday, February 28, 2011 8:15 AM
> To: "Amr AL-Hossary" <amr_alhossary at hotmail.com>
> Cc: <biojava-dev at lists.open-bio.org>
> Subject: Re: fetching obsolete/superseding files
>
>  Hi Amr,
>>
>>  During my research, I met some difficulty in automatically fetching some
>>> old
>>> obsolete files.
>>>
>>
>> ok. May I ask, how did you come across them?
>>
>>
>>  And that gave me an idea:
>>> I am thinking of adding 2 new features to the BioJava "structure" module:
>>>
>>
>> Interesting idea. In terms of software design I would not rely on the
>> parser for this. The local file that is parsed might be already out of
>> date as well. I would try to keep the parser agnostic of particular
>> versions or IDs. Instead I would provide a utility class that can give
>> information on the status of a file. There is a little XML service at
>> http://www.rcsb.org/pdb/software/rest.do#releaseStatus that provides
>> the latest status information. That one could be used to fetch the
>> information and then download any newer (or obsoleted) files...
>>
>> What do you think?
>>
>> Andreas
>>
>>  Suppose there are 2 new boolean parameters of the PDB file
>>> reader/parser, namely
>>> <fetchObsolete> and <fetchSuperseding>.
>>> The first one enables the reader to download a file from the "Obsolete"
>>> archive if it wasn't found in the main repository,
>>> while the latter searches the header of a file (not necessarily the same
>>> one) for its newest revision or a superseding new file, fetches it, and
>>> switches to that new file automatically.
>>>
>>> Adding these parameters will require
>>> 1) Manipulating the URL a little, to enable connecting
>>> to ftp://ftp.wwpdb.org/pub/pdb/data/structures/obsolete
>>>
>>> 2) Parsing the OBSLTE, REVDAT, and SPRSDE records, as well as REMARK 4 and
>>> REMARK 5
>>>
>>> If these features are approved, I can do them.
>>>
>>> Any ideas or comments?
>>>
>>>
>>>
>>> Amr
>>>
>>
>>
>>
>> --
>> -----------------------------------------------------------------------
>> Dr. Andreas Prlic
>> Senior Scientist, RCSB PDB Protein Data Bank
>> University of California, San Diego
>> (+1) 858.246.0526
>> -----------------------------------------------------------------------
>>
>>  _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>