[Biojava-dev] fetching obsolete/superseding files

Mon Feb 28 17:59:10 UTC 2011

Hi Dr. Andreas,

I was using a set of PDB files described in an old paper, published in 1994.
The paper is
Enlarged representative set of protein structures
by
Uwe Hobohm and Chris Sander
European Molecular Biology Laboratory, 69012 Heidelberg, Germany
(Received September 16, 1993; Accepted December 23, 1993)
published in
Protein Science (1994), 3:522-524. Cambridge University Press. Printed in the 
USA.

It describes a representative standard set of protein structures that 
doesn't have any redundancy.
A paper on cation-pi interactions used this set as its representative set, 
and I was revisiting the same set to use it as the positive control in my 
research.

Your idea (the webservice) is perfect.
I can write it this weekend. Until then, let's list any additional features 
that should be there too.
I am thinking of something like
static String[] updateIDs(String[] IdsToUpdate)
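
A rough sketch of how such a method might look, assuming it is handed the XML text of an idStatus response (like the one further down in this message) rather than fetching it itself. The class name IdStatusSketch, the helper parseReplacements, and the extra idStatusXml parameter are hypothetical, not existing BioJava API; only standard JDK XML classes are used.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical utility class; not part of BioJava.
final class IdStatusSketch {

    /** Maps each OBSOLETE structureId in an idStatus response to its replacedBy ID. */
    static Map<String, String> parseReplacements(String idStatusXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(idStatusXml)));
            NodeList records = doc.getElementsByTagName("record");
            Map<String, String> replacements = new HashMap<>();
            for (int i = 0; i < records.getLength(); i++) {
                Element rec = (Element) records.item(i);
                if ("OBSOLETE".equals(rec.getAttribute("status"))
                        && rec.hasAttribute("replacedBy")) {
                    replacements.put(
                            rec.getAttribute("structureId").toUpperCase(Locale.ROOT),
                            rec.getAttribute("replacedBy").toUpperCase(Locale.ROOT));
                }
            }
            return replacements;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse idStatus XML", e);
        }
    }

    /** Returns the input IDs with every obsolete ID swapped for its replacement. */
    static String[] updateIDs(String[] idsToUpdate, String idStatusXml) {
        Map<String, String> replacements = parseReplacements(idStatusXml);
        String[] updated = new String[idsToUpdate.length];
        for (int i = 0; i < idsToUpdate.length; i++) {
            String id = idsToUpdate[i].toUpperCase(Locale.ROOT);
            // IDs that are current, or unknown to the response, pass through unchanged.
            updated[i] = replacements.getOrDefault(id, id);
        }
        return updated;
    }

    public static void main(String[] args) {
        String xml = "<idStatus>"
                + "<record structureId=\"1HHB\" status=\"OBSOLETE\" replacedBy=\"4HHB\" />"
                + "<record structureId=\"2HHB\" status=\"CURRENT\" replaces=\"1HHB\" />"
                + "</idStatus>";
        System.out.println(String.join(",", updateIDs(new String[] {"1HHB", "2HHB"}, xml)));
    }
}
```

Keeping the network call outside the method is only a design choice for this sketch; it makes the ID mapping testable without contacting the server.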

Generally, I agree with you in not letting the parser be aware of versions, 
but I believe it should be at least aware of revisions of the file up to the 
point the local copy was created, and let the user be notified that this 
data is up to the date this file was created and could be outdated; in 
addition to mentioning it explicitly in the documentation.
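
To make the "aware of revisions" point concrete, here is a minimal sketch of a helper that reads the REVDAT records of a local file and reports the most recent modification date, which could then be shown to the user. It assumes the fixed-column PDB layout in which the modification date occupies columns 14-22 as DD-MMM-YY. The class name RevdatSketch and the 1970 pivot for two-digit years are assumptions of this sketch, not BioJava API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper; not part of BioJava.
final class RevdatSketch {

    // DD-MMM-YY with upper-case month names; two-digit years are read as
    // 1970-2069 (the pivot is an assumption made for this sketch).
    private static final DateTimeFormatter REVDAT_DATE = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("dd-MMM-")
            .appendValueReduced(ChronoField.YEAR, 2, 2, 1970)
            .toFormatter(Locale.ENGLISH);

    /** Returns the latest REVDAT modification date found, if any. */
    static Optional<LocalDate> latestRevision(List<String> headerLines) {
        LocalDate latest = null;
        for (String line : headerLines) {
            if (!line.startsWith("REVDAT") || line.length() < 22) {
                continue;
            }
            // Columns 14-22 (1-based) hold the modification date.
            String dateField = line.substring(13, 22).trim();
            try {
                LocalDate date = LocalDate.parse(dateField, REVDAT_DATE);
                if (latest == null || date.isAfter(latest)) {
                    latest = date;
                }
            } catch (DateTimeParseException e) {
                // Skip lines whose date field is blank or malformed.
            }
        }
        return Optional.ofNullable(latest);
    }

    public static void main(String[] args) {
        // Placeholder lines in the REVDAT layout, using the dummy ID 1ABC.
        List<String> lines = Arrays.asList(
                "REVDAT   2   15-OCT-99 1ABC    1       REMARK",
                "REVDAT   1   09-JAN-89 1ABC    0");
        System.out.println(latestRevision(lines));
    }
}
```

A caller could compare this date with the status reported by the web service and warn when the local copy may be outdated.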

Another point to think about:
How do we fight redundancy among several files?
Suppose we consider 1HHB, 2HHB, 3HHB, and 4HHB to represent the same 
structure.
If we issue this request:
http://www.rcsb.org/pdb/rest/idStatus?structureId=1HHB,2HHB,3HHB,4HHB
this is the response we get:
<?xml version='1.0' standalone='no' ?>
<idStatus>
  <record structureId="1HHB" status="OBSOLETE" replacedBy="4HHB" />
  <record structureId="2HHB" status="CURRENT" replaces="1HHB" />
  <record structureId="3HHB" status="CURRENT" replaces="1HHB" />
  <record structureId="4HHB" status="CURRENT" replaces="1HHB" />
</idStatus>

How do we counteract the redundancy in 2HHB and 3HHB, given that 4HHB is 
already there?
This could be the next question. :-)
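
One possible heuristic, sketched below purely as a starting point: group the CURRENT records by the entry they replace and keep a single representative per group, preferring the ID named in the obsolete record's replacedBy attribute. Treating "replaces the same obsolete entry" as "redundant" is an assumption of this sketch; real redundancy filtering would likely need sequence or structure comparison. RedundancySketch and StatusRecord are hypothetical names, not BioJava classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of BioJava.
final class RedundancySketch {

    /** Mirrors one record element of the idStatus response; unused attributes are null. */
    static final class StatusRecord {
        final String id;
        final String status;
        final String replaces;
        final String replacedBy;

        StatusRecord(String id, String status, String replaces, String replacedBy) {
            this.id = id;
            this.status = status;
            this.replaces = replaces;
            this.replacedBy = replacedBy;
        }
    }

    /** Keeps one CURRENT ID per superseded entry, preferring the official successor. */
    static List<String> representatives(List<StatusRecord> records) {
        // Obsolete ID -> the successor named in its replacedBy attribute.
        Map<String, String> successor = new LinkedHashMap<>();
        for (StatusRecord r : records) {
            if ("OBSOLETE".equals(r.status) && r.replacedBy != null) {
                successor.put(r.id, r.replacedBy);
            }
        }
        // Group key -> chosen representative, in first-seen order.
        Map<String, String> chosen = new LinkedHashMap<>();
        for (StatusRecord r : records) {
            if (!"CURRENT".equals(r.status)) {
                continue;
            }
            // Entries that replace nothing form a group of their own.
            String key = (r.replaces != null) ? r.replaces : r.id;
            if (!chosen.containsKey(key) || r.id.equals(successor.get(key))) {
                chosen.put(key, r.id);
            }
        }
        return new ArrayList<>(chosen.values());
    }

    public static void main(String[] args) {
        List<StatusRecord> records = Arrays.asList(
                new StatusRecord("1HHB", "OBSOLETE", null, "4HHB"),
                new StatusRecord("2HHB", "CURRENT", "1HHB", null),
                new StatusRecord("3HHB", "CURRENT", "1HHB", null),
                new StatusRecord("4HHB", "CURRENT", "1HHB", null));
        System.out.println(representatives(records)); // prints [4HHB]
    }
}
```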

Sincerely,
Amr

--------------------------------------------------
From: "Andreas Prlic" <andreas at sdsc.edu>
Sent: Monday, February 28, 2011 8:15 AM
To: "Amr AL-Hossary" <amr_alhossary at hotmail.com>
Cc: <biojava-dev at lists.open-bio.org>
Subject: Re: fetching obsolete/superseding files

> Hi Amr,
>
>> During my research, I met some difficulty in automatically fetching some 
>> old
>> obsolete files.
>
> ok. May I ask, how did you come across them?
>
>
>> And that inspired an idea.
>> I am thinking of adding 2 new features to the Biojava "structure" module:
>
> Interesting idea. In terms of software design I would not rely on the
> parser for this. The local file that is parsed might be already out of
> date as well. I would try to keep the parser agnostic of particular
> versions or IDs. Instead I would provide a utility class that can give
> information on the status of a file. There is a little XML service at
> http://www.rcsb.org/pdb/software/rest.do#releaseStatus that provides
> the latest status information. That one could be used to fetch the
> information and then download any newer (or obsoleted) files...
>
> What do you think?
>
> Andreas
>
>> Suppose there are 2 new boolean parameters of the PDB file
>> reader/parser, which are
>> <fetchObsolete> and <fetchSuperseding>
>> The first one enables the reader to download a file from the "Obsolete"
>> archive if it wasn't found in the main repository,
>> while the latter searches the header of a file (not necessarily the same
>> one) for its newest revision or a superseding new file, fetches it, and
>> switches to that new file automatically.
>>
>> Adding these parameters will require:
>> 1) Manipulating the URL a little, to enable connecting to
>> ftp://ftp.wwpdb.org/pub/pdb/data/structures/obsolete
>> 2) Parsing the OBSLTE, REVDAT, and SPRSDE records, as well as REMARK 4 and
>> REMARK 5
>>
>> If these features are approved, I can do them.
>>
>> Any ideas or comments?
>>
>>
>>
>> Amr
>
>
>
> -- 
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------
>