[Biojava-l] Bad PDB files and batch processing with PDBFileReader
Andreas Prlic
andreas at sdsc.edu
Thu Oct 28 00:47:50 UTC 2010
> I assume AtomCache is a new class in BioJava3?
Yes, it is: http://biojava.org/wiki/BioJava:CookBook:PDB:read3.0
>
> I must give you my embarrassed apology. After a bunch of testing I
> finally figured out that I had misunderstood where the parser's error
> handling returns control, and I had started going after the wrong exceptions.
> It does look like if setParseCAOnly is true, the reader throws an exception on
> chains with no CAs instead of just skipping them, though the other
> chains are still parsed into the structure.
This sounds like there might be a problem with CA-only parsing. Do you have
an example PDB ID? Also, are you on BioJava 1.7 or 3.0?
Andreas
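A minimal sketch of the skip-instead-of-fail behaviour discussed here: chains that contain no CA atoms are dropped with a message rather than aborting the parse. `SimpleChain` and `CaOnlyFilter` are hypothetical stand-ins, not BioJava classes, and a chain is modelled only as an ID plus a list of atom names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a parsed chain: an ID plus its atom names.
// This is NOT the BioJava Chain class; it only illustrates the filtering idea.
final class SimpleChain {
    final String id;
    final List<String> atomNames;

    SimpleChain(String id, List<String> atomNames) {
        this.id = id;
        this.atomNames = atomNames;
    }
}

final class CaOnlyFilter {
    // Returns the chains that contain at least one C-alpha ("CA") atom.
    // Chains without any CA are reported and skipped instead of failing the parse.
    static List<SimpleChain> chainsWithCA(List<SimpleChain> chains) {
        List<SimpleChain> kept = new ArrayList<>();
        for (SimpleChain chain : chains) {
            if (chain.atomNames.contains("CA")) {
                kept.add(chain);
            } else {
                System.err.println("skipping chain " + chain.id + ": no CA atoms");
            }
        }
        return kept;
    }
}
```

Reporting skipped chains on stderr keeps the batch moving while still leaving a trace of which entries came back incomplete.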
>
> -da
>
> On Tue, Oct 26, 2010 at 22:19, Andreas Prlic <andreas at sdsc.edu> wrote:
>> Hi Daniel,
>>
>> PDB files are better nowadays due to remediation; however, there are
>> still issues.
>>
>> It sounds like you just want to figure out how to set up the try/catch
>> block properly. You could do something like this:
>>
>> import org.biojava.bio.structure.Structure;
>> import org.biojava.bio.structure.align.util.AtomCache;
>>
>> // true if the local PDB mirror uses the split (divided) directory layout
>> boolean splitFileOrganisation = true;
>> AtomCache cache = new AtomCache("/path/to/your/installation/", splitFileOrganisation);
>>
>> String[] pdbIDs = new String[]{"4hhb", "1cdg", "5pti", "1gav", "WRONGID"};
>>
>> for (String pdbID : pdbIDs) {
>>     try {
>>         Structure s = cache.getStructure(pdbID);
>>         if (s == null) {
>>             System.out.println("could not find structure " + pdbID);
>>             continue;
>>         }
>>         // do something with the structure: this is where your inner loop goes
>>         System.out.println(s);
>>     } catch (Exception e) {
>>         // something crazy happened: report it and move on to the next ID
>>         System.err.println("Can't load structure " + pdbID + " reason: "
>>                 + e.getMessage());
>>         e.printStackTrace();
>>     }
>> }
>>
>>
>>
>>
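The same skip-and-continue pattern as the loop above, as a self-contained sketch that runs without BioJava. `StructureSource` is a hypothetical interface standing in for `AtomCache.getStructure(String)`, a `String` stands in for a `Structure`, and only `IOException` is caught here; the real call may also throw other checked exceptions (for example BioJava's `StructureException`), which is why the original example catches `Exception`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a loader such as AtomCache.getStructure(String):
// returns null when the entry is not found, throws IOException when it cannot be read.
interface StructureSource {
    String load(String pdbId) throws IOException;
}

final class BatchLoader {
    // Loads each ID in turn; missing or unreadable entries are added to `failed`
    // and skipped, so one bad file never aborts the whole batch.
    static List<String> loadAll(StructureSource source, List<String> ids, List<String> failed) {
        List<String> loaded = new ArrayList<>();
        for (String id : ids) {
            try {
                String structure = source.load(id);
                if (structure == null) {
                    failed.add(id);          // not found: move on to the next ID
                    continue;
                }
                loaded.add(structure);       // per-structure work ("inner loop") goes here
            } catch (IOException e) {
                failed.add(id);              // unreadable: record it and move on
                System.err.println("Can't load structure " + id + ": " + e.getMessage());
            }
        }
        return loaded;
    }
}
```

Recording the failed IDs, rather than only printing them, lets you rerun or inspect the problem entries after the batch finishes.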
>> On Tue, Oct 26, 2010 at 9:59 PM, Daniel Asarnow <dasarnow at gmail.com> wrote:
>>> Glad to hear it; who doesn't like support or clean interfaces? No
>>> offense intended, by the way, regarding PDB errors: obviously
>>> the PDB is an indispensable resource for all protein scientists.
>>>
>>> I am looking at many (fixed-length) pieces of protein chains and doin'
>>> stuff with 'em. My current code has a pair of nested while loops; the
>>> outer one iterates over PDB entries (a locally rsync'd copy), parsing them,
>>> and the inner one iterates over the pieces from each entry. When
>>> StructureExceptions come out of my PDBFileReader object, I want to
>>> continue the outer loop, moving on to the next set of files without
>>> executing any of the code that depends on correct StructureImpl
>>> objects from the reader (database updates, the inner loop).
>>> Since the reader's methods have their own try-catch blocks, a thrown
>>> StructureException is stopped there and never reaches my own error
>>> handling. I just need to know when those errors occur so I can skip
>>> those proteins. I am presuming that the correct entries will outnumber
>>> the problem ones by a significant factor, so the overall data won't be
>>> seriously impacted.
>>>
>>> -da
>>>
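A minimal sketch of the nested-loop structure described above, assuming "pieces" are overlapping fixed-length windows over a chain's one-letter residue string (use a step of `k` instead of 1 for non-overlapping pieces). `Fragments.parse` is a hypothetical placeholder for the real parsing step that simply rejects empty entries; the point is that the `continue` in the outer loop skips the inner loop entirely when parsing fails.

```java
import java.util.ArrayList;
import java.util.List;

final class Fragments {
    // Hypothetical parse step: returns the residue string for an entry,
    // or throws if the entry is unusable (here: empty).
    static String parse(String entry) {
        if (entry.isEmpty()) {
            throw new IllegalArgumentException("empty entry");
        }
        return entry;
    }

    // Every contiguous window of length k, sliding by one residue.
    static List<String> windows(String residues, int k) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + k <= residues.length(); start++) {
            out.add(residues.substring(start, start + k));
        }
        return out;
    }

    // Outer loop over entries, inner loop over pieces; a failed parse skips
    // straight to the next entry, so no piece from it is ever processed.
    static List<String> process(List<String> entries, int k) {
        List<String> pieces = new ArrayList<>();
        for (String entry : entries) {
            String residues;
            try {
                residues = parse(entry);
            } catch (IllegalArgumentException e) {
                System.err.println("skipping entry: " + e.getMessage());
                continue;
            }
            for (String piece : windows(residues, k)) {
                pieces.add(piece);   // database updates etc. would go here
            }
        }
        return pieces;
    }
}
```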
>>> On Tue, Oct 26, 2010 at 21:11, Andreas Prlic <andreas at sdsc.edu> wrote:
>>>> Hi Daniel,
>>>>
>>>> Can you explain a bit more what you are doing, in particular what
>>>> errors you would like to deal with on your end? You should not need
>>>> to worry too much about exception handling. Are there any special
>>>> cases you are interested in? If so, we should support you with
>>>> a clean interface rather than have you handle exceptions on your end.
>>>>
>>>> Andreas
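One possible shape for the "clean interface" suggested here, sketched under the assumption that each load is wrapped in a result object instead of leaving exception handling to the caller. `LoadResult` and `attempt` are hypothetical names, not BioJava API.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical result type (not BioJava API): holds either a loaded value
// or a short description of why loading failed.
final class LoadResult<T> {
    private final T value;
    private final String error;

    private LoadResult(T value, String error) {
        this.value = value;
        this.error = error;
    }

    static <T> LoadResult<T> ok(T value) {
        return new LoadResult<>(value, null);
    }

    static <T> LoadResult<T> failed(String reason) {
        return new LoadResult<>(null, reason);
    }

    // Runs a loader that may throw and turns every outcome into a LoadResult,
    // so batch code can branch on isOk() instead of writing try/catch itself.
    static <T> LoadResult<T> attempt(Callable<T> loader) {
        try {
            T v = loader.call();
            if (v == null) {
                return failed("not found");
            }
            return ok(v);
        } catch (Exception e) {
            return failed(e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }

    boolean isOk() {
        return error == null;
    }

    Optional<T> value() {
        return Optional.ofNullable(value);
    }

    Optional<String> error() {
        return Optional.ofNullable(error);
    }
}
```

A batch loop can then branch on `isOk()` and `continue` without writing any try/catch of its own.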
>>>>
>>>>
>>>>
>>>> On Tue, Oct 26, 2010 at 8:54 PM, Daniel Asarnow <dasarnow at gmail.com> wrote:
>>>>> Hi all,
>>>>> Let me first say thanks to all the BioJava community members for
>>>>> delivering such a useful set of libraries, and that I'm still a newbie
>>>>> when it comes to BioJava (and Java) so forgive me if my question is
>>>>> too trivial.
>>>>>
>>>>> I am doing work on lots (at least thousands) of PDB files from RCSB.
>>>>> As is commonly known, these are often rife with errors which can lead
>>>>> to exceptions during parsing with PDBFileParser. Because
>>>>> PDBFileParser's methods contain their own try-catch blocks, exception
>>>>> propagation stops there and my code proceeds blindly along regardless
>>>>> of any error checking I do. I would like to catch the exceptions up
>>>>> in my code where the parser is called, so that I can branch to a
>>>>> continue statement and have my batch processing loops move on to the
>>>>> next file.
>>>>> Should I edit out the try-catch blocks and compile my own version of
>>>>> the library? Or should I test the returned StructureImpl objects for
>>>>> possession of the fields in question? In that case, I'm not sure
>>>>> which properties will give the most general success information, and
>>>>> I'd rather not have to check for /every/ property being correct.
>>>>>
>>>>> If there is some great way to check if an exception was caught down a
>>>>> series of nested method calls, please hit me over the head with it.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -da
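The question above comes down to two styles of error handling in nested calls. A minimal sketch contrasting them: when an inner method catches and prints an exception, the caller only sees a partial result; when it wraps the exception in a checked one and rethrows, the outer batch loop can catch it and `continue`. `PropagationDemo` and `RecordParseException` are hypothetical, and parsing a coordinate field stands in for parsing a PDB record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checked exception used to report a parse failure to the caller.
class RecordParseException extends Exception {
    RecordParseException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class PropagationDemo {
    // Innermost step: parse one numeric field; throws NumberFormatException on bad input.
    static double parseCoordinate(String field) {
        return Double.parseDouble(field.trim());
    }

    // Swallowing style: the error is caught and printed here, so the caller
    // receives a partial result and cannot tell that anything went wrong.
    static List<Double> parseSwallowing(List<String> fields) {
        List<Double> out = new ArrayList<>();
        for (String f : fields) {
            try {
                out.add(parseCoordinate(f));
            } catch (NumberFormatException e) {
                System.err.println("ignored bad field: " + f);
            }
        }
        return out;
    }

    // Propagating style: the low-level exception is wrapped in a checked exception
    // and rethrown, so the batch loop can catch it and skip to the next file.
    static List<Double> parsePropagating(List<String> fields) throws RecordParseException {
        List<Double> out = new ArrayList<>();
        for (String f : fields) {
            try {
                out.add(parseCoordinate(f));
            } catch (NumberFormatException e) {
                throw new RecordParseException("bad field: " + f, e);
            }
        }
        return out;
    }
}
```

Keeping the original exception as the cause preserves the low-level stack trace for debugging while still giving the caller a single, declared exception type to catch.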
>>>>> _______________________________________________
>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>
>>>>
>>>>
>>
>
--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------