[Biojava-l] Bad PDB files and batch processing with PDBFileReader

Thu Oct 28 00:47:50 UTC 2010

> I assume AtomCache is a new class in BioJava3?

Yes, it is: http://biojava.org/wiki/BioJava:CookBook:PDB:read3.0

>
> I must give you my embarrassed apology...after a bunch of testing I
> finally figured out that I had misunderstood where the Parser's error
> handling returns control and started going after the wrong exceptions.
>  It does look like, if setParseCAOnly is true, the reader throws an
> exception on chains with no CA atoms instead of just skipping them,
> though the other chains are still parsed into the structure.

This sounds like there might be a problem with CA-only parsing. Do you
have an example ID? Also: are you on BioJava 1.7 or 3.0?
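
To find a candidate ID, one option is to scan the raw files for chains
that have ATOM records but no CA atom. The following is only a rough
sketch and does not use BioJava: CaChainCheck and chainsWithoutCa are
hypothetical names, and it assumes standard fixed-column PDB ATOM records
(atom name in columns 13-16, chain ID in column 22).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava: inspects raw PDB ATOM records.
class CaChainCheck {

    /** Returns the IDs of chains that have ATOM records but no CA atom. */
    static List<Character> chainsWithoutCa(List<String> pdbLines) {
        // chain ID -> "a CA atom has been seen", in order of first appearance
        Map<Character, Boolean> hasCa = new LinkedHashMap<>();
        for (String line : pdbLines) {
            // Only ATOM records are considered; lines too short to hold a
            // chain ID are skipped.
            if (!line.startsWith("ATOM") || line.length() < 22) {
                continue;
            }
            String atomName = line.substring(12, 16).trim(); // columns 13-16
            char chainId = line.charAt(21);                  // column 22
            hasCa.merge(chainId, atomName.equals("CA"), Boolean::logicalOr);
        }
        List<Character> missing = new ArrayList<>();
        for (Map.Entry<Character, Boolean> entry : hasCa.entrySet()) {
            if (!entry.getValue()) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

Any entry for which this returns a non-empty list would be a reasonable
test case for the CA-only behaviour described above.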

Andreas

>
> -da
>
> On Tue, Oct 26, 2010 at 22:19, Andreas Prlic <andreas at sdsc.edu> wrote:
>> Hi Daniel,
>>
>> PDB files are better nowadays thanks to remediation; however, there
>> are still issues.
>>
>> It sounds like you just want to figure out how to do the try/catch
>> block properly. You could do something like this:
>>
>>     boolean splitFileOrganisation = true;
>>     AtomCache cache = new AtomCache("/path/to/your/installation/", splitFileOrganisation);
>>
>>     String[] pdbIDs = new String[]{"4hhb", "1cdg", "5pti", "1gav", "WRONGID"};
>>
>>     for (String pdbID : pdbIDs) {
>>
>>         try {
>>             Structure s = cache.getStructure(pdbID);
>>             if (s == null) {
>>                 System.out.println("could not find structure " + pdbID);
>>                 continue;
>>             }
>>             // do something with the structure - your inner loop
>>             System.out.println(s);
>>
>>         } catch (Exception e) {
>>             // something crazy happened...
>>             System.err.println("Can't load structure " + pdbID + " reason: " + e.getMessage());
>>             e.printStackTrace();
>>         }
>>     }
>>
>>
>>
>>
>> On Tue, Oct 26, 2010 at 9:59 PM, Daniel Asarnow <dasarnow at gmail.com> wrote:
>>> Glad to hear it, who doesn't like support or clean interfaces?  No
>>> offense intended, by the way, with respect to PDB errors - obviously
>>> the PDB is an indispensable resource for all protein scientists.
>>>
>>> I am looking at many (fixed-length) pieces of protein chains and doin'
>>> stuff with 'em.  My current code has a pair of nested while loops; the
>>> outer iterates over PDB entries (locally rsync'd copy), parsing them
>>> and the inner iterates over the pieces from each.  When
>>> StructureExceptions come out of my PDBFileReader object I want to
>>> continue the outer loop, moving on to the next set of files without
>>> executing any of the code that depends on correct StructureImpl
>>> objects from the reader (database updates, the inner loop).
>>> Since the reader's methods have their own try-catch blocks, a thrown
>>> StructureException is stopped there and never reaches my own error
>>> handling.  I just need to know when those errors occur so I can skip
>>> those proteins - I am presuming that the correct entries will outweigh
>>> the problem ones by a significant factor and the overall data won't be
>>> seriously impacted.
>>>
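The control flow described here (skip to the next entry as soon as loading
fails, so nothing downstream runs on a bad structure) could be sketched as
follows. This is a minimal illustration, not BioJava code: BatchSkipSketch,
PieceLoader and run are hypothetical names, and the loader is assumed to
let its exception propagate to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these are BioJava classes.
class BatchSkipSketch {

    /** Hypothetical loader: returns the pieces of one entry, or throws. */
    interface PieceLoader {
        List<String> load(String pdbId) throws Exception;
    }

    /**
     * Runs the inner loop for every entry that loads cleanly and appends
     * the results to 'processed'. Returns the IDs that were skipped.
     */
    static List<String> run(List<String> pdbIds, PieceLoader loader,
                            List<String> processed) {
        List<String> skipped = new ArrayList<>();
        for (String pdbId : pdbIds) {          // outer loop: PDB entries
            List<String> pieces;
            try {
                pieces = loader.load(pdbId);
            } catch (Exception e) {
                // Loading failed: record the ID and move on to the next
                // entry without touching the inner loop.
                skipped.add(pdbId);
                continue;
            }
            for (String piece : pieces) {      // inner loop: pieces
                processed.add(pdbId + ":" + piece);
            }
        }
        return skipped;
    }
}
```

Keeping the try block around the load call only, rather than the whole loop
body, means an exception from the inner loop is not mistaken for a parsing
problem.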
>>> -da
>>>
>>> On Tue, Oct 26, 2010 at 21:11, Andreas Prlic <andreas at sdsc.edu> wrote:
>>>> Hi Daniel,
>>>>
>>>> can you explain a bit more what you are doing, in particular what
>>>> errors you would like to deal with on your end?  You should not need
>>>> to worry too much about exception handling. Are there any special
>>>> cases you are interested in?  In this case we should support you with
>>>> a clean interface rather than exception handling from your end...
>>>>
>>>> Andreas
>>>>
>>>>
>>>>
>>>> On Tue, Oct 26, 2010 at 8:54 PM, Daniel Asarnow <dasarnow at gmail.com> wrote:
>>>>> Hi all,
>>>>> Let me first say thanks to all the BioJava community members for
>>>>> delivering such a useful set of libraries, and that I'm still a newbie
>>>>> when it comes to BioJava (and Java) so forgive me if my question is
>>>>> too trivial.
>>>>>
>>>>> I am doing work on lots (at least thousands) of PDB files from RCSB.
>>>>> As is commonly known, these are often rife with errors which can lead
>>>>> to exceptions during parsing with PDBFileParser.  Because
>>>>> PDBFileParser's methods contain their own try-catch blocks, exception
>>>>> propagation stops there and my code proceeds blindly along regardless
>>>>> of any error checking I do.  I would like to catch the exceptions up
>>>>> in my code where the parser is called, so that I can branch to a
>>>>> continue statement and have my batch processing loops move on to the
>>>>> next file.
>>>>> Should I edit out the try-catch blocks and compile my own version of
>>>>> the library?  Or should I test the returned StructureImpl objects for
>>>>> possession of the fields in question?  In that case, I'm not sure
>>>>> which properties will give the most general success information...and
>>>>> I'd rather not have to check for /every/ property being correct.
>>>>>
>>>>> If there is some great way to check if an exception was caught down a
>>>>> series of nested method calls, please hit me over the head with it.
>>>>>
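If a loader really does swallow its exceptions, one general workaround is
to wrap it and turn an unusable result back into an exception that the
calling loop can catch. The sketch below is only an illustration under
stated assumptions: CheckedLoadSketch, LoadFailedException and loadOrThrow
are hypothetical names, and treating a null or empty result as a failure is
an assumption, not documented BioJava behaviour.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical wrapper; not part of BioJava.
class CheckedLoadSketch {

    /** Hypothetical checked exception signalling an unusable result. */
    static class LoadFailedException extends Exception {
        LoadFailedException(String message) {
            super(message);
        }
    }

    /**
     * Calls a loader that never throws and converts a null or empty result
     * into a LoadFailedException so the caller can skip the entry.
     */
    static <T> List<T> loadOrThrow(String id, Function<String, List<T>> quietLoader)
            throws LoadFailedException {
        List<T> result = quietLoader.apply(id);
        if (result == null || result.isEmpty()) {
            throw new LoadFailedException("no usable data for " + id);
        }
        return result;
    }
}
```

The emptiness check is deliberately coarse; a stricter test (for example,
a minimum number of chains or atoms) would slot in at the same place.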
>>>>> Thanks!
>>>>>
>>>>> -da
>>>>>
>>>>
>>>>
>>
>

-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------