[Biojava-l] Bad PDB files and batch processing with PDBFileReader

Wed Oct 27 07:26:22 UTC 2010

I assume AtomCache is a new class in BioJava3?

I must give you my embarrassed apology...after a bunch of testing I
finally figured out that I had misunderstood where the parser's error
handling returns control, and I started going after the wrong exceptions.
 It does look like, if setParseCAOnly is true, the reader throws an
exception on chains with no CA atoms instead of just skipping them,
though the other chains are still parsed into the structure.
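
For illustration, the skip-on-failure batch loop discussed in this thread
can be sketched without BioJava at all.  Everything named below
(EntryLoader, BatchSkipSketch, countPieces, pieceLength) is a hypothetical
stand-in and not part of the BioJava API; chains are modelled as plain
strings.  It only shows the control flow: a load that throws or comes back
empty sends the outer loop on to the next entry, so the inner loop never
runs on a bad entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a structure reader; NOT a BioJava type.
interface EntryLoader {
    // Returns the chain sequences of one entry, or throws if it cannot be parsed.
    List<String> load(String id) throws Exception;
}

class BatchSkipSketch {

    // Number of fixed-length pieces (sliding windows) in one chain.
    // Assumes pieceLength is positive.
    static int countPieces(String chain, int pieceLength) {
        return Math.max(0, chain.length() - pieceLength + 1);
    }

    // Outer loop over entries. Ids that fail to load are recorded in 'skipped',
    // and none of the per-entry work runs for them.
    static Map<String, Integer> processAll(List<String> ids, EntryLoader loader,
                                           int pieceLength, List<String> skipped) {
        Map<String, Integer> piecesPerEntry = new LinkedHashMap<>();
        for (String id : ids) {
            List<String> chains;
            try {
                chains = loader.load(id);
            } catch (Exception e) {
                // The loader reported a failure: note it and move to the next entry.
                skipped.add(id);
                continue;
            }
            if (chains == null || chains.isEmpty()) {
                // The loader swallowed its own error and returned nothing usable.
                skipped.add(id);
                continue;
            }
            // Inner loop: only reached for entries that loaded correctly.
            int pieces = 0;
            for (String chain : chains) {
                pieces += countPieces(chain, pieceLength);
            }
            piecesPerEntry.put(id, pieces);
        }
        return piecesPerEntry;
    }
}
```

Checking for a null or empty result as well as catching the exception
covers a loader that handles its own errors internally and never rethrows.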

-da

On Tue, Oct 26, 2010 at 22:19, Andreas Prlic <andreas at sdsc.edu> wrote:
> Hi Daniel,
>
> PDB files are better nowadays, due to remediation, however there are
> still issues.
>
> It sounds like you just want to figure out how to do the try/catch
> block properly. You could do something like this:
>
>                boolean splitFileOrganisation = true;
>                AtomCache cache = new AtomCache("/path/to/your/installation/", splitFileOrganisation);
>
>                String[] pdbIDs = new String[]{"4hhb", "1cdg","5pti","1gav", "WRONGID" };
>
>                for (String pdbID : pdbIDs){
>
>                        try {
>                                Structure s = cache.getStructure(pdbID);
>                                if ( s == null) {
>                                        System.out.println("could not find structure " + pdbID);
>                                        continue;
>                                }
>                                // do something with the structure - your inner loop
>                                System.out.println(s);
>
>                        } catch (Exception e){
>                                // something crazy happened...
>                                System.err.println("Can't load structure " + pdbID + " reason: " + e.getMessage());
>                                e.printStackTrace();
>                        }
>                }
>
>
>
>
> On Tue, Oct 26, 2010 at 9:59 PM, Daniel Asarnow <dasarnow at gmail.com> wrote:
>> Glad to hear it, who doesn't like support or clean interfaces?  No
>> offense intended, by the way, with respect to PDB errors - obviously
>> the PDB is an indispensable resource for all protein scientists.
>>
>> I am looking at many (fixed-length) pieces of protein chains and doin'
>> stuff with 'em.  My current code has a pair of nested while loops; the
>> outer iterates over PDB entries (locally rsync'd copy), parsing them,
>> and the inner iterates over the pieces from each.  When
>> StructureExceptions come out of my PDBFileReader object I want to
>> continue the outer loop, moving on to the next set of files without
>> executing any of the code that depends on correct StructureImpl
>> objects from the reader (database updates, the inner loop).
>> Since the reader's methods have their own try-catch blocks, a thrown
>> StructureException is stopped there and never reaches my own error
>> handling.  I just need to know when those errors occur so I can skip
>> those proteins - I am presuming that the correct entries will outweigh
>> the problem ones by a significant factor and the overall data won't be
>> seriously impacted.
>>
>> -da
>>
>> On Tue, Oct 26, 2010 at 21:11, Andreas Prlic <andreas at sdsc.edu> wrote:
>>> Hi Daniel,
>>>
>>> can you explain a bit more what you are doing, in particular what
>>> errors you would like to deal with on your end?  You should not need
>>> to worry too much about exception handling. Are there any special
>>> cases you are interested in?  In this case we should support you with
>>> a clean interface rather than exception handling from your end...
>>>
>>> Andreas
>>>
>>>
>>>
>>> On Tue, Oct 26, 2010 at 8:54 PM, Daniel Asarnow <dasarnow at gmail.com> wrote:
>>>> Hi all,
>>>> Let me first say thanks to all the BioJava community members for
>>>> delivering such a useful set of libraries, and that I'm still a newbie
>>>> when it comes to BioJava (and Java) so forgive me if my question is
>>>> too trivial.
>>>>
>>>> I am doing work on lots (at least thousands) of PDB files from RCSB.
>>>> As is commonly known, these are often rife with errors which can lead
>>>> to exceptions during parsing with PDBFileParser.  Because
>>>> PDBFileParser's methods contain their own try-catch blocks, exception
>>>> propagation stops there and my code proceeds blindly along regardless
>>>> of any error checking I do.  I would like to catch the exceptions up
>>>> in my code where the parser is called, so that I can branch to a
>>>> continue statement and have my batch processing loops move on to the
>>>> next file.
>>>> Should I edit out the try-catch blocks and compile my own version of
>>>> the library?  Or should I test the returned StructureImpl objects for
>>>> possession of the fields in question?  In that case, I'm not sure
>>>> which properties will give the most general success information...and
>>>> I'd rather not have to check for /every/ property being correct.
>>>>
>>>> If there is some great way to check if an exception was caught down a
>>>> series of nested method calls, please hit me over the head with it.
>>>>
>>>> Thanks!
>>>>
>>>> -da
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l