[Biopython] Reading large files, Biopython cookbook example
Katrina Lexa
klexa at umich.edu
Sun Jul 14 16:40:32 UTC 2013
Hi Peter,
My PDB file came from Maestro, so that is the ordering it follows after 9999. I tried to modify the parser script so that it accounted for the different format of my PDB file, just by changing line 166 to say something like:

    try:
        resseq=str(line[22:26].split()[0]) # sequence identifier
    except ValueError:
        resseq=10000 # sequence identifier

But my Python is not great, and I think I'm missing something with that, because I get the same error.
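
One likely reason the except branch above is never reached is that str() applied to a string slice does not raise ValueError; only a conversion such as int() would. A minimal sketch of the intended fallback follows. It is not the wiki script's code: parse_resseq is a hypothetical helper, and the decoding rule is an assumption that the letters continue the count in base 36 ("A000" directly after "9999"), which matches the numbering described in this thread but has not been checked against Maestro's documentation.

```python
def parse_resseq(field):
    """Return an integer residue number from the 4-character PDB field.

    Hypothetical helper for illustration; not part of Bio.PDB.
    """
    text = field.strip()
    try:
        # Ordinary decimal numbering, e.g. "9999".
        return int(text)
    except ValueError:
        # Assumption: letters continue the count in base 36, so that
        # "A000" follows "9999" and stands for 10000.
        return int(text, 36) - 10 * 36**3 + 10000


if __name__ == "__main__":
    for field in ("  42", "9999", "A000", "A001"):
        print(repr(field), "->", parse_resseq(field))
```

The fallback keeps every residue number distinct, whereas assigning a constant such as 10000 to every non-numeric field would give all residues after 9999 the same number.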
Thank you for your help,
Katrina
On Jul 14, 2013, at 4:21 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> It seems that the wiki example assumes the residue numbers
> wrap round at 9999 and restart from 0, 1, 2, ... whereas your file
> goes from 9999 to A000, A001, etc., which I've not seen before.
>
> Where did your PDB file come from? A public database?
> Another tool?
>
> Peter
> On Sat, Jul 13, 2013 at 7:50 AM, Katrina Lexa <klexa at umich.edu> wrote:
>> Hi everyone,
>>
>> I'm trying to do something that seems like it ought to be super simple,
>> since it is on the Biopython wiki cookbook
>> (http://biopython.org/wiki/Reading_large_PDB_files), but for some reason
>> that script will not work for me.
>>
>> When I try to run it as it is, on a pdb file that has more than 10000
>> residues, I get the "NameError: global name 'Residue' is not defined" at
>> line 77. My assumption was that maybe the script needed to import some other
>> module from Biopython, so I added from Bio.PDB import * to the top of the
>> script, but then it failed with "TypeError: 'str' object is not callable" at
>> line 73 (residue = Residue(res_id, resname, self.segid)). I tried to
>> circumvent this by just changing the name of the variable being created,
>> from residue = Residue to foobar = Residue (and then carrying that naming
>> through), but I continued to get the TypeError. Has anyone seen this before,
>> and/or can anyone help me get this to run?
>>
>> I have a file where all of the residues after 9999 are numbered starting
>> with A000, and that causes the normal Bio.PDB.PDBParser to crash with
>> "invalid literal for int() with base 10: 'A000'", so if there is an easier
>> workaround for that, that would also be a solution.
>>
>> Thank you so much for your help!
>