[Biopython-dev] Biopython 1.60 plans and beyond

Sat Feb 18 22:17:24 UTC 2012

On Sat, Feb 18, 2012 at 2:54 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sat, Feb 18, 2012 at 7:20 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> >
> > 3. SeqIO read-only support for PDB files
> > (https://redmine.open-bio.org/issues/3295). I've been using this code on
> > my own. It fails to parse at least one PDB file I care about (3BEG); I
> > haven't tried it on a larger set of PDB files. In any case this shouldn't
> > be too hard to fix, and I'd like to see it in a stable Biopython release.
>
>
> If right now it has known failures, I don't want to squeeze this into
> Biopython 1.59 next week.
>

Agreed! But 1.60 sounds like a good goal.

> Does your code manage to produce the same FASTA sequence as
> the PDB themselves offer for download? That would be my expectation
> as an end user. It should be easy enough to test if you've already
> done a full local PDB download.
>

If there are disordered regions (very common), the missing residues are
replaced with 'X' characters. These residues can be listed in the SEQRES
lines of the PDB header, if it's available, but they're not included with
the atomic coordinates, so PdbIO can't reliably fill in these disordered
residues for all PDB files. This matches the behavior of the tool I was
using before (which is non-free and not widely used).
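
A minimal sketch of that gap-filling behavior, assuming the residues of one
chain are already available as (residue number, one-letter code) pairs in
chain order. The function name and input layout are hypothetical and are not
taken from the PdbIO code:

```python
def sequence_with_gaps(residues, gap_char="X"):
    """Join one-letter codes, padding jumps in residue numbering.

    residues: iterable of (residue_number, one_letter_code) pairs for a
    single chain, in the order they appear in the coordinates.
    """
    parts = []
    previous = None
    for number, letter in residues:
        # A jump in numbering means residues are missing from the
        # coordinates (e.g. a disordered loop); fill with placeholders.
        if previous is not None and number > previous + 1:
            parts.append(gap_char * (number - previous - 1))
        parts.append(letter)
        previous = number
    return "".join(parts)


# Residues 3 and 4 have no coordinates, so two placeholders are inserted.
print(sequence_with_gaps([(1, "M"), (2, "K"), (5, "L"), (6, "V")]))  # MKXXLV
```

Missing residues before the first or after the last observed residue can't be
detected this way, which is one reason the result can differ from the SEQRES
sequence.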

I don't keep a local copy of PDB normally, but I'll download it and do the
test before asking to merge PdbIO.

> I'm still uneasy about this making SeqIO depend on NumPy (even as
> a soft dependency at runtime), given the fact that the rest of SeqIO
> should work fine under Jython and PyPy. Support for the NumPy
> API under PyPy is coming along, but isn't likely for Jython for now
> (although PyPy's efforts may help there).
>
>
As an alternative, I could copy the portions of PDBParser and
StructureBuilder that are needed to read the amino acid sequence, but skip
creating Atoms. That would avoid the need for NumPy, at the cost of some
code duplication. Interested in that approach? If so, I can take a closer
look and report back on the feasibility.
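
To give a rough idea of the NumPy-free approach, here is a sketch that reads
residue names straight from the fixed columns of ATOM records. It is not the
PDBParser/StructureBuilder code; the function name is hypothetical, and it
assumes standard PDB column positions, uses only the first model, skips
HETATM records, and ignores gaps in residue numbering:

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: sequence} using only the ATOM records."""
    letters = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only read the first model
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21:22]       # column 22: chain identifier
        residue_key = line[22:27]    # columns 23-27: number + insertion code
        if (chain_id, residue_key) in seen:
            continue  # another atom of a residue we already recorded
        seen.add((chain_id, residue_key))
        residue_name = line[17:20].strip()  # columns 18-20: residue name
        # Unrecognized residue names become 'X'.
        letters.setdefault(chain_id, []).append(
            THREE_TO_ONE.get(residue_name, "X"))
    return dict((chain, "".join(seq)) for chain, seq in letters.items())
```

No coordinates are parsed and no Atom objects are created, so nothing here
needs NumPy.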

-Eric