[Biopython-dev] [Biopython] Update: call for Google Summer of Code project ideas

Eric Talevich eric.talevich at gmail.com
Thu Mar 1 18:30:19 UTC 2012


2012/3/1 Peter Cock <p.j.a.cock at googlemail.com>

> 2012/3/1 Eric Talevich <eric.talevich at gmail.com>:
> >
> > Here's one semi-coherent project idea that could fly:
> >
> > Overhaul Biopython's parsing infrastructure for protein
> > primary, secondary and tertiary structures
> >
> > - Refactor PDBParser and parse_pdb_header to allow parsing
> >   amino-acid sequences from SEQRES lines (header) and ATOM
> >   records (body) without building the PDB structure object,
> >   i.e. without using numpy
> > - Write a pure-Python replacement for parsing mmCIF files.
> >   (The module MMCIF2Dict already does almost all the work;
> >   lex+yacc just manages a fairly simple state machine for
> >   recognizing comments, special sub-sections, etc.)
> > - Wrap the parsers for PDB, PDBML and mmCIF under a common
> >   I/O interface under the Bio.Struct namespace
> > - Add parsing support for protein secondary structures,
> >   based on the relevant PDB records or (perhaps) DSSP
> >   output. (Note that João did some work on this already.)
>
> Do you think you could mentor that? One serious downside
> would be even more work on PDB related code which will
> make future merging even harder. We do need to tackle the
> GSoC backlog as a priority.
>

I would serve if called upon, but I think it's best if we set this one
aside for E&J SoC (JESoC?) rather than GSoC this year.
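
For reference, the first bullet in the list above (reading sequences
from SEQRES lines without building a structure object or importing numpy)
could look roughly like the sketch below. The function name
seqres_sequences and the THREE_TO_ONE table are hypothetical, not existing
Bio.PDB code; the column offsets assume standard fixed-width SEQRES
records, and any residue outside the 20 common amino acids is reported
as "X".

```python
# Three-letter to one-letter codes for the 20 common amino acids.
# (Hypothetical helper table for this sketch, not part of Bio.PDB.)
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(lines):
    """Return {chain_id: one-letter sequence} from PDB SEQRES records.

    Assumes fixed-width SEQRES lines: chain ID in column 12 and
    residue names starting at column 20 (1-based columns).
    """
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]
        residues = line[19:70].split()
        # Unknown or modified residues are mapped to "X".
        chains.setdefault(chain_id, []).extend(
            THREE_TO_ONE.get(name, "X") for name in residues
        )
    return {chain: "".join(seq) for chain, seq in chains.items()}
```

Because it only needs an iterable of lines, it can be fed an open file
handle directly and can stop caring about the file once the header has
been read.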


>
> > Variants
> > --------
> >
> > So, from the Biopython 1.60 thread:
> >
> > - James Casbon has offered to merge PyVCF into Biopython, right?
> > - BCF, the binary form of VCF (via blocked gzip), may also
> >   be worthwhile to support
> > - GVF, the Genome Variation Format, appears to be intended
> >   to be competitive with VCF. It's probably at least as well
> >   thought-out as VCF, sight unseen. It's based on GFF.
> >
> > Synthesizing the above, we have a GSoC project that looks like:
> >
> > - Help merge PyVCF into Biopython (w/ James's support -- I
> >   don't mean to volunteer him for this in absentia)?
> > - Write a GVF parser that emits the same object type as
> >   PyVCF, potentially also using existing GFF code
> > - Time permitting, look into blocked gzip support for VCF
> >   (BCF), also looking at SAM/BAM for inspiration and
> >   reusable code.
>
> Sounds interesting - who might be willing to mentor it?
>

Does someone feel comfortable asking James for his thoughts on this?

I'm not especially well qualified to mentor this, though I could assist as
a secondary mentor if needed. Any other Biopython devs/users well
acquainted with VCF/PyVCF?
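
To make the "same object type" point in the list above concrete, here is
a minimal sketch of a shared record type that both a VCF and a GVF parser
could emit. The names Variant, parse_vcf_line and iter_variants are
hypothetical and are not PyVCF's API; the sketch only handles the eight
fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and
ignores per-sample genotype columns.

```python
from collections import namedtuple

# Hypothetical shared record type; a GVF parser could emit the same tuple.
Variant = namedtuple("Variant", "chrom pos id ref alt qual filter info")


def parse_vcf_line(line):
    """Parse the eight fixed columns of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Flag entries such as "DB" carry no value; store True.
            info_dict[key] = value if sep else True
    return Variant(
        chrom,
        int(pos),
        None if vid == "." else vid,
        ref,
        alt.split(","),
        None if qual == "." else float(qual),
        filt,
        info_dict,
    )


def iter_variants(lines):
    """Yield Variant records, skipping header and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)
```

A GVF reader built on the existing GFF code would then only need to map
its attribute column onto the same fields.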



