[Biopython-dev] Building Gene Ontology support into Biopython

Chris Lasher chris.lasher at gmail.com
Sun Oct 18 05:22:29 UTC 2009

I have a need to work with the gene ontology (GO) and gene ontology
annotations (GOAs) for my research. It seems Biopython still lacks GO
support despite a few threads from several years ago. I'd like to make
GO support in Biopython a reality now. I would really appreciate any
help and suggestions.

Bioperl has solid GO support. I don't find their code straightforward
at all; I haven't worked out which component is responsible for which
task. Nonetheless, it could provide starting points for building GO
support in Biopython.

Beyond looking through Bioperl code, though, I have several questions
and I really welcome suggestions:

1) First off, does anyone have any gene ontology Python code lying around?

2) What is the Biopython stance on introducing third-party
dependencies? The gene ontology is represented as a directed acyclic
graph (DAG), and I want to use an existing graph library rather than
roll our own. Would there be any objection to requiring either NetworkX
or igraph as a dependency for the GO library? (I have experience with
NetworkX and would prefer it, though I imagine igraph would be very
similar for nearly all the methods we'd need to construct the DAG.)

3) What are parsers written using these days? I checked the tutorial
section on them
(http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc209) but
this wasn't explicitly covered. Any pointers to recently written
parsers? I seem to recall Biopython has moved away from Martel
parsers, correct? Has anything been done with pyparsing or some other
parser, or is it strictly manual now? Also, I'd welcome tips on the
architecture of parsers in general.

4) Tying the GO annotations to a fundamental Biopython data structure:
this can't really be a SeqRecord object. SeqRecord.annotations makes
sense; however, I can't guarantee a SeqRecord object will exist,
because the annotations don't come with the sequence itself. (A
sequence is required to instantiate a SeqRecord object.) Any
suggestions on this?

5) BioSQL support. Not having used BioSQL in the past, I'm a bit wary
of adding this feature, but it is implemented in Bioperl. I haven't
yet figured out if it's used as the default data store for their
parsers or if it is only an optional store.
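
To make question 2 a bit more concrete, here is a minimal sketch of the
kind of DAG queries a GO library would need, written against the standard
library only so it runs without NetworkX or igraph. Everything in it is an
assumption for illustration: the GODag class, its method names, and the
"TERM:..." identifiers are hypothetical and not taken from Bioperl or any
existing Biopython module. Edges point from a child term to its parents.

```python
from collections import defaultdict


class GODag:
    """Toy DAG of ontology terms (hypothetical; edges go child -> parent)."""

    def __init__(self):
        # Maps each term ID to the set of its direct parent term IDs.
        self.parents = defaultdict(set)

    def ancestors(self, term):
        """Return every term reachable from `term` by following parent edges."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_relation(self, child, parent):
        """Add a child -> parent edge, refusing any edge that closes a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} -> {parent} would create a cycle")
        self.parents[child].add(parent)


# Made-up term IDs: a term may have more than one parent, which is why
# a tree is not enough and a DAG is needed.
dag = GODag()
dag.add_relation("TERM:process", "TERM:root")
dag.add_relation("TERM:regulation", "TERM:root")
dag.add_relation("TERM:cell_cycle", "TERM:process")
dag.add_relation("TERM:cell_cycle", "TERM:regulation")

print(sorted(dag.ancestors("TERM:cell_cycle")))
# → ['TERM:process', 'TERM:regulation', 'TERM:root']
```

A graph library would replace most of this: with the same child -> parent
edge direction in a NetworkX DiGraph, networkx.descendants should return
what this sketch calls ancestors.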

Comments most welcome.

