[Biopython-dev] KEGG support

Wed Feb 10 22:27:07 UTC 2010

On Wed, Feb 10, 2010 at 6:30 PM, Renato Alves <rjalves at igc.gulbenkian.pt> wrote:
>
> Hi everyone,
>
> KEGG support in Biopython has been mostly untouched for the past 8 years
> with only a few changes and test additions. There is code in the tree to
> work with the Enzyme and Compound databases but not for others such as
> GENES, ORTHOLOGY, DRUG, ...
>
> Since I will need to write some code to work with other formats
> anyway, I was planning to contribute it and integrate it with the
> SeqIO interface. This will require some additional homework on my part.

Excellent news. Have you looked at the existing KEGG parsers in
Biopython, and do you think the current style is suitable? (I haven't
looked at the code recently myself, but will do).

Regarding the SeqIO interface (for KEGG GENES only?), I would be
happy to advise. Initially I suggest you work on adding a parser much
like the other KEGG parsers, returning gene records. Then we can
add a Bio/SeqIO/KeggGeneIO.py wrapper to turn these into SeqRecord
objects.
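
To make the two steps concrete, here is a rough sketch only. It assumes
the usual KEGG flat file layout (keyword in the first 12 columns, blank
keyword for continuation lines, "///" ending each entry). GeneRecord,
parse_genes and to_seqrecord are hypothetical names, not existing
Biopython code, and the sample entry is hand-written for illustration.

```python
from dataclasses import dataclass
from io import StringIO

# Hypothetical sample entry, hand-written for illustration (not real KEGG data).
SAMPLE = """\
ENTRY       gene0001          CDS       T00000
NAME        exA
DEFINITION  hypothetical example protein
AASEQ       16
            MKTAYIAKQRQISFVK
///
"""


@dataclass
class GeneRecord:
    """Hypothetical container for one GENES entry (not a Biopython class)."""
    entry: str = ""
    name: str = ""
    definition: str = ""
    aaseq: str = ""


def parse_genes(handle):
    """Yield one GeneRecord per '///'-terminated entry in the handle."""
    record = GeneRecord()
    keyword = ""
    for line in handle:
        line = line.rstrip()
        if line.startswith("///"):
            # End of entry: hand back the record and start a fresh one.
            yield record
            record = GeneRecord()
            keyword = ""
            continue
        if not line:
            continue
        # Assumed layout: keyword in columns 1-12, data after that.
        head, data = line[:12].strip(), line[12:].strip()
        is_continuation = not head
        if head:
            keyword = head
        if keyword == "ENTRY" and data:
            record.entry = data.split()[0]
        elif keyword == "NAME":
            record.name = f"{record.name} {data}".strip()
        elif keyword == "DEFINITION":
            record.definition = f"{record.definition} {data}".strip()
        elif keyword == "AASEQ" and is_continuation:
            # The AASEQ keyword line holds the length; residues follow it.
            record.aaseq += data.replace(" ", "")


def to_seqrecord(record):
    """Step two: wrap a GeneRecord as a SeqRecord (needs Biopython)."""
    # Imported lazily so the parser itself has no Biopython dependency.
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord
    return SeqRecord(Seq(record.aaseq), id=record.entry,
                     name=record.name, description=record.definition)


records = list(parse_genes(StringIO(SAMPLE)))
```

Keeping the parser free of SeqRecord means it can sit alongside the
other KEGG parsers, and a SeqIO wrapper would then only need to call
something like to_seqrecord on each record.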

> KEGG also has a SOAP-based API [1]. Its functionality is in some
> respects comparable to NCBI eutils. Using the Python SOAP library suds [2]
> I had no problem interacting with it.

I have not used SOAP, and have a personal preference for REST style
APIs. However, if that is what KEGG offers, this is worth considering.
I think Brad has some experience with (other) SOAP services in Python.
Note the KEGG documentation suggests using SOAPpy for Python.

Interestingly, though, KEGG are looking into providing RDF (and
perhaps one day SPARQL endpoints). I will try and find out what sort
of time scale they have in mind while I am at the BioHackathon 2010
this week - http://hackathon3.dbcls.jp/

For now, I would prioritise the KEGG flat file parsers.

Peter