[Biopython-dev] Bio.CDD, anyone?

Thu Jun 19 13:58:25 UTC 2008

> I wonder if the NCBI make any of this available as XML via Entrez?  I
> had a quick look and couldn't find anything.

Actually, I already asked NCBI this question. Their answer was that a subset of the information shown on the web page is available as XML via Entrez's ESummary and EFetch (and thus available from Biopython). The full CDD records are stored as one large file, which can be downloaded from NCBI's FTP site, but it is currently not possible to get individual CDD records except in HTML form through the NCBI website.
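
As a rough illustration of the ESummary route, here is a minimal sketch that only builds the request URL for one CDD record, using the uid from the example link in this thread. The helper name cdd_esummary_url is made up for this example, and "cdd" as the Entrez database name is an assumption on my part. Nothing is fetched here, so which fields come back is not shown.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def cdd_esummary_url(uid, base=EUTILS_BASE):
    """Build the ESummary URL for one CDD record.

    The helper name is hypothetical; "cdd" is assumed to be the
    Entrez database name for the Conserved Domain Database.
    """
    query = urlencode({"db": "cdd", "id": str(uid)})
    return f"{base}/esummary.fcgi?{query}"


# uid taken from the cddsrv.cgi example link in this thread.
url = cdd_esummary_url(29475)
print(url)
```

Opening that URL (for example with urllib.request.urlopen) should return the summary as XML. From Biopython, the equivalent call would presumably be Bio.Entrez.esummary(db="cdd", id="29475"), with the result parsed by Bio.Entrez.read.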

--Michiel.

Peter <biopython at maubp.freeserve.co.uk> wrote:
> Bio.CDD is a module with a parser for CDD (NCBI's Conserved Domain Database)
> records. The parser parses HTML pages from CDD's web site. Since the parser
> was written about six years ago, the CDD web site has changed considerably.
> Bio.CDD therefore cannot parse current HTML pages from CDD.

A couple of years ago, I wanted to get the CDD domain name and
description and ended up writing my own very simple and crude parser
to extract just this information.  Doing a proper job would mean
extracting lots and lots of fields, e.g.
http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=29475

I wonder if the NCBI make any of this available as XML via Entrez?  I
had a quick look and couldn't find anything.

Peter