[Biopython-dev] Which NCBI / Entrez module?
biopython-dev at maubp.freeserve.co.uk
Mon Aug 13 22:59:42 UTC 2007
I've just been updating the Tutorial to expand the SeqIO documentation
into a full chapter, and one of the things it now covers is parsing a
handle to an online database.
For the SwissProt example I was guided by the existing tutorial code and
used Bio.WWW.ExPASy.get_sprot_raw() which works fine (but interestingly
only fetches one record).
I then added an example fetching GenBank records from the NCBI, based on
the existing tutorial code which uses Bio.GenBank to do some searches
and retrieve records by their GI number. I decided to use
Bio.GenBank.download_many() with Bio.SeqIO.parse() in the new example -
and this works nicely.
Now, looking over the code, the "online" parts of Bio.GenBank are using
Bio.EUtils, a complex bit of code dated 2003 by Andrew Dalke. There is
another (older and much smaller) module Bio.WWW.NCBI dated 1999-2000 by
Jeffrey Chang, which also offers an EUtils interface. This does make an
appearance in the tutorial, under "Connecting with biological databases".
Bio.WWW.NCBI seems to just build Entrez URLs, and returns raw data as
provided by the NCBI. Bio.EUtils says it also does this, and offers a
higher level interface supporting history tracking and parsing
of query results (in XML).
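To make the distinction concrete, here is a rough sketch of the two levels using only the Python standard library. It is not the code of either module. The helper names (build_eutils_url, parse_esearch), the sample reply and the search term are all made up for illustration. The base URL and the ESearch element names (Count, IdList/Id, WebEnv, QueryKey) are what I understand the NCBI E-utilities to use, so please check them against the NCBI documentation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the NCBI E-utilities (an assumption; check the NCBI docs).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(tool, **params):
    """Low level: return the URL for one E-utilities call (hypothetical helper)."""
    # Sort the parameters so the resulting URL is reproducible.
    query = urlencode(sorted(params.items()))
    return f"{EUTILS_BASE}{tool}.fcgi?{query}"


def parse_esearch(xml_text):
    """Higher level: extract hit count, IDs and history tokens from an ESearch reply."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),
        "ids": [node.text for node in root.findall("IdList/Id")],
        "webenv": root.findtext("WebEnv"),
        "query_key": root.findtext("QueryKey"),
    }


# Made-up reply, shaped like an ESearch result requested with usehistory=y.
SAMPLE_REPLY = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""

# Step 1: build the search URL (nothing is actually sent over the network here).
search_url = build_eutils_url(
    "esearch", db="nucleotide", term="Opuntia", usehistory="y"
)

# Step 2: parse the (sample) XML reply to recover the history tokens.
result = parse_esearch(SAMPLE_REPLY)

# Step 3: reuse the history tokens to build a fetch URL for the same result set.
fetch_url = build_eutils_url(
    "efetch",
    db="nucleotide",
    rettype="gb",
    retmode="text",
    WebEnv=result["webenv"],
    query_key=result["query_key"],
)
```

The first helper is roughly what a "just build the URL" module amounts to. The second and third steps are the sort of history tracking and XML parsing that a higher level interface would take care of for the user.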
Is anyone here very familiar with either of these modules? Should we
deprecate Bio.WWW.NCBI in favour of Bio.EUtils - or perhaps just update
its documentation to recommend using that instead?