[Biopython-dev] Bio.Entrez XML parsing

Peter biopython at maubp.freeserve.co.uk
Tue Apr 1 13:49:14 UTC 2008


Michiel wrote:
> I have added a read() function to Bio.Entrez in CVS.
>  Following Peter's suggestion, I put a dictionary (_NameToModule) inside the
> Bio.Entrez.DataHandler class, which can be used to override the default parser
> with a user-defined parser.

Do you only intend to support Entrez XML files with this read()
function, or potentially other formats too?

Even for the assorted XML formats, I'm not yet clear on how you
imagine this being extended.  Have you had a chance to look at Eric's
Entrez Taxonomy XML parser?  It would need some re-factoring to fit in
(see attachments on Bug 2475).
http://bugzilla.open-bio.org/show_bug.cgi?id=2475

>  I am not sure though why a user-defined parser needs to go through
>  Bio.Entrez.read(). Wouldn't it be easier to do something like
>  >>> from Bio import Entrez
>  >>> handle = Entrez.efetch(something)
>  >>> record = run_my_parser(handle)

Sure - you could pass the handle to any parser of your choice, e.g.
Bio.SeqIO.read() or Bio.SeqIO.parse() if you used Bio.Entrez.efetch to
get a GenBank or FASTA file.
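
To illustrate the point, here is a minimal sketch of a user-defined parser
that only needs a file-like handle. The run_my_parser function and the sample
record are hypothetical, and an io.StringIO object stands in for the handle
that Entrez.efetch would return, so the example runs without network access.

```python
import io


def run_my_parser(handle):
    """Hypothetical user-defined parser: read one FASTA record from a handle.

    Returns a (title, sequence) tuple. Any object with readline() and
    line iteration will do, which is all a parser needs from the handle.
    """
    header = handle.readline().rstrip()
    if not header.startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    # Join the remaining lines into one sequence string.
    sequence = "".join(line.strip() for line in handle)
    return header[1:], sequence


# Stand-in for the handle returned by Entrez.efetch (made-up record).
handle = io.StringIO(">example_id hypothetical record\nACGTACGT\nTTGA\n")
title, sequence = run_my_parser(handle)
print(title)
print(sequence)
```

With a real handle, the same call pattern applies to an existing parser,
e.g. Bio.SeqIO.read(handle, "fasta") in place of run_my_parser(handle).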

Peter