[Biopython] Entrez.read return value is typed as a string??
Ben O'Loghlin
bassbabyface at yahoo.com
Thu Oct 29 03:19:09 UTC 2009
Hi Peter,
Many thanks for your post, you cleared up a world of confusion for me.
A few answers/comments:
>> Oh dear - were you working though the Entrez chapter in the Tutorial?
>> If not, what were you looking at?
No, I didn't find the tutorial until you mentioned it. I came across
BioPython by Googling "python pubmed"; the most relevant hit on the first
screenful seemed to be the first one, at
http://baoilleach.blogspot.com/2008/02/searching-pubmed-with-python.html.
This brief blog post describes access via the Bio.EUtils package, which seems
to have disappeared, and it took me about 45 minutes to realise that it was
no longer in the distribution and to track down Bio.Entrez.
Then, Googling "BioPython Entrez", the first hit took me to the documentation
(I missed spotting the tutorial link!), and all subsequent attempts were
based on reading this doco and the source code, scratching my head and
trying random things.
>So you see by default, the NCBI is returning HTML. We can ask for XML:
>
>>>> handle = Entrez.efetch(db="pubmed", id="17206916", retmode="XML")
>>>> print handle.readline()
><?xml version="1.0"?>
This all makes sense now; I wasn't aware of the different 'retmode' options.
The Bio.Entrez.efetch() documentation points me to
http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetch_help.html, which
doesn't mention the 'retmode' or 'rettype' parameters. In fact I couldn't
find any explicit reference to them in the Tutorial either, just the use of
'rettype=text' in one of the example code snippets.
I subsequently tracked down this page
http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchlit_help.html which
does at least indicate the different rettypes and retmodes available.
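
As far as I can tell, the keyword arguments given to Entrez.efetch() simply
end up as query parameters on the NCBI efetch URL, which would explain why
'retmode' and 'rettype' are documented on the NCBI side rather than in
Biopython. Here is a rough sketch of building such a URL by hand (Python 3
syntax). The base URL, the helper name build_efetch_url and the
"abstract"/"text" values are my own assumptions for illustration, not
something taken from the Bio.Entrez source:

```python
from urllib.parse import urlencode

# Assumed base URL of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, id, **options):
    """Hypothetical helper: extra keyword options become query parameters."""
    params = {"db": db, "id": id}
    params.update(options)
    return EFETCH_URL + "?" + urlencode(params)


# Ask for XML, as in the efetch call quoted above.
xml_url = build_efetch_url("pubmed", "17206916", retmode="xml")

# Ask for a plain-text abstract instead (values assumed from the NCBI help page).
text_url = build_efetch_url("pubmed", "17206916", rettype="abstract", retmode="text")

print(xml_url)
print(text_url)
```

Nothing here contacts the NCBI; it only shows where the two parameters would
appear in the request.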
>You could parse this with Bio.Entrez.read() if you wanted to:
>
>>>> handle = Entrez.efetch(db="pubmed", id="17206916", retmode="XML")
>>>> record = Entrez.read(handle)
>>>> print record
>[{u'MedlineCitation': ... ]
I'm interested in using this format; however, I don't understand how to
read/write fields and subtrees of the object of type
'Bio.Entrez.Parser.ListElement' returned by Entrez.read(handle) with
retmode="XML".
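
My working guess, from the way the record prints, is that the result behaves
like ordinary nested Python lists and dictionaries, so plain indexing should
work. The sketch below (Python 3 syntax) uses a hand-built stand-in made of
plain lists and dicts rather than a real Entrez.read() result. Only
'MedlineCitation' comes from the output above; the other key names and all
the values are assumptions made up for illustration:

```python
# Stand-in for the structure returned by Entrez.read(handle).
# Plain list/dict objects with invented values, NOT real PubMed data.
records = [
    {
        "MedlineCitation": {
            "PMID": "00000000",
            "Article": {
                "ArticleTitle": "An example title",
                "AuthorList": [
                    {"LastName": "Smith", "Initials": "J"},
                    {"LastName": "Jones", "Initials": "A"},
                ],
            },
        }
    }
]

# The outer object is a list: take the first record by position.
citation = records[0]["MedlineCitation"]

# Inner objects are dictionaries: read a field by key.
title = citation["Article"]["ArticleTitle"]

# A subtree that repeats (authors) is a list of dictionaries.
last_names = [author["LastName"] for author in citation["Article"]["AuthorList"]]

# Writing a field is ordinary item assignment.
citation["Article"]["ArticleTitle"] = title.upper()

# Listing the keys shows which fields are available at this level.
print(sorted(citation.keys()))
print(title, last_names)
```

If the real object follows the same pattern, then record[0] followed by a
chain of keys should reach any field, and keys() at each level should show
what is there to explore.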
I'm finding it hard to track down references to this [{u'x':['y']}] object
format in Python, possibly because I can't get Google to search for strings
like [{u'. I do, however, appreciate that there appears to be a
u'SpaceFlightMission' tag in Pubmed's default rettype. :)
I'm also a little confused about why handle.read() returns a string in XML
format whereas Entrez.read(handle) returns a
Bio.Entrez.Parser.ListElement. In fact I only know about the latter method
from your email, since the example in the Bio.Entrez doco only uses the
handle.read() syntax, and mentions neither that there is a distinction nor
which method is more appropriate for which task.
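
To check my own understanding of the distinction, here is a small sketch
(Python 3 syntax) that does not use Biopython at all. An in-memory
io.BytesIO object stands in for the handle, and the standard library's
xml.etree.ElementTree stands in for the parser. Both are assumptions chosen
only to illustrate the idea, the XML snippet is made up, and I am not
claiming this is how Entrez.read() works internally:

```python
import io
import xml.etree.ElementTree as ET

# Made-up XML, standing in for what the server would send back.
XML = (
    b'<?xml version="1.0"?>'
    b"<PubmedArticleSet><PubmedArticle><PMID>00000000</PMID>"
    b"</PubmedArticle></PubmedArticleSet>"
)

# Option 1: handle.read() returns the raw text exactly as sent.
handle = io.BytesIO(XML)
raw = handle.read()
leftover = handle.read()  # the handle is now exhausted, so this is empty

# Option 2: give a fresh handle to a parser, which reads it and
# returns a structured object instead of a string.
handle = io.BytesIO(XML)
tree = ET.parse(handle)
pmid = tree.getroot().find("PubmedArticle/PMID").text

print(type(raw).__name__, len(leftover))
print(type(tree).__name__, pmid)
```

So my guess is that handle.read() suits saving or eyeballing the raw reply,
while a parser suits pulling out individual fields, and that a given handle
can only be used for one of the two.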
> Does that help?
Immensely.
If you (or any other Bio.Wizards) have the time and the inclination to help
me further, I'd be very grateful for any thoughts relevant to my ponderings
above.
Thanks again,
Ben