[Biopython] Parsing PubMed-Entrez searches into a normalized relational resource
Christopher Walentas
cwalentas at gmail.com
Thu Sep 16 04:36:13 UTC 2010
Apologies in advance: all of this is very new to me, and I hope that
this is the proper forum for this query.
What I would like to do is parse the results of an Entrez PubMed search
into their smallest unique, useful pieces and build a relational database
from them (SQLite? Dee?). Ideally this would cover not only the returned
fields but also drill further down into, say, affiliations and addresses.
I believe that I've mastered the search and download functions, and each
individual citation now exists as a nested dictionary built from the XML output.
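One way to see the structure of such a nested record is to walk it recursively and list every leaf value together with the path of keys and list indices that leads to it. This is a minimal sketch, assuming the record is built from dicts, lists and strings; in the Biopython versions I know of, the objects returned by `Bio.Entrez.read` subclass Python's `dict`, `list` and `str`, so the same walk should apply to them. The `walk` function and the sample record (including its keys) are hypothetical and purely illustrative.

```python
from typing import Any, Iterator, Tuple


def walk(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every leaf of a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk(item, f"{path}[{index}]")
    else:
        # Strings, numbers and other scalars are treated as leaves.
        yield path, node


# Hypothetical record, loosely shaped like one parsed PubMed citation.
sample = {
    "MedlineCitation": {
        "PMID": "12345",
        "Article": {
            "ArticleTitle": "An example title",
            "AuthorList": [
                {"LastName": "Smith", "ForeName": "Jane",
                 "AffiliationInfo": [
                     {"Affiliation": "Dept. of Biology, Example University"}]},
            ],
        },
    }
}

for leaf_path, value in walk(sample):
    print(f"{leaf_path} = {value}")
```

Collecting the distinct paths across a batch of records (with the `[n]` indices stripped) gives a quick inventory of which elements repeat and therefore probably deserve a table of their own.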
Where I am falling down is understanding how to extract the structure of
these outputs and create a persistent, normalized relational resource whose
fields can be mapped to, and used to "correct", values in an uncurated
dataset with highly analogous fields.
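A minimal sketch of the normalization step, using only Python's built-in `sqlite3` module: articles, authors and affiliations each get their own table, and link tables record which author appears at which position on which article, and under which affiliation(s). The table layout, the helper names `get_or_create` and `load_citation`, and the dictionary keys they read (`MedlineCitation`, `PMID`, `Article`, `ArticleTitle`, `AuthorList`, `LastName`, `ForeName`, `AffiliationInfo`, `Affiliation`) are all assumptions modelled loosely on PubMed XML. Records parsed by `Bio.Entrez.read` may nest these differently depending on the Biopython version and the PubMed DTD, so check the keys against real data first.

```python
import sqlite3

# Hypothetical normalized layout: one row per article, author and affiliation,
# plus link tables for the many-to-many relationships between them.
SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    pmid  TEXT PRIMARY KEY,
    title TEXT
);
CREATE TABLE IF NOT EXISTS author (
    id        INTEGER PRIMARY KEY,
    last_name TEXT NOT NULL,
    fore_name TEXT NOT NULL,
    UNIQUE (last_name, fore_name)
);
CREATE TABLE IF NOT EXISTS affiliation (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS article_author (
    pmid      TEXT    NOT NULL REFERENCES article(pmid),
    position  INTEGER NOT NULL,
    author_id INTEGER NOT NULL REFERENCES author(id),
    PRIMARY KEY (pmid, position)
);
CREATE TABLE IF NOT EXISTS author_affiliation (
    pmid           TEXT    NOT NULL,
    position       INTEGER NOT NULL,
    affiliation_id INTEGER NOT NULL REFERENCES affiliation(id),
    PRIMARY KEY (pmid, position, affiliation_id)
);
"""


def get_or_create(conn, table, columns, values):
    """Return the id of the row holding `values`, inserting it first if needed.

    Relies on a UNIQUE constraint over `columns`; table and column names come
    from this script, never from user input.
    """
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f"INSERT OR IGNORE INTO {table} ({cols}) VALUES ({marks})", values)
    where = " AND ".join(f"{c} = ?" for c in columns)
    return conn.execute(f"SELECT id FROM {table} WHERE {where}", values).fetchone()[0]


def load_citation(conn, citation):
    """Split one citation dict (assumed PubMed-like layout) across the tables."""
    medline = citation["MedlineCitation"]
    pmid = str(medline["PMID"])  # str() turns string subclasses into plain str
    article = medline["Article"]
    conn.execute("INSERT OR REPLACE INTO article (pmid, title) VALUES (?, ?)",
                 (pmid, str(article.get("ArticleTitle", ""))))
    for position, person in enumerate(article.get("AuthorList", []), start=1):
        author_id = get_or_create(conn, "author", ("last_name", "fore_name"),
                                  (str(person.get("LastName", "")),
                                   str(person.get("ForeName", ""))))
        conn.execute("INSERT OR REPLACE INTO article_author VALUES (?, ?, ?)",
                     (pmid, position, author_id))
        for info in person.get("AffiliationInfo", []):
            aff_id = get_or_create(conn, "affiliation", ("name",),
                                   (str(info["Affiliation"]),))
            conn.execute("INSERT OR IGNORE INTO author_affiliation VALUES (?, ?, ?)",
                         (pmid, position, aff_id))


# Hypothetical sample record; real data would come from Bio.Entrez.read().
sample = {"MedlineCitation": {"PMID": "12345", "Article": {
    "ArticleTitle": "An example title",
    "AuthorList": [
        {"LastName": "Smith", "ForeName": "Jane",
         "AffiliationInfo": [{"Affiliation": "Dept. of Biology, Example University"}]},
        {"LastName": "Doe", "ForeName": "John",
         "AffiliationInfo": [{"Affiliation": "Dept. of Biology, Example University"}]},
    ]}}}

conn = sqlite3.connect(":memory:")  # pass a file name such as "pubmed.db" to persist
conn.executescript(SCHEMA)
load_citation(conn, sample)
conn.commit()

rows = conn.execute("""
    SELECT au.last_name, af.name
    FROM article_author aa
    JOIN author au ON au.id = aa.author_id
    JOIN author_affiliation link ON link.pmid = aa.pmid AND link.position = aa.position
    JOIN affiliation af ON af.id = link.affiliation_id
    WHERE aa.pmid = ?
    ORDER BY aa.position""", ("12345",)).fetchall()
print(rows)
```

Because the shared affiliation string is stored once and referenced by id, both authors point at the same `affiliation` row, and reloading the same citation does not duplicate authors or affiliations. Splitting affiliations further into addresses would follow the same pattern: another table plus a link table.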
I've been struggling to bridge the gap between Python and SQLite/Dee,
but I have recently been told that it might be possible to do everything
within Python itself. Again, apologies for any naiveties; they are
sincere, but I'm well aware that a little knowledge can be dangerous,
hence my reaching out.
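Doing the "correction" step entirely in Python also looks feasible. This is a minimal sketch, assuming the curated values (here, affiliation names) are already stored in an SQLite table called `affiliation`: it loads them with the standard-library `sqlite3` module and uses `difflib.get_close_matches` to suggest the nearest curated value for each uncurated string. Difflib's similarity ratio is a simple stand-in for proper record linkage, not a recommended matching method; the table name, the helper names `normalise` and `suggest_corrections`, the 0.6 cutoff and the sample strings are all hypothetical.

```python
import difflib
import sqlite3


def normalise(text):
    """Lower-case and drop commas/periods so trivial differences don't matter."""
    return " ".join(text.lower().replace(",", " ").replace(".", " ").split())


def suggest_corrections(conn, raw_values, cutoff=0.6):
    """Map each uncurated string to its closest curated affiliation name, or None."""
    curated = [row[0] for row in conn.execute("SELECT name FROM affiliation")]
    keyed = {normalise(name): name for name in curated}  # normalized -> original
    result = {}
    for raw in raw_values:
        match = difflib.get_close_matches(normalise(raw), list(keyed), n=1, cutoff=cutoff)
        result[raw] = keyed[match[0]] if match else None
    return result


# Hypothetical curated table and uncurated inputs, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE affiliation (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO affiliation (name) VALUES (?)",
                 [("Dept. of Biology, Example University",),
                  ("Institute of Chemistry, Sample College",)])
uncurated = ["dept of biology example univ", "Inst. Chemistry, Sample Coll.", "Unrelated Corp"]
print(suggest_corrections(conn, uncurated))
```

Returning `None` below the cutoff, rather than forcing a match, leaves doubtful cases for manual review instead of silently "correcting" them.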
From what I've read so far, all of this seems ideally suited to
Python and Biopython, and I'm looking forward to learning. I'm just
looking for a swift shove in the right direction and the benefit of
your collective, informed guidance.
Cheers in advance,
christopher