[Biopython-dev] 2/28 active questions tagged biopython - Stack Overflow
Feed My Inbox
updates at feedmyinbox.com
Mon Feb 28 09:21:17 UTC 2011
// Convert nested dictionary/xml to flat file for sqlite
// February 27, 2011 at 11:25 AM
I've scoured the net and cannot seem to find an appropriate example so I thought I'd ask...
(Btw, much of this is new to me: not all, just most.)
Problem: trying to convert a Biopython nested dictionary (or XML) of PubMed citation data into a flat (normalized) structure, e.g. SQLite. The citation data was fetched from PubMed using Biopython and parsed into a dictionary, but I can also retrieve it as XML if needed.
Not all citations will have all fields/keys, and not all fields/keys will have the same number of items (authors, MeSH terms, refs, etc.). I understand that handling this is part of the normalization process.
This is about where my practical understanding ends.
That said, I think the process should go something like this: first pull the single-valued fields (those that occur once per paper, e.g. title, abstract, date, citation, etc., but not, say, affiliation, as that would be linked to the first author) into one table. Could papers with no abstract be filled in as NULL?
Then move on to, say, authors and create a separate table, again using PMID as the foreign key, and then do the same for the various other fields/keys/items in separate tables, e.g. MeSH headings, EC numbers, refs, etc.
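A minimal sketch of the layout described above, using Python's built-in sqlite3 module. The record dictionaries, their key names ("title", "abstract", "authors", "mesh") and the table and column names are all hypothetical stand-ins; the dictionary that Biopython actually returns is more deeply nested and uses different keys, so the lookups would need adapting. Missing single-valued fields come through as NULL, and missing lists simply produce no rows.

```python
import sqlite3

# Hypothetical, simplified records (NOT the real Biopython structure).
papers = [
    {"PMID": "1", "title": "First paper", "abstract": "Some text.",
     "authors": ["Smith J", "Doe A"], "mesh": ["Humans", "Mice"]},
    {"PMID": "2", "title": "Second paper",
     "authors": ["Doe A"]},  # no abstract and no MeSH terms
]

def build_db(records):
    """Load records into one 'paper' table plus one child table per list field."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE paper  (pmid TEXT PRIMARY KEY, title TEXT, abstract TEXT);
        CREATE TABLE author (pmid TEXT REFERENCES paper(pmid),
                             position INTEGER, name TEXT);
        CREATE TABLE mesh   (pmid TEXT REFERENCES paper(pmid), term TEXT);
    """)
    for rec in records:
        pmid = rec["PMID"]
        # dict.get returns None for a missing key, which sqlite3 stores as NULL.
        conn.execute("INSERT INTO paper VALUES (?, ?, ?)",
                     (pmid, rec.get("title"), rec.get("abstract")))
        # One row per author; position keeps the author order (1 = first author).
        conn.executemany(
            "INSERT INTO author VALUES (?, ?, ?)",
            [(pmid, i, name)
             for i, name in enumerate(rec.get("authors", []), start=1)])
        conn.executemany(
            "INSERT INTO mesh VALUES (?, ?)",
            [(pmid, term) for term in rec.get("mesh", [])])
    conn.commit()
    return conn

conn = build_db(papers)
```

Storing the author position would also give affiliation somewhere to hang, since it belongs to the first author rather than to the paper.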
Is there a way to do this that removes (pops?) keys/items from the master dictionary as they are processed, so that I can see what has been done and what still needs to be done (obviously leaving the PMID)?
Again, apologies in advance if I'm asking a blindingly obvious question to the initiated. I do understand that you can't fit a nested structure into a flat space; I'm just looking for the least boneheaded way of going about this, and hopefully one that will let me make sure that everything was properly captured.
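One way this might look, as a rough sketch: work on a deep copy of each record and use dict.pop as each field is handled, so that whatever keys are left over at the end are exactly the ones not yet captured. The record contents and the helper name extract are hypothetical, chosen only for illustration.

```python
import copy

def extract(record, keys):
    """Pop the given keys out of record and return them as a new dict.

    Keys that are absent from the record are skipped.
    """
    return {k: record.pop(k) for k in keys if k in record}

# Hypothetical, simplified record (NOT the real Biopython structure).
record = {"PMID": "1", "title": "First paper", "authors": ["Smith J"],
          "mesh": ["Humans"], "grants": ["X01"]}

# Work on a copy so the original parsed data stays intact.
remaining = copy.deepcopy(record)
pmid = remaining["PMID"]  # read the PMID but leave it in place

paper_row = extract(remaining, ["title", "abstract"])
author_rows = [(pmid, name) for name in remaining.pop("authors", [])]

# Whatever is still here has not been handled yet.
leftover = sorted(k for k in remaining if k != "PMID")
print(leftover)
```

Printing the leftover keys after each pass gives a running to-do list, and an empty list at the end is a simple check that every field was captured.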