[Biopython] processing XML files in Biopython
David Suárez Pascal
david.suarez at yahoo.com
Mon Jun 6 14:37:43 UTC 2011
Sheila,
I don't think you need to deal with the XML directly. I tried your code, and
Entrez.read has already parsed the data for you.
What I get when I try your code is a list:
>>> type(record)
<class 'Bio.Entrez.Parser.ListElement'>
which contains a dict with the following keys:
>>> record[0].keys()
[u'GBSeq_moltype',
u'GBSeq_source',
u'GBSeq_sequence',
u'GBSeq_primary-accession',
u'GBSeq_definition',
u'GBSeq_accession-version',
u'GBSeq_topology',
u'GBSeq_length',
u'GBSeq_feature-table',
u'GBSeq_create-date',
u'GBSeq_other-seqids',
u'GBSeq_division',
u'GBSeq_taxonomy',
u'GBSeq_comment',
u'GBSeq_source-db',
u'GBSeq_references',
u'GBSeq_update-date',
u'GBSeq_organism',
u'GBSeq_locus']
If you got the same response, then you can just do:
>>> record[0]['GBSeq_locus']
'NP_997807'
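For the GeneID you asked about, I believe it sits deeper, as a db_xref
qualifier inside the feature table. Here is a rough sketch, assuming the usual
GBSeq layout: the GBFeature_quals, GBQualifier_name and GBQualifier_value keys
are not shown in the output above, so please check them against your own
record. The function name find_db_xrefs and the sample values are made up for
illustration only:

```python
def find_db_xrefs(gbseq, prefix="GeneID:"):
    """Collect db_xref qualifier values that start with `prefix`.

    `gbseq` is one parsed GBSeq entry (e.g. record[0]), which behaves
    like a dict. The nested key names are assumed from the GBSeq format.
    """
    hits = []
    for feature in gbseq.get("GBSeq_feature-table", []):
        # Some features may carry no qualifiers at all.
        for qual in feature.get("GBFeature_quals", []):
            if qual.get("GBQualifier_name") != "db_xref":
                continue
            value = qual.get("GBQualifier_value", "")
            if value.startswith(prefix):
                hits.append(value[len(prefix):])
    return hits


# Hand-made stand-in for record[0]; the length and GeneID are placeholders,
# not the real values for NP_997807.
sample = {
    "GBSeq_locus": "NP_997807",
    "GBSeq_length": "123",
    "GBSeq_feature-table": [
        {"GBFeature_key": "source", "GBFeature_quals": [
            {"GBQualifier_name": "db_xref", "GBQualifier_value": "taxon:0"}]},
        {"GBFeature_key": "CDS", "GBFeature_quals": [
            {"GBQualifier_name": "db_xref", "GBQualifier_value": "GeneID:12345"}]},
    ],
}

gene_ids = find_db_xrefs(sample)
length = int(sample["GBSeq_length"])  # parsed values are strings
print(gene_ids, length)
```

With the real data you would pass record[0] instead of sample. The amino acid
length should then simply be int(record[0]['GBSeq_length']).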
I hope this helps.
David
2011/6/6 Sheila the angel <from.d.putto at gmail.com>
> Hi All,
>
> I am new to Biopython. I have a simple question: how can I process XML files
> in Biopython?
> For example, I have the NCBI Reference Sequence ID 'NP_997807.1'.
> I want to download the XML file and extract certain information
> (e.g. the GeneID, the amino acid length, etc.).
> To download the file I did:
>
> from Bio import Entrez
> handle = Entrez.efetch(db="protein", id="NP_997807.1", retmode="xml")
> record = Entrez.read(handle)
> handle.close()
>
> Now I have no clue how to extract specific information (like the GeneID) :(
> Please help.
>
> --
> Cheers
>
> Sheila d. Angela
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>