Parsing XML. Was: [Bioperl-l] Human Invitational Database

Wed Jun 16 09:25:42 EDT 2004

Ewan Birney wrote:
> :) I would really appreciate such a parser as well. (ewan the dinosaur
> grumbles about how annoying XML is to parse... how easy a set of tab
> delimited files are...)

Parsing the XML doesn't seem too hard (a single constructor call :). The
example below fetches a UniProt entry as XML and prints some values from
it using XPath, which is the easiest way I've found so far. Building a
BioPerl object from the result ... I leave that as an exercise.

Cheers, Dave
-- 
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960

#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple;
use XML::XPath;

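# Fetch the UniProt document
my $accession = 'Q01292';
my $url =
"http://www.ebi.uniprot.org/uniprot-srv/downloadSingleUniProtProtein.do?type=xml&entry=$accession";
my $xml_string = get($url);
# LWP::Simple::get returns undef on failure
die "Could not fetch $url\n" unless defined $xml_string;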

# Parse it
my $xp = XML::XPath->new(xml => $xml_string);

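# Print some values from it.
# printNodeSet evaluates an XPath expression against the parsed document
# ($xp, declared above) and prints the string value of each matching node.
sub printNodeSet {
  my $xpath = shift;
  print "\nxpath = $xpath\n";
  my $nodeset = $xp->find($xpath);
  foreach my $node ($nodeset->get_nodelist) {
    print $node->string_value(), "\n";
  }
}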

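# The elements queried below are in this namespace; bind it to the
# prefix 'u' so the XPath expressions can refer to them.
$xp->set_namespace(u => 'http://uniprot.org/uniprot');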
printNodeSet('/u:uniprot/u:entry/u:accession/text()');
printNodeSet('/u:uniprot/u:entry/u:accession');
printNodeSet('/u:uniprot/u:entry/u:sequence/@checksum');
printNodeSet('//u:name');
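
To try the XPath expressions without a network connection, here is a
minimal sketch that parses a small inline document instead. The sample
XML is hypothetical and heavily trimmed (the name, checksum and sequence
values are made up); it only assumes the same namespace and element names
that the script above queries. findvalue and findnodes are standard
XML::XPath methods, used here in place of the printNodeSet helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::XPath;

# Hypothetical, trimmed-down sample; NOT a real UniProt record.
my $xml_string = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>Q01292</accession>
    <name>EXAMPLE_NAME</name>
    <sequence checksum="0123456789ABCDEF">MKTAYIAK</sequence>
  </entry>
</uniprot>
XML

my $xp = XML::XPath->new(xml => $xml_string);

# Bind a prefix to the document's namespace, as in the script above.
$xp->set_namespace(u => 'http://uniprot.org/uniprot');

# findvalue returns the string value of the matched nodes
my $accession = $xp->findvalue('/u:uniprot/u:entry/u:accession');
my $checksum  = $xp->findvalue('/u:uniprot/u:entry/u:sequence/@checksum');

# findnodes in list context returns the matching nodes themselves
my @names = map { $_->string_value } $xp->findnodes('//u:name');

print "accession = $accession\n";
print "checksum  = $checksum\n";
print "name      = $_\n" for @names;
```

The prefix 'u' is arbitrary; what matters is that the URI given to
set_namespace matches the xmlns declaration in the document.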