Parsing XML. Was: [Bioperl-l] Human Invitational Database
Dave Howorth
dhoworth at mrc-lmb.cam.ac.uk
Wed Jun 16 09:25:42 EDT 2004
Ewan Birney wrote:
> :) I would really appreciate such a parser as well. (ewan the dinosaur
> grumbles about how annoying XML is to parse... how easy a set of tab
> delimited files are...)
Parsing the XML doesn't seem too hard (one constructor call :). The
example below fetches a UniProt entry as XML and prints some values
from it with XPath, which is the easiest way I've found so far.
Building a bioperl object from those values I leave as an exercise.
Cheers, Dave
--
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use XML::XPath;

# Fetch the UniProt document
my $accession = 'Q01292';
my $url =
"http://www.ebi.uniprot.org/uniprot-srv/downloadSingleUniProtProtein.do?type=xml&entry=$accession";
my $xml_string = get($url);
# LWP::Simple's get() returns undef on failure
defined $xml_string or die "Could not fetch $url\n";

# Parse it
my $xp = XML::XPath->new(xml => $xml_string);
# The document uses a default namespace, so bind a prefix for use in XPath
$xp->set_namespace(u => 'http://uniprot.org/uniprot');

# Print the string value of every node matching an XPath expression
sub printNodeSet {
    my $xpath = shift;
    print "\nxpath = $xpath\n";
    my $nodeset = $xp->find($xpath);
    foreach my $node ($nodeset->get_nodelist) {
        print $node->string_value(), "\n";
    }
}

# Print some values from it
printNodeSet('/u:uniprot/u:entry/u:accession/text()');
printNodeSet('/u:uniprot/u:entry/u:accession');
printNodeSet('/u:uniprot/u:entry/u:sequence/@checksum');
printNodeSet('//u:name');
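
As a rough starting point for that exercise, here is a sketch that runs
offline and collects a few fields into a plain hash instead of printing
them. The XML snippet is a made-up, heavily trimmed stand-in for a
UniProt entry (not real data), and the names first_value and
entry_fields are my own; only the XPath expressions and the namespace
URI come from the script above. I'm assuming the sequence text may be
wrapped over several lines, so whitespace is stripped from it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical stand-in for a UniProt entry (invented values, not real data)
my $xml_string = <<'XML';
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00000</accession>
    <name>EXAMPLE_NAME</name>
    <sequence checksum="0123456789ABCDEF">
      MKTAYIAK
      QRQISFVK
    </sequence>
  </entry>
</uniprot>
XML

# Return the string value of the first node matching $path, or undef
sub first_value {
    my ($xp, $path) = @_;
    my ($node) = $xp->findnodes($path);
    return defined $node ? $node->string_value : undef;
}

# Collect a few fields from the first entry into a hash reference
sub entry_fields {
    my ($xml) = @_;
    my $xp = XML::XPath->new(xml => $xml);
    $xp->set_namespace(u => 'http://uniprot.org/uniprot');

    my %fields = (
        accession => first_value($xp, '/u:uniprot/u:entry/u:accession'),
        name      => first_value($xp, '/u:uniprot/u:entry/u:name'),
        checksum  => first_value($xp, '/u:uniprot/u:entry/u:sequence/@checksum'),
        sequence  => first_value($xp, '/u:uniprot/u:entry/u:sequence'),
    );
    # Assumption: the sequence may be wrapped, so drop all whitespace
    $fields{sequence} =~ s/\s+//g if defined $fields{sequence};
    return \%fields;
}

my $fields = entry_fields($xml_string);
foreach my $key (sort keys %$fields) {
    my $value = defined $fields->{$key} ? $fields->{$key} : '(missing)';
    print "$key = $value\n";
}
```

The hash values could then be handed to something like
Bio::Seq->new(-seq => ..., -display_id => ..., -accession_number => ...),
though I haven't tried to map the rest of the entry.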