[MOBY-dev] XML parsing takes forever
Kenny Billiau
kenny.billiau at psb.ugent.be
Tue Nov 4 13:03:47 UTC 2008
Hi,
I've run a little testprogram to see where most of the time is spent. I use a
stripped down version of the xml file I've sent you. All but a few
ElementContainers and no sequence. Just to keep the processing time low enough
to debug.
use MOSES::MOBY::Parser;
open(FILE, '<', $ARGV[0]);
my @lines = <FILE>;
close FILE;
my $parser = MOSES::MOBY::Parser->new();
print $parser->parse( method => 'string', data => join(q{}, @lines) );
[davebowman at hal2000 ~]$ /usr/bin/perl -d:DProf mobytest.pl mobysav_small.xml
[davebowman at hal2000 ~]$ dprofpp
Exporter::Heavy::heavy_export has 14 unstacked calls in outer
Config::Simple::import_from has 1 unstacked calls in outer
utf8::AUTOLOAD has -1 unstacked calls in outer
Config::Simple:: verbose has 1 unstacked calls in outer
Config::Simple:: vars has 1 unstacked calls in outer
AutoLoader::AUTOLOAD has -3 unstacked calls in outer
Fcntl:: AUTOLOAD has -6 unstacked calls in outer
Fcntl: :__ANON__ has 6 unstacked calls in outer
Exporter::export has -14 unstacked calls in outer
utf8::SWASHNEW has 1 unstacked calls in outer
Total Elapsed Time = 8.470220 Seconds
User+System Time = 7.750220 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
68.1 5.280 5.280 69736 0.0001 0.0001 IO::Scalar::print
6.94 0.538 0.783 64088 0.0000 0.0000 Dumpvalue::stringify
6.04 0.468 7.173 33755 0.0000 0.0002 Dumpvalue::unwrap
4.30 0.333 0.333 47091 0.0000 0.0000 File::Spec::Unix::canonpath
2.43 0.188 7.157 33006 0.0000 0.0002 Dumpvalue::DumpElem
2.13 0.165 0.165 59776 0.0000 0.0000 overload::mycan
1.78 0.138 0.224 15990 0.0000 0.0000 MOSES::MOBY::Base::__ANON__
1.42 0.110 0.708 539 0.0002 0.0013
MOSES::MOBY::Cache::Central::_createDataTypeFromXML
1.39 0.108 0.108 142023 0.0000 0.0000 UNIVERSAL::isa
1.29 0.100 0.100 208 0.0005 0.0005
MOSES::MOBY::Cache::Registries::init
1.20 0.093 0.357 17359 0.0000 0.0000 overload::OverloadedStringify
1.16 0.090 0.090 30407 0.0000 0.0000 XML::LibXML::Node::nodeName
1.02 0.079 0.079 539 0.0001 0.0001
MOSES::MOBY::Cache::Central::_clean
0.99 0.077 0.418 207 0.0004 0.0020
MOSES::MOBY::Generators::Utils::find_file
0.94 0.073 0.203 3147 0.0000 0.0001
XML::LibXML::Element::getChildrenByTagName
It's fairly obivous now ;)
If I remove the overload operator in MOSES::MOBY::Base, time is reduced 15fold!
But for the 250KB file I've sent, it still takes us a minute to parse.
If I simply parse the file with XML::LibXML and no callbacks or custom
handlers, it's parsed instantly:
[davebowman at hal2000 ~]$ /usr/bin/perl -d:DProf mobytest.pl mobysav_small.xml
[kebil at psbdev01 ~]$ dprofpp
Total Elapsed Time = 0.079034 Seconds
User+System Time = 0.079034 Seconds
-Kenny, slowly getting to the bottom of this ;)
On Thu, 30 Oct 2008, Edward Kawas wrote:
> Now I can confirm that it is slow parsing and not an infinite loop! The XML
> file took over an hour on my virtual machine to parse. I will see what I can
> do with other SAX parsers for Perl.
>
> Thanks,
>
> Eddie
>
> -----Original Message-----
> From: moby-dev-bounces at lists.open-bio.org
> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of José María
> Fernández González
> Sent: October-30-08 12:55 PM
> To: Core developer announcements
> Subject: Re: [MOBY-dev] XML parsing takes forever
>
> Hi everybody,
> I'm using the XML::LibXML SAX parser for other projects and tasks,
> and it can
> be really, really slow. You don't realize that until you are parsing a large
> file from a pipe, and you see that most of the time the program which is
> parsing is CPU (and not I/O) bounded.
>
> Best Regards,
> José María
>
> Edward Kawas wrote:
> > I think that you are right about the parser not going into an infinite
> loop.
> > For some reason, the SAX parser seems to be really slow (although, I don't
> > want to say that until I know for sure).
> >
> > Eddie
> >
> > -----Original Message-----
> > From: moby-dev-bounces at lists.open-bio.org
> > [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of Kenny Billiau
> > Sent: October-30-08 11:56 AM
> > To: Core developer announcements
> > Subject: Re: [MOBY-dev] XML parsing takes forever
> >
> > Hi,
> >
> > thx for helping to investigate the problem!
> >
> > I'm pretty sure the parser gets out of it's mess. When you truncate the
> > XML file by removing all but a few ElementContainers, it takes awhile, but
>
> > it actually returns without error.
> >
> > And memory consuption doesn't seem to be a problem either, as mostly 'top'
>
> > reveals that the invoked webservice script uses a steady amount of memory.
> >
> > thx,
> > Kenny
> >
> > On Thu, 30 Oct 2008, Edward Kawas wrote:
> >
> > > So this one might take some time... from what I can see, our parser goes
> > > into an infinite loop (although, I am not certain about that ...).
> > >
> > > I will play around with this XML file and get back to you.
> > >
> > > Thanks,
> > >
> > > Eddie
> > >
> > > -----Original Message-----
> > > From: moby-dev-bounces at lists.open-bio.org
> > > [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of Kenny Billiau
> > > Sent: October-30-08 8:03 AM
> > > To: Core developer announcements
> > > Subject: Re: [MOBY-dev] XML parsing takes forever
> > >
> > > Hi,
> > >
> > > here you go,
> > >
> > > Kenny
> > >
> > > On Thu, 30 Oct 2008, Edward Kawas wrote:
> > >
> > > > Hi,
> > > >
> > > > This is the right list! Could you send me the XML for your second
> > service?
> > > > Thanks,
> > > >
> > > > Eddie
>>> _______________________________________________
> > > MOBY-dev mailing list
> > > MOBY-dev at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/moby-dev
> > >
> >
>
>
--
More information about the MOBY-dev
mailing list