[MOBY-dev] XML parsing takes forever

Kenny Billiau kenny.billiau at psb.ugent.be
Tue Nov 4 13:03:47 UTC 2008


Hi,

I've run a little testprogram to see where most of the time is spent. I use a 
stripped down version of the xml file I've sent you. All but a few 
ElementContainers and no sequence. Just to keep the processing time low enough 
to debug.

      use MOSES::MOBY::Parser;

       open(FILE, '<', $ARGV[0]);
       my @lines = <FILE>;
       close FILE;

       my $parser = MOSES::MOBY::Parser->new();
       print $parser->parse( method => 'string', data => join(q{}, @lines) );


[davebowman at hal2000 ~]$ /usr/bin/perl -d:DProf mobytest.pl mobysav_small.xml
[davebowman at hal2000 ~]$ dprofpp
Exporter::Heavy::heavy_export has 14 unstacked calls in outer
Config::Simple::import_from has 1 unstacked calls in outer
utf8::AUTOLOAD has -1 unstacked calls in outer
Config::Simple:: verbose has 1 unstacked calls in outer
Config::Simple:: vars has 1 unstacked calls in outer
AutoLoader::AUTOLOAD has -3 unstacked calls in outer
Fcntl:: AUTOLOAD has -6 unstacked calls in outer
Fcntl: :__ANON__ has 6 unstacked calls in outer
Exporter::export has -14 unstacked calls in outer
utf8::SWASHNEW has 1 unstacked calls in outer
Total Elapsed Time = 8.470220 Seconds
   User+System Time = 7.750220 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
   68.1   5.280  5.280  69736   0.0001 0.0001  IO::Scalar::print
   6.94   0.538  0.783  64088   0.0000 0.0000  Dumpvalue::stringify
   6.04   0.468  7.173  33755   0.0000 0.0002  Dumpvalue::unwrap
   4.30   0.333  0.333  47091   0.0000 0.0000  File::Spec::Unix::canonpath
   2.43   0.188  7.157  33006   0.0000 0.0002  Dumpvalue::DumpElem
   2.13   0.165  0.165  59776   0.0000 0.0000  overload::mycan
   1.78   0.138  0.224  15990   0.0000 0.0000  MOSES::MOBY::Base::__ANON__
   1.42   0.110  0.708    539   0.0002 0.0013
   MOSES::MOBY::Cache::Central::_createDataTypeFromXML
   1.39   0.108  0.108 142023   0.0000 0.0000  UNIVERSAL::isa
   1.29   0.100  0.100    208   0.0005 0.0005
   MOSES::MOBY::Cache::Registries::init
   1.20   0.093  0.357  17359   0.0000 0.0000  overload::OverloadedStringify
   1.16   0.090  0.090  30407   0.0000 0.0000  XML::LibXML::Node::nodeName
   1.02   0.079  0.079    539   0.0001 0.0001
   MOSES::MOBY::Cache::Central::_clean
   0.99   0.077  0.418    207   0.0004 0.0020
   MOSES::MOBY::Generators::Utils::find_file
   0.94   0.073  0.203   3147   0.0000 0.0001
   XML::LibXML::Element::getChildrenByTagName

It's fairly obivous now ;)

If I remove the overload operator in MOSES::MOBY::Base, time is reduced 15fold! 
But for the 250KB file I've sent, it still takes us a minute to parse.

If I simply parse the file with XML::LibXML and no callbacks or custom 
handlers, it's parsed instantly:

[davebowman at hal2000 ~]$ /usr/bin/perl -d:DProf mobytest.pl mobysav_small.xml
[kebil at psbdev01 ~]$ dprofpp
Total Elapsed Time = 0.079034 Seconds
    User+System Time = 0.079034 Seconds


-Kenny, slowly getting to the bottom of this ;)


On Thu, 30 Oct 2008, Edward Kawas wrote:

>  Now I can confirm that it is slow parsing and not an infinite loop! The XML
>  file took over an hour on my virtual machine to parse. I will see what I can
>  do with other SAX parsers for Perl.
>
>  Thanks,
>
>  Eddie
>
>  -----Original Message-----
>  From: moby-dev-bounces at lists.open-bio.org
>  [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of José María
>  Fernández González
>  Sent: October-30-08 12:55 PM
>  To: Core developer announcements
>  Subject: Re: [MOBY-dev] XML parsing takes forever
>
>  Hi everybody,
>  	I'm using the XML::LibXML SAX parser for other projects and tasks,
>  and it can
>  be really, really slow. You don't realize that until you are parsing a large
>  file from a pipe, and you see that most of the time the program which is
>  parsing is CPU (and not I/O) bounded.
>
>   Best Regards,
>    José María
>
>  Edward Kawas wrote:
> >  I think that you are right about the parser not going into an infinite
>  loop.
> >  For some reason, the SAX parser seems to be really slow (although, I don't
> >  want to say that until I know for sure).
> > 
> >  Eddie
> > 
> >  -----Original Message-----
> >  From: moby-dev-bounces at lists.open-bio.org
> >  [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of Kenny Billiau
> >  Sent: October-30-08 11:56 AM
> >  To: Core developer announcements
> >  Subject: Re: [MOBY-dev] XML parsing takes forever
> > 
> >  Hi,
> > 
> >  thx for helping to investigate the problem!
> > 
> >  I'm pretty sure the parser gets out of it's mess. When you truncate the
> >  XML file by removing all but a few ElementContainers, it takes awhile, but
> 
> >  it actually returns without error.
> > 
> >  And memory consuption doesn't seem to be a problem either, as mostly 'top'
> 
> >  reveals that the invoked webservice script uses a steady amount of memory.
> > 
> >  thx,
> >  Kenny
> > 
> >  On Thu, 30 Oct 2008, Edward Kawas wrote:
> > 
> > >  So this one might take some time... from what I can see, our parser goes
> > >  into an infinite loop (although, I am not certain about that ...).
> > > 
> > >  I will play around with this XML file and get back to you.
> > > 
> > >  Thanks,
> > > 
> > >  Eddie
> > > 
> > >  -----Original Message-----
> > >  From: moby-dev-bounces at lists.open-bio.org
> > >  [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of Kenny Billiau
> > >  Sent: October-30-08 8:03 AM
> > >  To: Core developer announcements
> > >  Subject: Re: [MOBY-dev] XML parsing takes forever
> > > 
> > >  Hi,
> > > 
> > >  here you go,
> > > 
> > >  Kenny
> > > 
> > >  On Thu, 30 Oct 2008, Edward Kawas wrote:
> > > 
> > > >  Hi,
> > > > 
> > > >  This is the right list! Could you send me the XML for your second
> >  service?
> > > >  Thanks,
> > > > 
> > > >  Eddie
>>> _______________________________________________
> > >  MOBY-dev mailing list
> > >  MOBY-dev at lists.open-bio.org
> > >  http://lists.open-bio.org/mailman/listinfo/moby-dev
> > > 
> > 
> 
>

-- 


More information about the MOBY-dev mailing list