[MOBY-dev] XML parsing takes forever
Kenny Billiau
kebil at psb.ugent.be
Tue Nov 4 13:02:49 UTC 2008
Hi,
I've run a little testprogram to see where most of the time is spent. I use a stripped down version of the xml file I've sent you. All but a few ElementContainers and no sequence. Just to keep the processing time low enough to debug.
use MOSES::MOBY::Parser;
open(FILE, '<', $ARGV[0]);
my @lines = <FILE>;
close FILE;
my $parser = MOSES::MOBY::Parser->new();
print $parser->parse( method => 'string', data => join(q{}, @lines) );
[davebowman at hal2000 ~]$ /usr/bin/perl -d:DProf mobytest.pl mobysav_small.xml
[davebowman at hal2000 ~]$ dprofpp
Exporter::Heavy::heavy_export has 14 unstacked calls in outer
Config::Simple::import_from has 1 unstacked calls in outer
utf8::AUTOLOAD has -1 unstacked calls in outer
Config::Simple::verbose has 1 unstacked calls in outer
Config::Simple::vars has 1 unstacked calls in outer
AutoLoader::AUTOLOAD has -3 unstacked calls in outer
Fcntl::AUTOLOAD has -6 unstacked calls in outer
Fcntl::__ANON__ has 6 unstacked calls in outer
Exporter::export has -14 unstacked calls in outer
utf8::SWASHNEW has 1 unstacked calls in outer
Total Elapsed Time = 8.470220 Seconds
User+System Time = 7.750220 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
68.1 5.280 5.280 69736 0.0001 0.0001 IO::Scalar::print
6.94 0.538 0.783 64088 0.0000 0.0000 Dumpvalue::stringify
6.04 0.468 7.173 33755 0.0000 0.0002 Dumpvalue::unwrap
4.30 0.333 0.333 47091 0.0000 0.0000 File::Spec::Unix::canonpath
2.43 0.188 7.157 33006 0.0000 0.0002 Dumpvalue::DumpElem
2.13 0.165 0.165 59776 0.0000 0.0000 overload::mycan
1.78 0.138 0.224 15990 0.0000 0.0000 MOSES::MOBY::Base::__ANON__
1.42 0.110 0.708 539 0.0002 0.0013 MOSES::MOBY::Cache::Central::_createDataTypeFromXML
1.39 0.108 0.108 142023 0.0000 0.0000 UNIVERSAL::isa
1.29 0.100 0.100 208 0.0005 0.0005 MOSES::MOBY::Cache::Registries::init
1.20 0.093 0.357 17359 0.0000 0.0000 overload::OverloadedStringify
1.16 0.090 0.090 30407 0.0000 0.0000 XML::LibXML::Node::nodeName
1.02 0.079 0.079 539 0.0001 0.0001 MOSES::MOBY::Cache::Central::_clean
0.99 0.077 0.418 207 0.0004 0.0020 MOSES::MOBY::Generators::Utils::find_file
0.94 0.073 0.203 3147 0.0000 0.0001 XML::LibXML::Element::getChildrenByTagName
It's fairly obivous now ;)
If I remove the overload operator in MOSES::MOBY::Base, time is reduced
15fold! But for the 250KB file I've sent, it still takes us a minute to
parse.
If I simply parse the file with XML::LibXML and no callbacks or custom
handlers, it's parsed instantly:
[davebowman at hal2000 ~]$ /usr/bin/perl -d:DProf mobytest.pl mobysav_small.xml
[kebil at psbdev01 ~]$ dprofpp
Total Elapsed Time = 0.079034 Seconds
User+System Time = 0.079034 Seconds
-Kenny, slowly getting to the bottom of this ;)
On Thu, 30 Oct 2008, Edward Kawas wrote:
> Now I can confirm that it is slow parsing and not an infinite loop! The XML
> file took over an hour on my virtual machine to parse. I will see what I can
> do with other SAX parsers for Perl.
>
> Thanks,
>
> Eddie
>
> -----Original Message-----
> From: moby-dev-bounces at lists.open-bio.org
> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of José María
> Fernández González
> Sent: October-30-08 12:55 PM
> To: Core developer announcements
> Subject: Re: [MOBY-dev] XML parsing takes forever
>
> Hi everybody,
> I'm using the XML::LibXML SAX parser for other projects and tasks,
> and it can
> be really, really slow. You don't realize that until you are parsing a large
> file from a pipe, and you see that most of the time the program which is
> parsing is CPU (and not I/O) bounded.
>
> Best Regards,
> José María
>
> Edward Kawas wrote:
>> I think that you are right about the parser not going into an infinite
> loop.
>> For some reason, the SAX parser seems to be really slow (although, I don't
>> want to say that until I know for sure).
>>
>> Eddie
>>
>> -----Original Message-----
>> From: moby-dev-bounces at lists.open-bio.org
>> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of Kenny Billiau
>> Sent: October-30-08 11:56 AM
>> To: Core developer announcements
>> Subject: Re: [MOBY-dev] XML parsing takes forever
>>
>> Hi,
>>
>> thx for helping to investigate the problem!
>>
>> I'm pretty sure the parser gets out of it's mess. When you truncate the
>> XML file by removing all but a few ElementContainers, it takes awhile, but
>
>> it actually returns without error.
>>
>> And memory consuption doesn't seem to be a problem either, as mostly 'top'
>
>> reveals that the invoked webservice script uses a steady amount of memory.
>>
>> thx,
>> Kenny
>>
>> On Thu, 30 Oct 2008, Edward Kawas wrote:
>>
>>> So this one might take some time... from what I can see, our parser goes
>>> into an infinite loop (although, I am not certain about that ...).
>>>
>>> I will play around with this XML file and get back to you.
>>>
>>> Thanks,
>>>
>>> Eddie
>>>
>>> -----Original Message-----
>>> From: moby-dev-bounces at lists.open-bio.org
>>> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of Kenny Billiau
>>> Sent: October-30-08 8:03 AM
>>> To: Core developer announcements
>>> Subject: Re: [MOBY-dev] XML parsing takes forever
>>>
>>> Hi,
>>>
>>> here you go,
>>>
>>> Kenny
>>>
>>> On Thu, 30 Oct 2008, Edward Kawas wrote:
>>>
>>>> Hi,
>>>>
>>>> This is the right list! Could you send me the XML for your second
>> service?
>>>> Thanks,
>>>>
>>>> Eddie
>>> _______________________________________________
>>> MOBY-dev mailing list
>>> MOBY-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/moby-dev
>>>
>>
>
>
--
More information about the MOBY-dev
mailing list