[moby] Re: [MOBY-dev] Question on parser -> Big XML documents

Pieter Neerincx Pieter.Neerincx at wur.nl
Mon Sep 12 14:56:13 UTC 2005


On 7-Sep-2005, at 6:39 PM, Mark Wilkinson wrote:

> SOAP::Lite
> $Id: Lite.pm,v 1.11 2003/08/11 05:54:51 paulclinger Exp $
>
> MIME::Tools
> $VERSION = "5.417";

Did you apply any custom patches? Or are those simply the defaults?

Pi
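
P.S. To put the 10-13x expansion factor quoted further down in
perspective, here is a rough back-of-the-envelope sketch. It is written
in Python rather than Perl purely for illustration. The function name
and the client count are my own assumptions; only the 10-13 range comes
from the thread, and real memory use will depend on the message.

```python
def estimated_memory_mb(doc_mb, clients=1, factors=(10, 13)):
    """Rough in-memory size range (MB) for parsed XML.

    factors is the 10-13x expansion range quoted in the thread;
    clients is a hypothetical number of simultaneous requests.
    """
    low, high = factors
    return (doc_mb * low * clients, doc_mb * high * clients)


# One 10 MB document, as in the thread: 100-130 MB.
print(estimated_memory_mb(10))

# Assumed scenario: five clients each sending a 12 MB message.
print(estimated_memory_mb(12, clients=5))
```

With these numbers a handful of simultaneous large requests is already
in the hundreds of megabytes, which is why a per-request cap or a
streaming parser matters.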

>
> M
>
>
> On Wed, 2005-09-07 at 13:12 +0200, Pieter Neerincx wrote:
>
>> On 6-Sep-2005, at 8:05 PM, Mark Wilkinson wrote:
>>
>>
>>> This is indeed still an issue, and you are right about the pain of
>>> using SOAP::Lite + MIME::Tools in Perl (though I know that is soon
>>> going to be better, since we are now using an apparently stable
>>> combination of these, plus SOAP attachments, for our own LSID
>>> resolver! However, these are not available on CPAN yet AFAIK).
>>>
>>
>> Ok, so there is a combination that really works :). Could you please
>> tell me which versions of SOAP::Lite and MIME::Tools you are mixing
>> to make SOAP with attachments work?
>>
>>
>>
>>>
>>> I recall that Lincoln S. wrote to Paul K. several years ago asking
>>> if it would ever be possible to swap out the DOM parser in
>>> SOAP::Lite for a SAX parser in order to overcome this limitation
>>> (and also with an eye to streaming responses...), but I don't think
>>> this even made it onto the SOAP::Lite radar, so I doubt that the
>>> solution is going to come from that community anytime soon.
>>>
>>
>> I doubt that as well. If I find some solution to streaming the SOAP
>> XML I'll post it to the list...
>>
>> Thanks,
>>
>> Pieter
>>
>>
>>>
>>> So... I can't advise anything, but perhaps others in the MOBY
>>> community can!
>>>
>>> M
>>>
>>>
>>> On Tue, 2005-09-06 at 18:30 +0200, Pieter Neerincx wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I have some services that query databases. The result can be
>>>> nothing, a single object, or several thousand objects... I was
>>>> also running into trouble with big XML documents. I'm using the
>>>> Perl API, which uses SOAP::Lite, which uses XML::LibXML.
>>>> SOAP::Lite gets the job done for small XML structures, but for big
>>>> ones it's a mess. Firstly, SOAP::Lite loads the entire message
>>>> into memory as one big piece (hence no chunks or streams etc.).
>>>> Secondly, if you use Data::Dumper to have a look at the Perl data
>>>> structures that are built, you will see that the same info is
>>>> copied two, three or more times. There's quite a bit of redundancy
>>>> in there. As a result the expansion factor for parsing XML with
>>>> SOAP::Lite is between 10 and 13 (according to people on the
>>>> SOAP::Lite mailing list). That means a 10 MB XML document will
>>>> become 100-130 MB in memory. Several clients accessing several of
>>>> these services at the same time will simply bring our servers to
>>>> their knees :(. If there are people on the mailing list with
>>>> experience in handling laaaaaarge inputs and/or outputs I'd really
>>>> appreciate it if you dropped a few lines...
>>>>
>>>> So far I have looked at working with attachments. Not really an
>>>> option with Perl: combining SOAP::Lite with MIME::Tools is a buggy
>>>> combo. xsltproc sounds good. For now I have changed my services to
>>>> send only a pointer (URL) as the result, which the client has to
>>>> fetch. As a quick and dirty workaround it works beautifully, but
>>>> from a design point of view it's bad bad bad :( ...
>>>>
>>>> Cheers,
>>>>
>>>> Pi
>>>>
>>>>
>>>> On 31-Aug-2005, at 8:46 AM, Sebastien Carrere wrote:
>>>>
>>>>
>>>>
>>>>> The MOBY message that I wanted to parse was a 12 Megabyte one.
>>>>> The web-service concerned is:
>>>>>
>>>>> name: ImgaGetTigrXMLEntriesFromKeyword
>>>>> uri: bioinfo.genopole-toulouse.prd.fr
>>>>> input: String
>>>>> Output(s): /Collection of /text-xml, as TIGRXML and
>>>>> /Collection of /IMGA_Accession, as IMGA_Accession
>>>>>
>>>>> I think this is a little bit extreme, but it works fine now.
>>>>>
>>>>> Sebastien
>>>>>
>>>>> Chunyan Wang wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Hi,
>>>>>> I changed TimeOut from the default to 50000 in the Apache config
>>>>>> to fix the timeout problem.
>>>>>> How big was your XML file when you had the problem?
>>>>>> Cheers,
>>>>>>
>>>>>> Joyce
>>>>>>
>>>>>> Sebastien Carrere wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I got the same problem when I wanted to parse huge XML files.
>>>>>>> That's why I have written a clone of CommonSub.pm using
>>>>>>> "xsltproc" to parse the MOBY message.
>>>>>>> That removed the parsing time problem.
>>>>>>>
>>>>>>> However, how did you fix the timeout problem?
>>>>>>>
>>>>>>> Sebastien
>>>>>>>
>>>>>>> Chunyan Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Martin Senger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Could anybody explain this "problem" to me? Thanks.
>>>>>>>>>
>>>>>>>>>   What language are you using, what XML library in that
>>>>>>>>> language?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I am using Perl and XML::DOM. I am using
>>>>>>>> "genericServiceInputParser($data)" to parse the input sequence
>>>>>>>> in my service.
>>>>>>>> By the way, I want to let you know I fixed the timeout problem.
>>>>>>>> Thanks for your suggestion.
>>>>>>>>
>>>>>>>> Joyce
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>   Martin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> MOBY-dev mailing list
>>>>>>>> MOBY-dev at biomoby.org
>>>>>>>> http://www.biomoby.org/mailman/listinfo/moby-dev
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> -- 
>>>>> __________________________________________________________
>>>>>
>>>>> Sebastien CARRERE                        LIPM (INRA-CNRS)
>>>>>                      B.P.52627 -- 31326 CASTANET TOLOSAN
>>>>> tel:(33) 5-61-28-53-29
>>>>> fax:(33) 5-61-28-50-61
>>>>>
>>>>>
>>>>
>>>>
>>>> Wageningen University and Research centre (WUR)
>>>> Laboratory of Bioinformatics
>>>> Transitorium (building 312) room 1034
>>>> Dreijenlaan 3
>>>> 6703 HA Wageningen
>>>> The Netherlands
>>>> phone: 0317-483 060
>>>> fax: 0317-483 584
>>>> mobile: 06-143 66 783
>>>> pieter.neerincx at wur.nl
>>>>
>>>>
>>>>
>>> -- 
>>> "Ontologists do it with the edges!"
>>>
>>> Mark Wilkinson
>>> Asst. Professor
>>> Dept. of Medical Genetics
>>> University of British Columbia
>>> PI in Bioinformatics
>>> iCAPTURE Centre
>>> St. Paul's Hospital
>>> Rm. 166, 1081 Burrard St.
>>> Vancouver, BC, V6Z 1Y6
>>> tel: 604 682 2344 x62129
>>> fax: 604 806 9274
>>>
>>
>>
>
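
For anyone reading the thread later: the DOM-versus-SAX point quoted
above can be illustrated independently of SOAP::Lite. Below is a
minimal sketch in Python (not Perl, and not the MOBY API) of
incremental parsing, where each finished element is discarded instead
of building the whole tree first. The element name "item", the function
name and the sample payload are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def count_items(stream, tag="item"):
    """Count <tag> elements without keeping the whole tree in memory."""
    count = 0
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Drop the children parsed so far so memory use stays flat.
            root.clear()
    return count


# Hypothetical payload: 1000 small records inside one root element.
sample = b"<result>" + b"<item>x</item>" * 1000 + b"</result>"
n = count_items(io.BytesIO(sample))
print(n)
```

The same loop works on a file or socket-like object, so the parser never
needs the complete message as one string. Whether something similar can
be wired into SOAP::Lite is exactly the open question in this thread.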


Wageningen University and Research centre (WUR)
Laboratory of Bioinformatics
Transitorium (building 312) room 1034
Dreijenlaan 3
6703 HA Wageningen
The Netherlands
phone: 0317-483 060
fax: 0317-483 584
mobile: 06-143 66 783
pieter.neerincx at wur.nl





