[moby] Re: [MOBY-dev] Question on parser -> Big XML documents

Mark Wilkinson markw at illuminae.com
Wed Sep 7 16:39:59 UTC 2005


SOAP::Lite
$Id: Lite.pm,v 1.11 2003/08/11 05:54:51 paulclinger Exp $

MIME::Tools
$VERSION = "5.417";


M


On Wed, 2005-09-07 at 13:12 +0200, Pieter Neerincx wrote:
> On 6-Sep-2005, at 8:05 PM, Mark Wilkinson wrote:
> 
> > This is indeed still an issue, and you are right about the pain of using
> > SOAP::Lite + MIME::Tools in Perl (though I know that is soon going to be
> > better, since we are now using an apparently stable combination of
> > these, plus SOAP attachments, for our own LSID resolver!  However these
> > are not available on CPAN yet AFAIK).
> 
> Ok, so there is a combination that really works :). Could you please
> tell me which versions of SOAP::Lite and MIME::Tools you are mixing to
> make SOAP with attachments work?
> 
> 
> >
> > I recall that Lincoln S. wrote to Paul K. several years ago asking if it
> > would ever be possible to swap out the DOM parser in SOAP::Lite for a
> > SAX parser in order to overcome this limitation (and also with an eye to
> > streaming responses...), but I don't think this even made it onto the
> > SOAP::Lite radar, so I doubt that the solution is going to come from that
> > community anytime soon.
> 
> I doubt that as well. If I find a solution for streaming the SOAP
> XML, I'll post it to the list...
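
For what it's worth, the DOM-versus-streaming difference discussed here can be
sketched in a few lines. This is only an illustration in Python, not SOAP::Lite
or MOBY code: the element name "Simple", the sample payload and the helper
count_elements() are all made up for the example. The point is just that an
event-driven parser hands over each element as soon as its closing tag is read,
so the element's content can be discarded before the rest of the document
arrives.

```python
import io
import xml.etree.ElementTree as ET

def count_elements(stream, tag):
    """Count <tag> elements while reading the document incrementally."""
    count = 0
    # iterparse yields an element once its closing tag has been parsed
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
            # discard the element's text and children; the emptied
            # element itself still stays attached to its parent
            elem.clear()
    return count

# Hypothetical payload: 1000 small objects inside one collection.
doc = "<Collection>" + "<Simple>x</Simple>" * 1000 + "</Collection>"
print(count_elements(io.BytesIO(doc.encode("utf-8")), "Simple"))  # prints 1000
```

Whether something equivalent can be bolted onto SOAP::Lite is a separate
question; the sketch only shows what the streaming approach buys you.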
> 
> Thanks,
> 
> Pieter
> 
> >
> > So... I can't advise anything, but perhaps others in the MOBY community
> > can!
> >
> > M
> >
> >
> > On Tue, 2005-09-06 at 18:30 +0200, Pieter Neerincx wrote:
> >
> >> Hi,
> >>
> >> I have some services that query databases. The result can be nothing,
> >> a single object, or it can be several thousand objects.... I was also
> >> running into trouble with big XML documents. I'm using the Perl API,
> >> which uses SOAP::Lite, which uses XML::LibXML. SOAP::Lite gets the
> >> job done for small XML structures, but for big ones it's a mess.
> >> Firstly, SOAP::Lite loads the entire message into memory as one big
> >> piece (hence no chunks or streams etc.). Secondly, if you use
> >> Data::Dumper to have a look at the Perl data structures that are
> >> built, you will see that the same info is copied two, three or more
> >> times. There's quite a bit of redundancy in there. As a result the
> >> expansion factor for parsing XML with SOAP::Lite is between 10 and 13
> >> (according to people on the SOAP::Lite mailing list). That means a 10
> >> MB XML document will become 100-130 MB in memory. Several clients
> >> accessing several of these services at the same time will simply
> >> bring our servers to their knees :(. If there are people on the
> >> mailing list with experience in handling laaaaaarge inputs and/or
> >> outputs I'd really appreciate it if you could drop a few lines...
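
To put numbers on the estimate above, here is a back-of-the-envelope sketch
(Python, for illustration only). The 10 MB document size and the 10-13x
expansion factor come from the paragraph above; the figure of 5 simultaneous
requests is purely an assumed example value, and both function names are
hypothetical.

```python
def parsed_size_mb(doc_mb, factor):
    """In-memory size (MB) of a parsed document for a given expansion factor."""
    return doc_mb * factor

def peak_range_mb(doc_mb, concurrent, low=10, high=13):
    """Rough lower/upper bound (MB) when `concurrent` requests are parsed at once."""
    return (parsed_size_mb(doc_mb, low) * concurrent,
            parsed_size_mb(doc_mb, high) * concurrent)

print(peak_range_mb(10, 1))  # one 10 MB document -> (100, 130)
print(peak_range_mb(10, 5))  # assumed 5 simultaneous requests -> (500, 650)
```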
> >>
> >> So far I have looked at working with attachments. Not really an
> >> option with Perl: combining SOAP::Lite with MIME::Tools is a buggy
> >> combo. xsltproc sounds good. For now I have changed my services to send
> >> only a pointer (URL) as the result, which the client has to fetch. As a
> >> quick and dirty workaround it works beautifully, but from a design
> >> point of view it's bad bad bad :( ...
> >>
> >> Cheers,
> >>
> >> Pi
> >>
> >>
> >> On 31-Aug-2005, at 8:46 AM, Sebastien Carrere wrote:
> >>
> >>
> >>> The MOBY message that I wanted to parse was a 12 Megabyte one.
> >>> The web-service concerned is:
> >>>
> >>> name: ImgaGetTigrXMLEntriesFromKeyword
> >>> uri: bioinfo.genopole-toulouse.prd.fr
> >>> input: String
> >>> Output(s): /Collection of /text-xml, as TIGRXML and /Collection of /
> >>> IMGA_Accession, as IMGA_Accession
> >>>
> >>> I think this is a little bit extreme, but it works fine now.
> >>>
> >>> Sebastien
> >>>
> >>> Chunyan Wang wrote:
> >>>
> >>>
> >>>
> >>>> Hi,
> >>>> I changed TimeOut from the default to 50000 in the Apache config to
> >>>> fix the timeout problem.
> >>>> How big was your XML file when you had the problem?
> >>>> Cheers,
> >>>>
> >>>> Joyce
> >>>>
> >>>> Sebastien Carrere wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I had the same problem when I wanted to parse huge XML files.
> >>>>> That's why I have written a clone of CommonSub.pm that uses
> >>>>> "xsltproc" to parse the MOBY message.
> >>>>> That removed the parsing time problem.
> >>>>>
> >>>>> However, how did you fix the timeout problem?
> >>>>>
> >>>>> Sebastien
> >>>>>
> >>>>> Chunyan Wang wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> Martin Senger wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>> Could anybody explain this "problem" to me? Thanks.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>   What language are you using, what XML library in that language?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> I am using Perl and XML::DOM. I am using
> >>>>>> "genericServiceInputParser($data)" to parse the input sequence
> >>>>>> in my service.
> >>>>>> By the way, I want to let you know I fixed the timeout problem.
> >>>>>> Thanks for your suggestion.
> >>>>>>
> >>>>>> Joyce
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>   Martin
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> MOBY-dev mailing list
> >>>>>> MOBY-dev at biomoby.org
> >>>>>> http://www.biomoby.org/mailman/listinfo/moby-dev
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>> -- 
> >>> __________________________________________________________
> >>>
> >>> Sebastien CARRERE                        LIPM (INRA-CNRS)
> >>>                      B.P.52627 -- 31326 CASTANET TOLOSAN
> >>> tel:(33) 5-61-28-53-29
> >>> fax:(33) 5-61-28-50-61
> >>>
> >>>
> >>
> >>
> >> Wageningen University and Research centre (WUR)
> >> Laboratory of Bioinformatics
> >> Transitorium (building 312) room 1034
> >> Dreijenlaan 3
> >> 6703 HA Wageningen
> >> The Netherlands
> >> phone: 0317-483 060
> >> fax: 0317-483 584
> >> mobile: 06-143 66 783
> >> pieter.neerincx at wur.nl
> >>
> >>
> >>
-- 
"Ontologists do it with the edges!"

Mark Wilkinson
Asst. Professor
Dept. of Medical Genetics
University of British Columbia
PI in Bioinformatics
iCAPTURE Centre
St. Paul's Hospital
Rm. 166, 1081 Burrard St.
Vancouver, BC, V6Z 1Y6
tel: 604 682 2344 x62129
fax: 604 806 9274



