[MOBY-dev] screen-scraping MOBY

Mark Wilkinson markw at illuminae.com
Thu Mar 11 01:02:18 UTC 2004


Hi MOBYers!

In an effort to make your lives easier (and especially those at PlaNet
who are currently displaying the raw XML of the MOBY Objects in their
portal :-P ), I have just modified the MOBY CGI client program at
mobycentral so that you can tell it what you want it to do and then
screen-scrape the result.

The client program accepts the following CGI GET parameters:

namespace - a namespace
id - an id
servicename - the name of the service you wish to execute
authority - the authority providing that service
object - a URL-encoded MOBY object (the bare object, not wrapped in a message structure)
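For illustration, here is a minimal Python sketch of building such a GET request with the standard library's urllib.parse.urlencode, which also takes care of URL-encoding the object value. The base address CLIENT_URL is a hypothetical placeholder (the real location of the client at mobycentral is not given here), as is the helper name build_client_url. Which parameter combinations the client actually requires is not stated above, so the sketch simply drops any parameter you pass as None.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real address of the MOBY CGI
# client program at mobycentral.
CLIENT_URL = "http://mobycentral.example.org/cgi-bin/moby_client.cgi"

# The GET parameters the client accepts, as listed above.
ALLOWED_PARAMS = ("namespace", "id", "servicename", "authority", "object")


def build_client_url(base_url, **params):
    """Return a GET URL for the MOBY CGI client (hypothetical helper).

    Parameters whose value is None are omitted. The `object` value should be
    the bare MOBY object XML (no message structure); urlencode percent-encodes
    it here, so do NOT encode it yourself beforehand.
    """
    unknown = set(params) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameter(s): {sorted(unknown)}")
    query = {k: v for k, v in params.items() if v is not None}
    return f"{base_url}?{urlencode(query)}"
```

For example, build_client_url(CLIENT_URL, namespace="NCBI_gi", id="111076", servicename="SomeService", authority="example.org") yields the base URL followed by ?namespace=NCBI_gi&id=111076&servicename=SomeService&authority=example.org. The service name and authority in that call are made-up values.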

The output is HTML, but I have made that HTML easy to scrape!
Each response object is surrounded by a pair of HTML comments:

<!-- SCRAPE_ME_START -->
<!-- SCRAPE_ME_END -->

Between those comment tags is a <tr>...</tr> table row containing the
HTML rendering of a single MOBY response object.  Thus you can simply do
a wget or an LWP GET from inside your own web pages, run a regexp over
the result, and stick the rows into a <TABLE> on your own web page in
whatever format you like.
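A minimal Python sketch of that workflow, using the standard library's urllib.request in place of wget or LWP. The helper names fetch, scrape_objects and to_table are hypothetical, and the regular expression assumes the markers appear as shown above (it tolerates extra whitespace inside the comments).

```python
import re
import urllib.request

# Non-greedy and DOTALL: each START/END pair yields exactly one fragment,
# even when a fragment spans several lines or several pairs share a page.
SCRAPE_RE = re.compile(
    r"<!--\s*SCRAPE_ME_START\s*-->(.*?)<!--\s*SCRAPE_ME_END\s*-->",
    re.DOTALL,
)


def fetch(url, timeout=30):
    """GET the client's HTML page (the equivalent of wget or an LWP GET)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scrape_objects(html):
    """Return the <tr>...</tr> fragment for each MOBY response object."""
    return [fragment.strip() for fragment in SCRAPE_RE.findall(html)]


def to_table(fragments, attrs='border="1"'):
    """Wrap the scraped rows in a <TABLE> for use on your own page."""
    body = "\n".join(fragments)
    return f"<TABLE {attrs}>\n{body}\n</TABLE>"
```

Putting it together, to_table(scrape_objects(fetch(url))) returns a table you can drop into your own page, where url is a client URL carrying the CGI parameters listed earlier.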

Ta Daaaaaa!

Who loves you more than I do? ;-)

These small changes have not yet been migrated into the CVS copy of the
browser (@Gbrowse) because that project is aiming for a release soon and
I don't want to bugger that up.  If you want a copy of this code, let me
know and I'll mail it to you.

Mark

-- 
Mark Wilkinson (mwilkinson at mrl.ubc.ca)
University of British Columbia iCAPTURE Centre


