[MOBY-l] My MOBY overview and questions

Tue Nov 6 14:30:07 UTC 2001

I'll respond to the different parts of this from *my own* perspective.  Others may disagree... and
that would be good, as this list is still pretty quiet...

Simon Twigger wrote:

> Moby will have a finite set of Classes, these will represent important
> biological 'things' (sequences, IDs, blast results, citations, etc) and
> the structure of these classes will be defined in the data_type.xsd file
> held at MOBY (e.g. http://www.biomoby.org/MOBY/data_type.xsd)

correct.

> Services (not to be confused with Servers which are the things that
> serve up services) will be created and will take Class1 as an input and
> convert it to (return) Class2 as the output. The exact details of the
> input and output of a particular service will be contained in the
> service.wsdl held on the service machine. (eg.
> http://bioinfo.pbi.nrc.ca/mwilkinson/MOBY/service.wsdl )

For the most part, correct.  It isn't entirely clear to me yet whether it is always feasible to
require an object as input.  I think it is, given Damian's demonstration of ISYS at the MOBY-DIC
meeting (ISYS lets you select a piece of free text and 'cast' it as a given object type).  We're
really talking about a client-side problem, so I don't think it is unreasonable to require this.

> Q. Does a class have to be listed at Moby to be useable or will the
> service just not be fully 'moby-compliant' if it uses non-moby classes?

Well...  if it uses non-moby classes, then it 'breaks the rules' as I understand them.  If you want
to roll your own class and, for whatever reason, not include it in the data_type.xsd document,
you will have to write your own data_type.xsd document, as well as your own port_defs.wsdl
document pointing to those data types, and your service.wsdl document will then point to your local
port_defs.  So long as you have registered your service.wsdl with MOBY-Central, it will be discovered
by MOBY clients.  Clever MOBY clients would then have to parse through to your data_types.xsd
and discover the structure of the input/output objects on their own.  I would have to rethink how I
had envisioned the MOBY-Central registration system working: I had imagined that a service
would register itself by picking its input/output classes from a pre-defined list, and then register
the URL of its service.wsdl file.  If you are rolling your own classes, then this mechanism won't
work...  it would need to be more flexible.

Note, also, that this breaks your (good!!) idea below... (see ***)

> Q. Can we create new classes by mixing other classes together, how
> complex can a class be, is this limited only by the imagination of the
> service provider?

The complexity of the class is limited only by how clever you expect the Clients that have to
interpret and display it to be... but Damian made a very good point at the meeting, drawn from his
experience with ISYS, and I think we agreed to adopt this mentality:  objects should be **as simple as
possible**.  The power of MOBY is not in passing complex objects, but in passing small xrefs to
pieces of data elsewhere on the net, and then registering *your* ability to deal with certain types
of xrefs.  I would encourage you to avoid creating overly complex objects...

> Services will also register/publish their services at Moby Central,
> indicating their input and output classes.

correct.

> Clients can then query
> MOBY-central for a service that will take the client's input class and
> return the desired output class.

correct.
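A minimal sketch of that register/query cycle, assuming (as described above) that a service picks
its input/output classes from a pre-defined list and registers the URL of its service.wsdl.
`KNOWN_CLASSES`, `ServiceRecord`, `Registry`, and the class names are all hypothetical; a real
MOBY-Central would presumably read the class list from data_type.xsd rather than hard-code it.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for the classes defined in data_type.xsd.
KNOWN_CLASSES = {"GenBank_Accession", "DNA_Sequence", "GO_Term", "GO_Annotation"}


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    input_class: str
    output_class: str
    wsdl_url: str  # where the service's own service.wsdl lives


@dataclass
class Registry:
    services: List[ServiceRecord] = field(default_factory=list)

    def register(self, record: ServiceRecord) -> None:
        # Only classes from the pre-defined list are accepted, so a service
        # built on home-grown classes is rejected at this point.
        for cls in (record.input_class, record.output_class):
            if cls not in KNOWN_CLASSES:
                raise ValueError(f"unknown MOBY class: {cls}")
        self.services.append(record)

    def find(self, input_class: str, output_class: str) -> List[ServiceRecord]:
        # Return every registered service that maps input_class to output_class.
        return [
            s for s in self.services
            if s.input_class == input_class and s.output_class == output_class
        ]
```

A client holding a GenBank_Accession and wanting a DNA_Sequence would call
`registry.find("GenBank_Accession", "DNA_Sequence")` and then fetch each returned `wsdl_url`.  Note
that the check in `register` is exactly where home-grown classes fall over, mirroring the problem
discussed above.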

> Service information will be stored in
> the port_defs.wsdl file at MOBY-Central. (eg.
> http://www.biomoby.org/MOBY/port_defs.wsdl)

I hadn't thought so until you mentioned it... but that may turn out to be the most efficient way in
the end.  The port_defs.wsdl file was intended simply as another layer of abstraction between the
object definitions and the service definitions.  The service.wsdl file could, in fact, contain the
port_defs and data_types within its own text, but then everyone would have to write the *entire*
document each time they set up a service, which makes no sense.  Ripping the document apart into its
constituent parts - the data (data_types.xsd), the service types (port_defs.wsdl), and the service
location (service.wsdl) - just makes it easier for everyone.  But now that you mention it, I can see
that this is an obvious conclusion to come to: "service types" are exactly what MOBY-Central is
supposed to be registering, so having MOBY-Central manipulate the port_defs.wsdl file itself is an
obvious next step!

(***) Given the discussion above, if you want to roll your own classes and not include them in the
data_types.xsd document, then you will not be able to register your service with MOBY-Central.  That
is, there is no way to get your exclusive data types into the service definition unless you write your
own port_defs file, but if the common port_defs file is what MOBY-Central uses as its main registry
of services, you simply cannot register yourself...  not a good thing!

> Q. Do we register a service or a server - could we register a server and
> have Moby-Central query the server on a regular basis to get a list of
> services that it currently provides?

You register the service/server.  We did discuss automated updating briefly, but I think the idea was
abandoned as unnecessary, and as too complex relative to the problem we are really trying to tackle at
the beginning, which is data-host interoperability.  Perhaps in the future MOBY-Central will become
'smarter', but that isn't part of the *immediate* plan.

> Q. How will the registering of a service at Moby-Central work - is this
> a manual step the developer will have to do or can it be done
> automagically?

Depends who writes the MOBY-Central code ;-)  volunteers?

> Q. What happens in the case that there are lots of services for a given
> input/output? Will Moby-Central return all services, order the services,
> favoring geographically-sensible services, or services that aren't
> currently busy, or services that have the largest capacity, all of the
> above?

man oh man!  You really *are* volunteering to write the code!  8-)

but seriously, first things first.  The functionality of MOBY-Central can grow enormously over time,
but at first there are more important issues to tackle.

> Q. Will MOBY-central detect when a service is offline and not return it
> to the client or will that be up to the client to detect - it would be
> nice if Moby-Central took care of these things.

I agree... but the overhead might be quite high if we do that.

> Q. Can I use the abbreviation MC for Moby-central, I keep typing it
> differently each time?

I don't think anyone really cares, as long as it is clear...

In our lab we have a policy of the "Beerable offense" - that is, it is a beerable offense if you use
an acronym without first defining what you are talking about :-)

> My understanding of the Current state of the Moby:
> - We have sample clients, servers and services, sample service.wsdl,
> port_defs.wsdl and data_type.xsd files

even that is being generous!!

> - We do not have - Moby-Central, a wsdl parser or writer

correct.

> Q. What other parts of the system need to be in place before the entire
> system can get off the ground as a prototype?

As a prototype, not much else is needed.  I think one of the hardest things to write will be a good
client program.  The first ones will likely be CGI-based, as that takes care of a lot of the display
problems and is platform-independent.

I think the most critical things to create right up-front are:

(1) MOBY-Central

(2) A client that can parse wsdl files and create/interpret MOBY objects based solely on the
structure defined in those files

(3) a set of 'mission critical' biological objects.  Among these I would include
        (a) GO_Annotation,
        (b) GO_Term,
        (c...) the main DNA/Protein database accession numbers,
        (d...) various sequence object types, and
        (e) a DAS-coordinate object type to allow us to move from MOBY queries into DAS queries.

This covers pretty much all the types of data we will need for a good prototype demo of the
power of MOBY...  GO allows us to skip from organism to organism within a related function/structure,
DB accession numbers allow us to find related information about a given sequence within that
organism, and DAS coordinate objects allow us to extract sequence based on coordinates rather than
sequence object name (e.g. "get 2000 nt upstream of GenBank accession AT22564.1" would be a useful
query, and one that would be much easier to answer if we had DAS coordinate objects).
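Once a coordinate object exists, the upstream query is mostly arithmetic.  A minimal sketch, assuming
a hypothetical `DasSegment` type with 1-based inclusive start/stop and a strand, and assuming the
accession has already been mapped to its coordinates (that lookup is not shown).  On the plus strand
"upstream" lies to the left of the start; on the minus strand it lies to the right of the stop.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical DAS-style coordinate object; field names are assumptions.
@dataclass(frozen=True)
class DasSegment:
    ref: str     # reference sequence name
    start: int   # 1-based, inclusive
    stop: int    # 1-based, inclusive
    strand: str  # "+" or "-"


def upstream(seg: DasSegment, length: int,
             ref_length: Optional[int] = None) -> Optional[DasSegment]:
    """Return the region of up to `length` nt immediately upstream of `seg`,
    clipped to the reference bounds, or None if no such region exists."""
    if seg.strand == "+":
        # Upstream of a plus-strand feature is to the left of its start.
        start, stop = max(1, seg.start - length), seg.start - 1
    else:
        # Upstream of a minus-strand feature is to the right of its stop.
        start, stop = seg.stop + 1, seg.stop + length
        if ref_length is not None:
            stop = min(stop, ref_length)
    if start > stop:
        return None  # feature sits at the very end of the reference
    return DasSegment(seg.ref, start, stop, seg.strand)
```

For a plus-strand feature at 5001..6000, `upstream(seg, 2000)` gives 3001..5000; on the minus strand
the same feature gives 6001..8000.  The resulting segment could then be handed to a DAS server as an
ordinary coordinate request.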

> Q. Can we list these parts on biomoby.org so that interested parties can
> start having a crack at them?

I'll get on to that today.

> Hopefully this hasn't made things less clear than they were before!

I'm excited to see the ideas starting to flow!

I have to get my day started.  Later this morning I'll do some work on the BioMOBY CVS to ensure that
the files are available.  I'll also link out to our local Wiki so that we have a 'whiteboard' for
mocking-up whatever ideas we have.

Cheers all!

M

--
--------------------------------
"Speed is subsittute fo accurancy."
________________________________

Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada