[MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
Andreas Groscurth
groscurt at mpiz-koeln.mpg.de
Wed Nov 29 12:02:05 UTC 2006
The following text describes a proposed procedure for synchronizing
Biomoby secondary repositories.
Aim: Replicate BioMoby central
-to create mirrors
-to have redundancy in case of failure
-to create private sets of services, either filtered from the global
set (fewer services) or added to the global set (more services)
Problems:
-synchronizing repositories
-cascading service/object registration requests
-populating a Moby central from scratch
Solutions:
-The existing RSS feed is used to notify secondaries of changes
(register service/delete service/update service) to the master
-A complete RSS document is created by a new dump method for
initialization of Moby centrals from scratch
-Registrations are handled by the client and NOT cascaded
1. Synchronizing repositories
=============================
We propose that secondaries poll the Biomoby RSS feed to learn
whether registrations have changed. Currently the RSS feed is updated
once a day; for more rapid synchronization this would have to be
changed.
The changes include registration, modification or deletion of a
service/object. Changes applied to the Biomoby Central registry are
then applied to the secondary as well.
The RSS contains the signature URL. From there the secondary picks up
the service RDF and, using the existing RDF agent, retrieves all
details required for the registration.
i) Problems/changes required:
The main question here is whether unregistered services are deleted
completely from the central database or are only marked as inactive.
This matters because the feed also needs to carry the information
that a service was deleted, so that the secondaries can pick it up.
Moby central will therefore have to keep a full transaction log that
includes deletions.
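To illustrate why deletions have to appear in the transaction log,
here is a minimal sketch of a secondary replaying changes from the
feed. The Change record, the action labels and the example.org URLs
are assumptions made for illustration; they are not part of the
existing Moby central API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One feed transaction (hypothetical structure)."""
    action: str          # "register", "update" or "delete" (assumed labels)
    authority: str
    name: str
    signature_url: str = ""


def apply_changes(registry, changes):
    """Replay changes, oldest first, on a secondary's registry dict."""
    for change in changes:
        key = (change.authority, change.name)
        if change.action in ("register", "update"):
            registry[key] = change.signature_url
        elif change.action == "delete":
            # Without a delete entry in the feed, this service would
            # stay in the secondary forever.
            registry.pop(key, None)
        else:
            raise ValueError(f"unknown action: {change.action!r}")
    return registry


secondary = {}
apply_changes(secondary, [
    Change("register", "example.org", "AnalyseThis",
           "http://example.org/AnalyseThis.rdf"),
    Change("register", "example.org", "AnalyseThat",
           "http://example.org/AnalyseThat.rdf"),
    Change("delete", "example.org", "AnalyseThis"),
])
print(sorted(secondary))
```

If the delete transaction were missing from the feed, AnalyseThis
would remain registered in the secondary after the master dropped it.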
2. Filtering
============
We propose that any secondary can apply filters to the RSS feed and
thus include only a subset of all services/objects. This can be
useful to make finding services in lists easier, to tune workflows
to well-performing services, to use only local services, or to
exclude test services. Some information relevant to filtering, such
as authority and description, is already in the RSS. If more turns
out to be relevant, filtering may need to happen at the level of the
service RDF.
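A filter of this kind could be sketched as follows. The entry
dictionaries and their keys (authority, name, description) are
assumptions about what a parsed feed item might look like, and
make_filter is a hypothetical helper, not an existing Moby function.

```python
def make_filter(allowed_authorities=None, excluded_words=()):
    """Build a predicate that decides whether a feed entry is kept."""
    def keep(entry):
        if (allowed_authorities is not None
                and entry["authority"] not in allowed_authorities):
            return False
        description = entry.get("description", "").lower()
        # Drop entries whose description mentions an excluded word.
        return not any(w.lower() in description for w in excluded_words)
    return keep


entries = [
    {"authority": "example.org", "name": "AnalyseThis",
     "description": "Production alignment service"},
    {"authority": "example.org", "name": "AnalyseThat",
     "description": "Test service, do not use"},
    {"authority": "other.example", "name": "FindThis",
     "description": "Sequence search"},
]

keep = make_filter(allowed_authorities={"example.org"},
                   excluded_words=["test"])
kept = [e["name"] for e in entries if keep(e)]
print(kept)
```

Only entries that pass the predicate would be handed on to the RDF
agent, so the rest of the synchronization stays unchanged.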
3. Private services
===================
We propose that any client can register services with a Moby central
secondary. These will then be available only to clients querying that
secondary. If the secondary is in a local network, this allows easy
access control to local services. Any secondary synchronizing to that
repository will of course inherit all those additional services,
allowing simple creation of local production Moby centrals and local
test Moby centrals.
4. Registration
===============
We propose to NOT cascade registration requests, i.e. NOT pass them
on from secondary to master. That means that the client has control
over where a registration is done, but also that the client has to
make that choice. Registration clients must thus let the user choose
the Moby central where a service/object should be registered.
Registration always happens at the topmost Moby central node where
the service should be visible; all secondaries of this Moby central
will pick that service up by synchronization.
Why? Cascading registration is cumbersome: name duplications etc.
can only be resolved once a registration request has reached the
topmost node, and the result must then be passed back to the client.
Name conflicts can still occur with locally registered services.
E.g., Adam registers a private service AnalyseThis on a private
secondary. Later, Beth registers AnalyseThis with the same authority
on the Moby central master. The private secondary picks this up from
the RSS and runs into a name duplication.
Proposed solution: Local registrations MUST ALWAYS use a local
authority. E.g., Adam registers AnalyseThis with authority
InternalIP, and Beth registers AnalyseThis with authority
paul_vitti.com, so the two no longer collide.
Beyond that, we assume whoever registers a service at a more global
Moby central knows what they are doing, and give synchronization
precedence over local registrations. E.g., a test registry is a
secondary of Moby central. Chris registers AnalyseThat with authority
paul_vitti.com in the test registry. Once he's happy with testing, he
registers AnalyseThat with authority paul_vitti.com in Moby central.
The test registry retrieves this from the RSS, discards the local
registration and overwrites it with the registration picked up
through the RSS.
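The precedence rule can be sketched with a small in-memory registry
keyed by (authority, name). SecondaryRegistry and its methods are
hypothetical names chosen for this illustration; the sketch replays
the Adam/Beth and Chris examples from above.

```python
class SecondaryRegistry:
    """Toy secondary: maps (authority, name) to the registration origin."""

    def __init__(self, local_authority):
        self.local_authority = local_authority
        self.services = {}  # (authority, name) -> "local" or "master"

    def register_local(self, name, authority=None):
        # Local registrations default to the local authority.
        authority = authority or self.local_authority
        self.services[(authority, name)] = "local"

    def sync_from_master(self, authority, name):
        """Apply a registration seen in the RSS; the master always wins.

        Returns True if a local registration was overwritten.
        """
        key = (authority, name)
        overwritten = self.services.get(key) == "local"
        self.services[key] = "master"
        return overwritten


test_registry = SecondaryRegistry(local_authority="InternalIP")

# Adam and Beth: different authorities, so both entries coexist.
test_registry.register_local("AnalyseThis")
beth_clash = test_registry.sync_from_master("paul_vitti.com", "AnalyseThis")

# Chris: same authority and name, so the synchronized entry replaces his.
test_registry.register_local("AnalyseThat", authority="paul_vitti.com")
chris_clash = test_registry.sync_from_master("paul_vitti.com", "AnalyseThat")

print(beth_clash, chris_clash, len(test_registry.services))
```

Keying on the pair (authority, name) is what makes the local
authority rule sufficient to keep Adam's and Beth's services apart.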
5. Moby central failure
=======================
If a master Moby central fails, the secondaries continue normal
operation with no effect on service discovery for all clients keyed
to a secondary. However, registration is no longer possible at the
master node. Once the master node comes back up, all secondaries must
resync.
6. Adaptations to the RSS
=========================
For this procedure the current RSS feed has to be changed slightly:
on the one hand to notify the secondaries correctly, on the other
hand to ensure that normal RSS readers still work the usual way. The
current RSS feed mainly uses Dublin Core Metadata to provide its
information, so adding information to the feed only requires adding
more Dublin Core Metadata.
Primarily, the feed has to state whether the service is new,
modified or deleted. Additionally, the service RDF has to be linked
in the feed so that the local RDF agent can apply the changes
described in the service RDF to the local secondary.
Whether further information should be added to the feed to provide
more possibilities for filtering services can be discussed.
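As a rough sketch of what a secondary would read from such a feed,
the following parses a made-up item with Python's standard library.
The choice of elements is an assumption for illustration only: here
dc:type carries the change kind, dc:creator the authority and link
the service RDF URL. The actual feed layout is still to be decided.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical feed; element usage is an assumption, not the real feed.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Hypothetical Moby central feed</title>
    <item>
      <title>AnalyseThis</title>
      <link>http://example.org/AnalyseThis.rdf</link>
      <dc:creator>example.org</dc:creator>
      <dc:type>deleted</dc:type>
      <dc:date>2006-11-28T10:00:00Z</dc:date>
    </item>
    <item>
      <title>AnalyseThat</title>
      <link>http://example.org/AnalyseThat.rdf</link>
      <dc:creator>example.org</dc:creator>
      <dc:type>new</dc:type>
      <dc:date>2006-11-29T09:30:00Z</dc:date>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Extract the fields a secondary needs from each feed item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "name": item.findtext("title"),
            "signature_url": item.findtext("link"),
            "authority": item.findtext(f"{{{DC}}}creator"),
            "change": item.findtext(f"{{{DC}}}type"),
            "date": item.findtext(f"{{{DC}}}date"),
        })
    return items


for entry in parse_items(SAMPLE_FEED):
    print(entry["change"], entry["name"], entry["signature_url"])
```

An ordinary RSS reader would still see title and link and could
ignore the extra Dublin Core elements.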
7. Resync
=========
Another main aspect is what happens when a repository is out of sync
(e.g. due to a temporary failure of master or secondary). The RSS
feed has a limited length, so it contains only a limited number of
transactions. It may therefore not contain all transactions since
the last sync of a secondary.
i) Solution
We propose that each repository stores a time stamp of its last
synchronization. If, at the next synchronization, the oldest changes
in the feed are newer than the repository's stored sync time stamp,
we run the risk of not receiving all information about service
changes. In this case the secondary should be able to ask the primary
to create an RSS feed with all changes that have happened since the
secondary's stored time stamp.
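The check could look like the sketch below. A gap is possible
whenever the oldest entry still in the feed is more recent than the
secondary's stored time stamp, because anything between the two may
already have dropped off the feed. The function name and the choice
to treat an empty feed as "cannot prove coverage" are assumptions.

```python
from datetime import datetime, timezone


def needs_catch_up(last_sync, feed_dates):
    """Return True if the regular feed may miss changes since last_sync.

    last_sync  -- time stamp of the secondary's last synchronization
    feed_dates -- time stamps of the entries currently in the feed
    """
    if not feed_dates:
        # Assumption: an empty feed cannot prove that nothing was missed.
        return True
    return min(feed_dates) > last_sync


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


last_sync = utc(2006, 11, 27, 12, 0)

# Feed reaches back before the last sync: the regular feed is enough.
covered = needs_catch_up(last_sync, [utc(2006, 11, 26, 8, 0),
                                     utc(2006, 11, 28, 8, 0)])

# Oldest entry is newer than the last sync: request a feed "since".
gap = needs_catch_up(last_sync, [utc(2006, 11, 28, 9, 0),
                                 utc(2006, 11, 29, 9, 0)])
print(covered, gap)
```

When the function returns True, the secondary would ask the primary
for all changes since last_sync instead of reading the regular feed.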
8. Initial load
===============
When populating a new secondary from scratch, all registered services/
objects need to be received from the master Moby central. We propose
a new method in Moby central to request all registered services/
objects as RSS. Then, the initialization proceeds exactly like a
synchronization.
So to kick off the discussion, here are some of our questions:
1. Is it reasonable to use the existing RSS feed for this procedure?
It sounds very handy and avoids creating a similar, completely new
structure.
2. Does any structure keep track of deleted services?
3. Resync: Is it reasonable to timestamp all transactions in Moby
central? Or should we solve the resync issue by enforcing a full drop/
emptying of the secondary and reloading all data as in the initial
load?
Thanks
Heiko & Andreas
--
Andreas Groscurth
Diplom Bioinformatik - PhD Student
Max Planck Institute for Plant Breeding Research
Carl-von-Linné-Weg 10
50829 Cologne
Germany
E-mail: groscurt at mpiz-koeln.mpg.de
Phone: +49(0)221-5062-447