[MOBY-dev] RFC - Synchronization of Biomoby secondary repositories

Andreas Groscurth groscurt at mpiz-koeln.mpg.de
Wed Nov 29 12:02:05 UTC 2006


The following text describes a procedure for synchronizing Biomoby
secondary repositories.

Aim: Replicate BioMoby central
-to create mirrors
-to have redundancy in case of failure
-to create private sets of services, either filtered from the global
set (fewer services) or added to the global set (more services)

Problems:
-synchronizing repositories
-cascading service/object registration requests
-populating a Moby central from scratch

Solutions:
-The existing RSS feed is used to notify secondaries of changes  
(register service/delete service/update service) to the master
-A complete RSS document is created by a new dump method for  
initialization of Moby centrals from scratch
-Registrations are handled by the client and NOT cascaded

1. Synchronizing repositories
=============================

We propose that secondaries poll the Biomoby RSS feed to learn of
changes to the registry. Currently the RSS feed is updated once a
day; for more rapid synchronization this would have to be changed.
The changes include registration, modification or deletion of a
service/object. Any change made to the Biomoby Central registry is
then applied to the secondary. The RSS contains the signature URL,
from which the secondary fetches the service RDF and, using the
existing RDF agent, retrieves all details required for the
registration.
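
To make the proposed flow concrete, here is a minimal sketch of how a
secondary might apply feed entries to its local registry. The Change
fields, the action names and the fetch_rdf callback (standing in for
the RDF agent) are assumptions made for illustration, not the actual
feed schema or agent API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One feed entry; field names are assumptions, not the real schema."""
    action: str          # "register", "update" or "delete"
    authority: str
    name: str
    signature_url: str   # where the service RDF can be fetched


def apply_changes(registry, changes, fetch_rdf):
    """Apply feed entries, in order, to a secondary held as a dict.

    registry maps (authority, name) -> service RDF text.
    fetch_rdf is a caller-supplied function standing in for the RDF agent.
    """
    for change in changes:
        key = (change.authority, change.name)
        if change.action == "delete":
            registry.pop(key, None)  # deleting an unknown service is a no-op
        elif change.action in ("register", "update"):
            registry[key] = fetch_rdf(change.signature_url)
        else:
            raise ValueError(f"unknown action: {change.action!r}")
    return registry
```

Entries are applied in feed order, so a registration followed by a
deletion of the same service leaves nothing behind.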

i) Problems/changes required:

The main question here is whether unregistered services are deleted
completely from the central database or only marked as inactive.
This matters because the feed would also need to carry the
information that a service was deleted, so that the secondaries can
pick it up. Moby central will therefore have to keep a full
transaction log that includes deletions.

2. Filtering
============

We propose that any secondary can apply filters to the RSS feed and
thus include only a subset of all services/objects. This can be
useful to make finding services in lists easier, to restrict
workflows to well-performing services, to use only local services or
to exclude test services. Some information relevant to filtering,
such as authority and description, is already in the RSS. If more
turns out to be relevant, filtering may need to happen at the level
of the service RDF.
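
A filter of this kind could be as simple as a predicate over feed
entries. The sketch below assumes entries are available as dicts with
'authority' and 'description' keys; those names and the make_filter
helper are hypothetical.

```python
def make_filter(allowed_authorities=None, excluded_words=()):
    """Build a predicate over feed entries.

    Entries are dicts with 'authority' and 'description' keys
    (assumed field names). allowed_authorities=None accepts every
    authority; excluded_words are matched case-insensitively as
    substrings of the description.
    """
    excluded = [word.lower() for word in excluded_words]

    def keep(entry):
        if (allowed_authorities is not None
                and entry["authority"] not in allowed_authorities):
            return False
        description = entry.get("description", "").lower()
        return not any(word in description for word in excluded)

    return keep
```

For example, make_filter({"paul_vitti.com"}, ["test"]) keeps only
services of that authority whose description does not mention "test".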

3. Private services
===================

We propose that any client can register services with a Moby central
secondary; these will then be available only to clients querying the
secondary. If the secondary is in a local network, this allows easy
access control to local services. Any secondary synchronizing to that  
repository will of course inherit all those additional services,  
allowing simple creation of local production Moby centrals and local  
test Moby centrals.

4. Registration
===============

We propose to NOT cascade registration requests, i.e. pass them on  
from secondary to master. That means that the client has control over  
where a registration is done but also means the client has to make  
that choice. Registration clients must therefore let the user choose
the Moby central where a service/object should be registered.
Registration always happens at the topmost Moby central node where
the service should be visible; all secondaries of this Moby central
will pick that service up by synchronization.

Why? Cascading registration is cumbersome: name duplications etc.
can only be resolved once a registration request has reached the
topmost node, and the result must then be passed back to the client.

Name conflicts can still occur with locally registered services.  
E.g., Adam registers a private service AnalyseThis on a private  
secondary. Later, Beth registers AnalyseThis with same authority on  
the Moby central master. The private secondary picks this up from the  
RSS and runs into a name duplication. Proposed solution: Local  
registrations MUST ALWAYS use a local authority. E.g., Adam registers  
AnalyseThis with authority InternalIP, and Beth registers AnalyseThis  
with authority paul_vitti.com. Beyond that, we assume that whoever
registers a service at a more global Moby central knows what they
are doing, and we give synchronization precedence over local
registrations. E.g., a test
registry is a secondary of Moby central. Chris registers AnalyseThat  
with authority paul_vitti.com in the test registry. Once he's happy  
with testing, he registers AnalyseThat with authority paul_vitti.com  
in Moby central. The test registry retrieves this from the RSS,  
discards the local registration and overwrites it with the  
registration picked up through the RSS.
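
The precedence rule can be sketched as follows. The registry layout
(a dict keyed by authority and name, with an "origin" marker on each
record) and the merge_synced helper are assumptions made for
illustration only.

```python
def merge_synced(registry, authority, name, record):
    """Store a record received from the master, replacing any local one.

    registry maps (authority, name) -> dict with an "origin" key of
    "local" or "master" (an assumed layout).
    Returns the discarded local record, or None if none was overwritten.
    """
    key = (authority, name)
    previous = registry.get(key)
    # Synchronization always wins over a local registration.
    registry[key] = {**record, "origin": "master"}
    if previous is not None and previous.get("origin") == "local":
        return previous
    return None
```

Because the key includes the authority, Adam's AnalyseThis under
InternalIP and Beth's AnalyseThis under paul_vitti.com coexist, while
Chris's test registration of AnalyseThat is replaced by the one
synchronized from Moby central.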

5. Moby central failure
=======================

If a master Moby central fails, the secondaries continue normal  
operation with no effect on service discovery for all clients keyed  
to a secondary. However, registration is no longer possible at the  
master node. Once the master node comes back up, all secondaries must  
resync.

6. Adaptations to the RSS
=========================

For this procedure the current RSS feed has to be changed slightly:
on the one hand to notify the secondaries correctly, on the other
hand to ensure that normal RSS readers still work as usual. The
current RSS feed mainly uses Dublin Core metadata to carry its
information, so adding information to the feed only requires adding
more Dublin Core metadata.

Primarily, the feed has to state whether the service is new,
modified or deleted. Additionally, the service RDF has to be linked
in the feed so that the local RDF agent can apply the changes
described in the service RDF to the local secondary. Whether further
information should be added to the feed to provide more
possibilities for filtering services is open for discussion.
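
As an illustration of what a secondary would read from such an item,
the sketch below parses one hypothetical feed entry. Which Dublin
Core elements carry the action and the RDF link (dc:type and
dc:source here), and the example.org URLs, are assumptions of this
sketch and not the actual feed layout. A normal RSS reader would
simply ignore the extra elements.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical item: the choice of dc:type for the action and dc:source
# for the RDF link is an assumption, not the real feed layout.
SAMPLE_ITEM = """
<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>AnalyseThis</title>
  <link>http://example.org/services/AnalyseThis</link>
  <dc:publisher>paul_vitti.com</dc:publisher>
  <dc:type>deleted</dc:type>
  <dc:source>http://example.org/rdf/AnalyseThis.rdf</dc:source>
  <dc:date>2006-11-29T12:02:05Z</dc:date>
</item>
"""


def parse_item(xml_text):
    """Extract the fields a secondary needs from one feed item."""
    item = ET.fromstring(xml_text.strip())

    def dc(tag):
        # Namespaced lookup, e.g. {http://purl.org/dc/elements/1.1/}type
        return item.findtext(f"{{{DC}}}{tag}")

    return {
        "name": item.findtext("title"),
        "authority": dc("publisher"),
        "action": dc("type"),
        "rdf_url": dc("source"),
        "date": dc("date"),
    }
```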

7. Resync
=========

Another main aspect is what happens when a repository is out of sync
(e.g. due to a temporary failure of master or secondary). The RSS
feed has a limited length, which means it contains only a limited
number of transactions. It may therefore not contain all
transactions since the last sync of a secondary.


i) Solution
We propose that each repository stores a time stamp of its last
synchronization. If, at the next synchronization, even the oldest
change in the feed is newer than that time stamp, the feed may no
longer reach back far enough and we run the risk of not receiving
all information about service changes. In this case the secondary
should be able to ask the primary to create an RSS feed with all
changes that have happened since the secondary's time stamp.
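
The gap check itself is small. The feed is only guaranteed to cover
the whole period since the last sync if its oldest entry is no newer
than the stored time stamp. The function name and the handling of an
empty feed in this sketch are assumptions.

```python
from datetime import datetime, timezone


def needs_full_catchup(last_sync, feed_timestamps):
    """Return True if the feed may be missing changes made after last_sync.

    If even the oldest entry in the feed is newer than last_sync,
    earlier changes may already have dropped off the end of the feed.
    An empty feed is treated as "no changes", which is an assumption.
    """
    if not feed_timestamps:
        return False
    return min(feed_timestamps) > last_sync


# Example: last sync on 20 Nov, feed only reaches back to 25 Nov.
last_sync = datetime(2006, 11, 20, tzinfo=timezone.utc)
feed = [datetime(2006, 11, day, tzinfo=timezone.utc) for day in (25, 27, 29)]
must_request_catchup_feed = needs_full_catchup(last_sync, feed)
```

When the check returns True, the secondary would request the
catch-up feed described above instead of trusting the regular one.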

8. Initial load
===============

When populating a new secondary from scratch, all registered services/ 
objects need to be received from the master Moby central. We propose  
a new method in Moby central to request all registered services/ 
objects as RSS. Then, the initialization proceeds exactly like a  
synchronization.



So to kick off the discussion here are some of our questions:

1. Is it reasonable to use the existing RSS feed for this procedure?
It seems very handy and avoids creating a similar, completely new
structure.

2. Does any structure keep track of deleted services?

3. Resync: Is it reasonable to timestamp all transactions in Moby
central? Or should we solve the resync issue by enforcing a full
drop/emptying of the secondary and reloading all data as in the
initial load?


Thanks
Heiko & Andreas

-- 
Andreas Groscurth
Diplom Bioinformatik - PhD Student
Max Planck Institute for Plant Breeding Research
Carl-von-Linné-Weg 10
50829 Cologne
Germany
E-mail:    groscurt at mpiz-koeln.mpg.de
Phone:    +49(0)221-5062-447



