[MOBY-dev] RFC - Synchronization of Biomoby secondary repositories

Edward Kawas edward.kawas at gmail.com
Wed Nov 29 15:22:14 UTC 2006


Hi,

From reading *just* the 'aim' and 'problems' portion of this message, I was
wondering whether you thought about using the agent for mirroring.

Just throwing it out there,

Eddie

> -----Original Message-----
> From: moby-dev-bounces at lists.open-bio.org 
> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of 
> Andreas Groscurth
> Sent: Wednesday, November 29, 2006 4:02 AM
> To: moby-dev at lists.open-bio.org
> Subject: [MOBY-dev] RFC - Synchronization of Biomoby 
> secondary repositories
> 
> The following text describes the procedure of the 
> synchronization of Biomoby secondary repositories.
> 
> Aim: Replicate BioMoby central
> -to create mirrors
> -to have redundancy in case of failure
> -to create private sets of services, either filtered from the 
> global set (fewer services) or added to the global set (more services)
> 
> Problems:
> -synchronizing repositories
> -cascading service/object registration requests
> -populating a Moby central from scratch
> 
> Solutions:
> -The existing RSS feed is used to notify secondaries of 
> changes (register service/delete service/update service) to 
> the master
> -A complete RSS document is created by a new dump 
> method for initialization of Moby centrals from scratch
> -Registrations are handled by the client and NOT cascaded
> 
> 1. Synchronizing repositories
> =============================
> 
> We propose that secondaries check the Biomoby RSS feed to be 
> notified of changes to registrations. Currently the RSS feed 
> is updated once a day; for more rapid synchronization this 
> would have to be changed.
> The changes include registration, modification or deletion of 
> a service/object. If changes were applied to the Biomoby 
> Central registry, they are applied to the secondary as well. 
> The RSS contains the signature URL where the secondary picks 
> up the service RDF to retrieve all details required for the 
> registration using the existing RDF agent.
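
A minimal sketch of this synchronization step in Python, for illustration only. The entry fields (`action`, `authority`, `name`, `signature_url`), the action names and the `fetch_rdf` callback are hypothetical; the real feed layout and the RDF agent interface are not defined in this proposal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

Key = Tuple[str, str]  # (authority, service name)


@dataclass(frozen=True)
class FeedEntry:
    action: str         # hypothetical values: "register", "update", "delete"
    authority: str
    name: str
    signature_url: str  # where the service RDF can be picked up


def apply_feed(registry: Dict[Key, str],
               entries: Iterable[FeedEntry],
               fetch_rdf: Callable[[str], str]) -> None:
    """Apply master transactions, oldest first, to a secondary's registry."""
    for entry in entries:
        key = (entry.authority, entry.name)
        if entry.action == "delete":
            # Deleting an unknown service is treated as a no-op.
            registry.pop(key, None)
        elif entry.action in ("register", "update"):
            # fetch_rdf stands in for the RDF agent retrieving the details.
            registry[key] = fetch_rdf(entry.signature_url)
        else:
            raise ValueError(f"unknown action: {entry.action!r}")
```

Note that the "delete" branch only works if the feed actually reports deletions, which is the open question raised below.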
> 
> i) Problems/changes required:
> 
> The main question here is whether unregistered services are 
> deleted completely from the central database or are merely 
> marked as inactive. The problem is that the feed would also 
> need to contain information about deleted services, so that 
> the secondaries can retrieve it. Moby central will therefore 
> have to keep a full transaction log that includes deletions.
> 
> 2. Filtering
> ============
> 
> We propose that any secondary can apply filters to the RSS 
> feed and thus only include a subset of all services/objects. 
> This can be useful to make finding services from lists 
> easier, to tune workflows to performant services, to use only 
> local services or to exclude test services. Information 
> relevant to filtering, such as authority and description, is 
> in the RSS. If more turns out to be relevant, filtering 
> may need to happen at the level of the service RDF.
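
As a rough illustration of such a filter, here is a Python sketch that keeps only entries from chosen authorities and drops entries whose description contains an unwanted word. The dictionary keys (`authority`, `description`) and the function name are assumptions made for the example, not part of the existing feed.

```python
from typing import Dict, Iterable, List, Optional, Set


def filter_entries(entries: Iterable[Dict[str, str]],
                   allowed_authorities: Optional[Set[str]] = None,
                   excluded_words: Iterable[str] = ()) -> List[Dict[str, str]]:
    """Return the subset of feed entries a secondary wants to mirror."""
    words = [word.lower() for word in excluded_words]
    kept = []
    for entry in entries:
        # None means "accept every authority".
        if (allowed_authorities is not None
                and entry["authority"] not in allowed_authorities):
            continue
        description = entry.get("description", "").lower()
        # Case-insensitive substring match, e.g. to drop test services.
        if any(word in description for word in words):
            continue
        kept.append(entry)
    return kept
```

A substring match on the description is crude; anything finer would probably need the service RDF, as noted above.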
> 
> 3. Private services
> ===================
> 
> We propose that any client can register services with a Moby 
> central secondary; these will then be available only to 
> clients querying the secondary. If the secondary is in a 
> local network, this allows easy access control to local 
> services. Any secondary synchronizing to that repository will 
> of course inherit all those additional services, allowing 
> simple creation of local production Moby centrals and local 
> test Moby centrals.
> 
> 4. Registration
> ===============
> 
> We propose to NOT cascade registration requests, i.e. pass 
> them on from secondary to master. That means that the client 
> has control over where a registration is done but also means 
> the client has to make that choice. Registration clients must 
> thus add an implementation that allows a user to choose the 
> Moby central where a service/object should be registered. 
> Registration always happens at the topmost Moby central node 
> where the service should be visible, all secondaries of this 
> Moby central will pick that service up by synchronization.
> 
> Why? Cascading registration is cumbersome, as only once a 
> registration request has reached the topmost node can name 
> duplications etc. be resolved, which must then be passed to 
> the client.
> 
> Name conflicts can still occur with locally registered services.  
> E.g., Adam registers a private service AnalyseThis on a 
> private secondary. Later, Beth registers AnalyseThis with 
> the same authority on the Moby central master. The private 
> secondary picks this up from the RSS and runs into a name 
> duplication. Proposed solution: Local registrations MUST 
> ALWAYS use a local authority. E.g., Adam registers 
> AnalyseThis with authority InternalIP, and Beth registers 
> AnalyseThis with authority paul_vitti.com. Then, we assume 
> whoever registers a service at a more global Moby central 
> knows what they're doing and give synchronization precedence 
> over local registrations. E.g., a test registry is a 
> secondary of Moby central. Chris registers AnalyseThat with 
> authority paul_vitti.com in the test registry. Once he's 
> happy with testing, he registers AnalyseThat with authority 
> paul_vitti.com in Moby central. The test registry retrieves 
> this from the RSS, discards the local registration and 
> overwrites it with the registration picked up through the RSS.
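
The precedence rule can be sketched in Python as follows. The registry layout (a dict keyed by authority and name, with an `origin` marker) and both function names are hypothetical, chosen only to make the rule concrete.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (authority, service name)
Registry = Dict[Key, Dict[str, str]]


def register_local(registry: Registry, authority: str, name: str, rdf: str) -> None:
    """Record a registration made directly at this secondary."""
    registry[(authority, name)] = {"rdf": rdf, "origin": "local"}


def apply_synced(registry: Registry, authority: str, name: str, rdf: str) -> bool:
    """Store a registration picked up through the RSS.

    Synchronization always wins. Returns True if a local registration
    with the same authority and name was discarded in the process.
    """
    key = (authority, name)
    previous = registry.get(key)
    registry[key] = {"rdf": rdf, "origin": "synced"}
    return previous is not None and previous["origin"] == "local"
```

In the Chris example, `register_local` is called in the test registry first; the later `apply_synced` for the same authority and name returns True and leaves only the master's version.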
> 
> 5. Moby central failure
> =======================
> 
> If a master Moby central fails, the secondaries continue 
> normal operation with no effect on service discovery for all 
> clients keyed to a secondary. However, registration is no 
> longer possible at the master node. Once the master node 
> comes back up, all secondaries must resync.
> 
> 6. Adaptations to the RSS
> =========================
> 
> For this procedure the current RSS feed has to be changed 
> marginally, on the one hand to enable correct 
> notification of the secondaries, on the other hand to ensure 
> that normal RSS readers still work the usual way. The 
> current RSS feed mainly uses Dublin Core metadata to 
> provide the information, so adding information to 
> the feed only requires adding more Dublin Core metadata.
> 
> Primarily the feed has to contain the information whether the 
> service is new, modified or deleted. Additionally the service 
> RDF has to be linked in the feed to enable the local RDF 
> agent to apply the changes, using the information in the 
> service RDF, to the local secondary. 
> Whether further information should be added to the feed to 
> provide more possibilities for filtering services can be discussed.
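
To make this concrete, here is a Python sketch that reads such an extended item using only the standard library. The sample feed is invented: using `dc:type` for the change kind, `dc:source` for the service RDF link and `dc:publisher` for the authority is purely an assumption for illustration, as is the RSS 2.0 layout. The current feed may be structured differently.

```python
import xml.etree.ElementTree as ET

DC = {"dc": "http://purl.org/dc/elements/1.1/"}

# Invented sample; element choices are assumptions, not the real feed.
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example registry feed</title>
    <item>
      <title>AnalyseThis</title>
      <dc:publisher>paul_vitti.com</dc:publisher>
      <dc:type>modified</dc:type>
      <dc:source>http://example.org/AnalyseThis.rdf</dc:source>
      <dc:date>2006-11-29T04:02:00Z</dc:date>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Extract the fields a secondary would need from each feed item."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "name": item.findtext("title"),
            "authority": item.findtext("dc:publisher", namespaces=DC),
            "change": item.findtext("dc:type", namespaces=DC),
            "rdf_url": item.findtext("dc:source", namespaces=DC),
            "date": item.findtext("dc:date", namespaces=DC),
        })
    return items
```

Because the extra fields live in the Dublin Core namespace, an ordinary RSS reader would still see a normal item with a title.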
> 
> 7. Resync
> =========
> 
> Another main issue arises when a repository is out of 
> sync (e.g. due to a temporary failure of master or 
> secondary). The RSS feed has a limited length, which means it 
> contains a limited number of transactions. It may therefore 
> not contain all transactions since the last 
> sync of a secondary.
> 
> 
> i) Solution
> We propose that each repository store a time stamp of 
> its last synchronization. If, at the next 
> synchronization, the oldest changes in the feed are 
> newer than the stored sync time stamp of the repository, we 
> run the risk of not receiving all information about service 
> changes. In this case the secondary should be able to ask the 
> primary to create an RSS feed with all changes that have 
> happened since the secondary's stored time stamp.
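
The gap check can be sketched in Python like this: the regular feed can only be trusted if it reaches back at least as far as the secondary's last sync. The function name is hypothetical, and treating an empty feed as "not trustworthy" is an assumption of the sketch.

```python
from datetime import datetime
from typing import Iterable


def feed_covers_gap(last_sync: datetime,
                    feed_timestamps: Iterable[datetime]) -> bool:
    """True if the feed reaches back to (or before) the last sync.

    If this returns False, transactions may have dropped off the feed,
    and the secondary should request all changes since last_sync.
    """
    stamps = list(feed_timestamps)
    if not stamps:
        # Assumption: an empty feed proves nothing, so play it safe.
        return False
    return min(stamps) <= last_sync
```

For example, with a last sync on 27 November and a feed whose oldest entry is from 28 November, the function returns False and a "changes since" feed would be requested.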
> 
> 8. Initial load
> ===============
> 
> When populating a new secondary from scratch, all registered 
> services/objects need to be received from the master Moby 
> central. We propose a new method in Moby central to request 
> all registered services/objects as RSS. Then, the 
> initialization proceeds exactly like a synchronization.
> 
> 
> 
> So to kick off the discussion here are some of our questions:
> 
> 1. Is it reasonable to use the existing RSS feed for this procedure? 
> It sounds very handy and avoids creating a similar, completely 
> new structure.
> 
> 2. Does any structure keep track of deleted services?
> 
> 3. Resync: Is it reasonable to timestamp all transactions in 
> Moby central? Or should we solve the resync issue by 
> enforcing a full drop/emptying of the secondary and reloading 
> all data as in initial load?
> 
> 
> Thanks
> Heiko & Andreas
> 
> --
> Andreas Groscurth
> Diplom Bioinformatik - PhD Student
> Max Planck Institute for Plant Breeding Research
> Carl-von-Linné-Weg 10
> 50829 Cologne
> Germany
> E-mail:    groscurt at mpiz-koeln.mpg.de
> Phone:    +49(0)221-5062-447
> 
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev




