[MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
Edward Kawas
edward.kawas at gmail.com
Wed Nov 29 15:22:14 UTC 2006
Hi,
From reading *just* the 'aim' and 'problems' portion of this message, I was
wondering whether you thought about using the agent for mirroring.
Just throwing it out there,
Eddie
> -----Original Message-----
> From: moby-dev-bounces at lists.open-bio.org
> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of
> Andreas Groscurth
> Sent: Wednesday, November 29, 2006 4:02 AM
> To: moby-dev at lists.open-bio.org
> Subject: [MOBY-dev] RFC - Synchronization of Biomoby
> secondary repositories
>
> The following text describes the procedure of the
> synchronization of Biomoby secondary repositories.
>
> Aim: Replicate BioMoby central
> -to create mirrors
> -to have redundancy in case of failure
> -to create private sets of services, either filtered from the
> global set (fewer services) or added to the global set (more services)
>
> Problems:
> -synchronizing repositories
> -cascading service/object registration requests
> -populating a Moby central from scratch
>
> Solutions:
> -The existing RSS feed is used to notify secondaries of
> changes (register service/delete service/update service) to
> the master
> -A complete RSS document is created by a new dump
> method for initialization of Moby centrals from scratch
> -Registrations are handled by the client and NOT cascaded
>
> 1. Synchronizing repositories
> =============================
>
> We propose that secondaries check the Biomoby RSS feed to be
> notified of changes to the registry.
> Currently the RSS feed is updated once a day; for more rapid
> synchronization this would have to be changed.
> The changes include registration, modification or deletion of
> a service/object. Any change applied to the Biomoby
> Central registry is then applied to the secondary as well.
> The RSS contains the signature URL where the secondary picks
> up the service RDF to retrieve all details required for the
> registration using the existing RDF agent.
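
To make the proposed flow concrete, here is a minimal sketch of how a
secondary might apply a batch of feed changes. Everything named here is
an assumption for illustration: the `Change` record, the action
vocabulary ("register", "update", "delete"), the `service_id` key, and
the `fetch_rdf` callable, which only stands in for the existing RDF
agent and is not its real interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Change:
    """One feed entry (hypothetical shape, not the real feed format)."""
    action: str         # assumed vocabulary: "register", "update", "delete"
    service_id: str     # hypothetical unique key, e.g. "authority,name"
    signature_url: str  # where the service RDF can be picked up


def apply_changes(registry: Dict[str, str],
                  changes: Iterable[Change],
                  fetch_rdf: Callable[[str], str]) -> None:
    """Apply changes in feed order to a local registry (id -> RDF text)."""
    for change in changes:
        if change.action == "delete":
            # Deleting an unknown service is treated as a no-op.
            registry.pop(change.service_id, None)
        elif change.action in ("register", "update"):
            # Both cases fetch the current RDF and store it.
            registry[change.service_id] = fetch_rdf(change.signature_url)
        else:
            raise ValueError(f"unknown action: {change.action!r}")
```

Passing `fetch_rdf` in as a parameter keeps the sketch independent of
how the RDF is actually retrieved.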
>
> i) Problems/changes required:
>
> The main question here is whether unregistered services are
> deleted completely from the central database or are marked as
> inactive. The problem is that the feed would also need
> to contain the information about a deleted service, so that
> the secondaries can retrieve it. So Moby
> central will have to keep a full transaction log, including deletions.
>
> 2. Filtering
> ============
>
> We propose that any secondary can apply filters to the RSS
> feed and thus only include a subset of all services/objects.
> This can be useful to make finding services in lists
> easier, to tune workflows to well-performing services, to use
> only local services, or to exclude test services. Some information
> relevant to filtering is already in the RSS, such as authority and
> description. If more turns out to be relevant, filtering
> may need to happen at the level of the service RDF.
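
A filter of the kind described could be as small as a list of
predicates over feed entries. This is only a sketch: the `FeedEntry`
fields mirror the authority and description mentioned above, while the
helper names and the example authorities are made up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FeedEntry:
    """Fields assumed to be available in the RSS (hypothetical shape)."""
    name: str
    authority: str
    description: str


Filter = Callable[[FeedEntry], bool]


def by_authority(allowed: Iterable[str]) -> Filter:
    """Keep only entries whose authority is in the allowed set."""
    allowed_set = set(allowed)
    return lambda entry: entry.authority in allowed_set


def exclude_keyword(keyword: str) -> Filter:
    """Drop entries whose description mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return lambda entry: needle not in entry.description.lower()


def apply_filters(entries: Iterable[FeedEntry],
                  filters: Iterable[Filter]) -> List[FeedEntry]:
    """Return the entries accepted by every filter."""
    active = list(filters)
    return [e for e in entries if all(f(e) for f in active)]
```

For example, `apply_filters(entries, [by_authority({"example.org"}),
exclude_keyword("test")])` would keep non-test services from a single
authority.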
>
> 3. Private services
> ===================
>
> We propose that any client can register services with a Moby
> central secondary; these will then be available only to
> clients querying the secondary. If the secondary is in a
> local network, this allows easy access control to local
> services. Any secondary synchronizing to that repository will
> of course inherit all those additional services, allowing
> simple creation of local production Moby centrals and local
> test Moby centrals.
>
> 4. Registration
> ===============
>
> We propose to NOT cascade registration requests, i.e. pass
> them on from secondary to master. That means that the client
> has control over where a registration is done but also means
> the client has to make that choice. Registration clients must
> thus add an implementation that allows a user to choose the
> Moby central where a service/object should be registered.
> Registration always happens at the topmost Moby central node
> where the service should be visible, all secondaries of this
> Moby central will pick that service up by synchronization.
>
> Why? Cascading registration is cumbersome, as only once a
> registration request has reached the topmost node can name
> duplications etc. be resolved, which must then be passed to
> the client.
>
> Name conflicts can still occur with locally registered services.
> E.g., Adam registers a private service AnalyseThis on a
> private secondary. Later, Beth registers AnalyseThis with
> same authority on the Moby central master. The private
> secondary picks this up from the RSS and runs into a name
> duplication. Proposed solution: Local registrations MUST
> ALWAYS use a local authority. E.g., Adam registers
> AnalyseThis with authority InternalIP, and Beth registers
> AnalyseThis with authority paul_vitti.com. Then, we assume
> whoever registers a service at a more global Moby central
> knows what they are doing, and we give synchronization precedence
> over local registrations. E.g., a test registry is a
> secondary of Moby central. Chris registers AnalyseThat with
> authority paul_vitti.com in the test registry. Once he's
> happy with testing, he registers AnalyseThat with authority
> paul_vitti.com in Moby central. The test registry retrieves
> this from the RSS, discards the local registration and
> overwrites it with the registration picked up through the RSS.
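
The precedence rule above can be sketched as a registry keyed on
(authority, name), where a synchronized entry always overwrites a local
one with the same key. The class and method names are hypothetical, and
the real registry of course stores far more than a URL.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (authority, service name)


class SecondaryRegistry:
    """Toy secondary: tracks where each registration came from."""

    def __init__(self) -> None:
        self.services: Dict[Key, Dict[str, str]] = {}

    def register_local(self, authority: str, name: str, rdf_url: str) -> None:
        """Record a registration made directly at this secondary."""
        self.services[(authority, name)] = {"rdf": rdf_url, "origin": "local"}

    def apply_synced(self, authority: str, name: str, rdf_url: str) -> bool:
        """Store an entry picked up through the RSS.

        Returns True if a local registration with the same
        (authority, name) was discarded in favour of the synced one.
        """
        key = (authority, name)
        previous = self.services.get(key)
        replaced_local = previous is not None and previous["origin"] == "local"
        self.services[key] = {"rdf": rdf_url, "origin": "master"}
        return replaced_local
```

With distinct authorities, Adam's and Beth's AnalyseThis simply
coexist; in the Chris example the synced AnalyseThat replaces the local
one and `apply_synced` returns True.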
>
> 5. Moby central failure
> =======================
>
> If a master Moby central fails, the secondaries continue
> normal operation with no effect on service discovery for all
> clients keyed to a secondary. However, registration is no
> longer possible at the master node. Once the master node
> comes back up, all secondaries must resync.
>
> 6. Adaptations to the RSS
> =========================
>
> For this procedure the current RSS feed has to be changed
> slightly, on the one hand to enable correct
> notification of the secondaries, on the other hand to ensure
> that normal RSS readers still work the usual way. The
> current RSS feed mainly uses Dublin Core metadata to
> provide its information, so adding information to
> the feed only requires adding more Dublin Core metadata.
>
> Primarily, the feed has to state whether the
> service is new, modified or deleted. Additionally, the service
> RDF has to be linked in the feed so that the local RDF
> agent can apply the changes described in the
> service RDF to the local secondary.
> Whether further information should be added to the feed to
> provide more filtering options can be discussed.
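
As a rough illustration of what a secondary would read from such a
feed, the sketch below parses items carrying extra Dublin Core
elements. The mapping is purely an assumption: nothing in the proposal
says that `dc:type` would carry the new/modified/deleted state or that
`dc:source` would carry the service RDF link, and the sample feed and
its example.org URL are invented.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace of the Dublin Core element set, in ElementTree's {uri}tag form.
DC = "{http://purl.org/dc/elements/1.1/}"

# Invented sample; the real feed layout may differ.
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>AnalyseThis</title>
      <dc:type>deleted</dc:type>
      <dc:source>http://example.org/rdf/AnalyseThis</dc:source>
      <dc:date>2006-11-29T10:00:00Z</dc:date>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text: str) -> List[Dict[str, str]]:
    """Extract one dict per <item>; missing elements become empty strings."""
    root = ET.fromstring(xml_text)
    changes = []
    for item in root.iter("item"):
        changes.append({
            "title": item.findtext("title", default=""),
            "action": item.findtext(DC + "type", default=""),     # assumed mapping
            "rdf_url": item.findtext(DC + "source", default=""),  # assumed mapping
            "date": item.findtext(DC + "date", default=""),
        })
    return changes
```

Because the extra information lives in namespaced elements, an ordinary
RSS reader that only looks at title and link would be unaffected.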
>
> 7. Resync
> =========
>
> Another main issue is a repository that is out of
> sync (e.g. due to a temporary failure of master or
> secondary). The RSS feed has a limited length, which means it
> contains only a limited number of transactions. Possibly, this
> will mean it does not contain all transactions since the last
> sync of a secondary.
>
>
> i) Solution
> We propose that each repository store a time stamp of
> its last synchronization. If, at the next
> synchronization, the oldest change in the feed is
> newer than the stored sync time stamp of the repository, we
> run the risk of missing some service
> changes. In this case the secondary should be able to ask the
> primary to create an RSS feed with all changes that have
> happened since the stored time stamp of the secondary.
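
The time stamp check can be sketched in a few lines. A feed can only be
trusted to be complete if it reaches back at least as far as the last
sync, so the gap case is the one where the oldest entry in the feed is
more recent than the stored time stamp. The function name is
hypothetical, and treating an empty feed as "nothing missing" is an
assumption.

```python
from datetime import datetime, timezone
from typing import Iterable


def feed_may_be_incomplete(last_sync: datetime,
                           entry_times: Iterable[datetime]) -> bool:
    """Return True if changes may have dropped off the feed since last_sync.

    If the oldest entry still in the feed is newer than last_sync,
    transactions between the two points in time may no longer be listed,
    and the secondary should ask the primary for a feed of all changes
    since last_sync.
    """
    times = list(entry_times)
    if not times:
        # Assumption: an empty feed means there is nothing to catch up on.
        return False
    return min(times) > last_sync


if __name__ == "__main__":
    last = datetime(2006, 11, 29, 4, 0, tzinfo=timezone.utc)
    feed = [datetime(2006, 11, 29, 6, 0, tzinfo=timezone.utc),
            datetime(2006, 11, 29, 8, 0, tzinfo=timezone.utc)]
    print(feed_may_be_incomplete(last, feed))
```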
>
> 8. Initial load
> ===============
>
> When populating a new secondary from scratch, all registered
> services/objects need to be retrieved from the master Moby
> central. We propose a new method in Moby central to request
> all registered services/objects as RSS. Then, the
> initialization proceeds exactly like a synchronization.
>
>
>
> So to kick off the discussion here are some of our questions:
>
> 1. Is it reasonable to use the existing RSS feed for this procedure?
> It sounds very handy and avoids creating a similar, completely
> new structure.
>
> 2. Does any structure keep track of deleted services?
>
> 3. Resync: Is it reasonable to timestamp all transactions in
> Moby central? Or should we solve the resync issue by
> enforcing a full drop/emptying of the secondary and reloading
> all data as in the initial load?
>
>
> Thanks
> Heiko & Andreas
>
> --
> Andreas Groscurth
> Diplom Bioinformatik - PhD Student
> Max Planck Institute for Plant Breeding Research
> Carl-von-Linné-Weg 10
> 50829 Cologne
> Germany
> E-mail: groscurt at mpiz-koeln.mpg.de
> Phone: +49(0)221-5062-447
>
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev