From Pieter.Neerincx at wur.nl  Wed Nov  8 05:56:05 2006
From: Pieter.Neerincx at wur.nl (Pieter Neerincx)
Date: Wed, 8 Nov 2006 11:56:05 +0100
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <44E387CB.2080905@ucalgary.ca>
References: <4d93f07c0608160850w68eeb88l185365d679c2edbe@mail.gmail.com>	<44E37AC5.8080105@ucalgary.ca>
	<1155760984.6594.23.camel@bioinfo.icapture.ubc.ca>
	<44E387CB.2080905@ucalgary.ca>
Message-ID: <5673901C-7386-4598-B6F7-7C588AE7F1CB@wur.nl>

Hi all,

I'm having a problem with the BioMOBY ping thing. As far as I know  
the services I had registered in the central BioMOBY Central respond  
correctly to a BioMOBY ping request. They are listed as dead though  
on the BioMOBY website and I'm wondering why. The current suspects are:

* Base64 encoded output. Does the agent decode base64 content correctly?
* HTTPS. My services require an https connection. If the agent is  
using Perl code it will probably complain about not being able to  
validate the certificate, but execute anyway. If the agent was  
written in Java it will refuse to execute the service if the SSL  
certificates can not be validated. Our certificates are self-signed,  
so you'd have to add them to your keystore to be able to execute our  
services with a Java client.
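The Java keystore step has a counterpart in most clients: instead of switching certificate checks off, the client explicitly trusts the provider's self-signed certificate. Below is a minimal Python sketch. It assumes the certificate has been saved locally as a PEM file, and the filename in the usage note is hypothetical.

```python
import ssl
from typing import Optional


def make_client_context(trusted_cert_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that keeps certificate verification on.

    If trusted_cert_file is given (a PEM file holding a provider's
    self-signed certificate), that certificate is trusted in addition to
    the system's default CAs. Hostname checking stays enabled.
    """
    ctx = ssl.create_default_context()
    if trusted_cert_file is not None:
        ctx.load_verify_locations(cafile=trusted_cert_file)
    return ctx
```

With this context, `urllib.request.urlopen(service_url, context=make_client_context("provider_selfsigned.pem"))` accepts the provider's certificate and still rejects unknown ones. Hostname checking stays on, so the certificate's subject must still match the service's host name.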

My services might need an update to take advantage of LSID resolution  
and the asynchronous one needs to be rewritten for our new BioMOBY  
async services standard, but they are not dead!

Something else: I plan on resuming my SOAP::Lite testing with the  
latest and greatest version. Is there anybody out there who is  
currently successfully running a (patched) S::L version > 0.60?

Cheers,

Pi

From edward.kawas at gmail.com  Wed Nov  8 09:22:42 2006
From: edward.kawas at gmail.com (Edward Kawas)
Date: Wed, 8 Nov 2006 06:22:42 -0800
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <5673901C-7386-4598-B6F7-7C588AE7F1CB@wur.nl>
Message-ID: <003601c70341$5b74fe60$6d00a8c0@notebook>

Hi Pieter, 

> I'm having a problem with the BioMOBY ping thing. As far as I 
> know the services I had registered in the central BioMOBY 
> Central respond correctly to a BioMOBY ping request. They are 
> listed as dead though on the BioMOBY website and I'm 
> wondering why. The current suspects are:
> 
> * Base64 encoded output. Does the agent decode base64 content 
> correctly?
> * HTTPS. My services require an https connection. If the 
> agent is using Perl code it will probably complain about not 
> being able to validate the certificate, but execute anyway. 
> If the agent was written in Java it will refuse to execute 
> the service if the SSL certificates can not be validated. Our 
> certificates are self-signed, so you'd have to add them to 
> your keystore to be able to execute our services with a Java client.
> 
I bet that it's listed as dead because of authentication. What can I do to get
around this?

Thanks,

Eddie


From gordonp at ucalgary.ca  Wed Nov  8 09:56:02 2006
From: gordonp at ucalgary.ca (Paul Gordon)
Date: Wed, 08 Nov 2006 07:56:02 -0700
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <003601c70341$5b74fe60$6d00a8c0@notebook>
References: <003601c70341$5b74fe60$6d00a8c0@notebook>
Message-ID: <4551F002.2040909@ucalgary.ca>

It's not an immediate solution, but I would suggest that people
providing public SSL services get a real certificate if they can rustle
up the money (only about $100/year).  I know most MOBY clients won't
connect to unauthenticated services either, so having it properly signed
by an authority will make it so much more useful...
> Hi Pieter, 
>
>   
>> I'm having a problem with the BioMOBY ping thing. As far as I 
>> know the services I had registered in the central BioMOBY 
>> Central respond correctly to a BioMOBY ping request. They are 
>> listed as dead though on the BioMOBY website and I'm 
>> wondering why. The current suspects are:
>>
>> * Base64 encoded output. Does the agent decode base64 content 
>> correctly?
>> * HTTPS. My services require an https connection. If the 
>> agent is using Perl code it will probably complain about not 
>> being able to validate the certificate, but execute anyway. 
>> If the agent was written in Java it will refuse to execute 
>> the service if the SSL certificates can not be validated. Our 
>> certificates are self-signed, so you'd have to add them to 
>> your keystore to be able to execute our services with a Java client.
>>
>>     
> I bet that it's listed as dead because of authentication. What can I do to get
> around this?
>
> Thanks,
>
> Eddie
>
>
>
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev
>
>
>   


From Pieter.Neerincx at wur.nl  Fri Nov 10 05:59:57 2006
From: Pieter.Neerincx at wur.nl (Pieter Neerincx)
Date: Fri, 10 Nov 2006 11:59:57 +0100
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <4551F002.2040909@ucalgary.ca>
References: <003601c70341$5b74fe60$6d00a8c0@notebook>
	<4551F002.2040909@ucalgary.ca>
Message-ID: <C524ABBA-D44F-4C1A-A65E-DE6145AF0413@wur.nl>

Hi Eddie and Paul,

On 8-Nov-2006, at 3:56 PM, Paul Gordon wrote:

> It's not an immediate solution, but I would suggest that people
> providing public SSL services get a real certificate if they can rustle
> up the money (only about $100/year).  I know most MOBY clients won't
> connect to unauthenticated services either, so having it properly signed
> by an authority will make it so much more useful...

Ok, I know that a certificate signed by one of the "big" certificate
authorities would make life a little easier, but our self-signed
certificates are just as real and valid :). The problem is the
distribution of the certificates. I would have to drop by your office
in person, with my passport to prove I am who I claim to be and the
certificate on, for example, a USB stick. If I sent you the
certificate in a plain e-mail, you could not verify whether it's really
my certificate or a fake one. Anyway, that distribution problem can
also be solved without spending $100.
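One common way to solve that distribution problem without a commercial CA works as follows. This is a general technique and was not settled on in this thread. The certificate itself goes out by plain e-mail. Its fingerprint is published over a channel the recipient already trusts, such as a phone call or a signed message. The recipient compares the two before importing the certificate. Below is a minimal Python sketch that uses only the standard library.

```python
import hashlib
import ssl


def sha256_fingerprint(pem_cert: str) -> str:
    """Return the SHA-256 fingerprint of a PEM certificate as AA:BB:... hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_published_fingerprint(pem_cert: str, published: str) -> bool:
    """Compare against a fingerprint obtained over a separate, trusted channel.

    Colons, spaces and letter case in the published value are ignored.
    """
    expected = published.replace(":", "").replace(" ", "").upper()
    return sha256_fingerprint(pem_cert).replace(":", "") == expected
```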

I'll add some documentation to the site for people who want to use  
HTTPS for their services and/or BioMOBY Central...

Cheers,

Pi


>> Hi Pieter,
>>
>>
>>> I'm having a problem with the BioMOBY ping thing. As far as I
>>> know the services I had registered in the central BioMOBY
>>> Central respond correctly to a BioMOBY ping request. They are
>>> listed as dead though on the BioMOBY website and I'm
>>> wondering why. The current suspects are:
>>>
>>> * Base64 encoded output. Does the agent decode base64 content
>>> correctly?
>>> * HTTPS. My services require an https connection. If the
>>> agent is using Perl code it will probably complain about not
>>> being able to validate the certificate, but execute anyway.
>>> If the agent was written in Java it will refuse to execute
>>> the service if the SSL certificates can not be validated. Our
>>> certificates are self-signed, so you'd have to add them to
>>> your keystore to be able to execute our services with a Java client.
>>>
>>>
>> I bet that it's listed as dead because of authentication. What can
>> I do to get around this?
>>
>> Thanks,
>>
>> Eddie
>>


Wageningen University and Research centre (WUR)
Laboratory of Bioinformatics
Transitorium (building 312) room 1034
Dreijenlaan 3
6703 HA Wageningen
The Netherlands
phone: 0317-483 060
fax: 0317-483 584
mobile: 06-143 66 783
pieter.neerincx at wur.nl


From groscurt at mpiz-koeln.mpg.de  Wed Nov 29 07:02:05 2006
From: groscurt at mpiz-koeln.mpg.de (Andreas Groscurth)
Date: Wed, 29 Nov 2006 13:02:05 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
Message-ID: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>

The following text describes a procedure for synchronizing Biomoby
secondary repositories.

Aim: Replicate BioMoby central
-to create mirrors
-to have redundancy in case of failure
-to create private sets of services, either filtered from the global
set (fewer services) or extending the global set (more services)

Problems:
-synchronizing repositories
-cascading service/object registration requests
-populating a Moby central from scratch

Solutions:
-The existing RSS feed is used to notify secondaries of changes  
(register service/delete service/update service) to the master
-A complete RSS document is created by a new dump method for  
initialization of Moby centrals from scratch
-Registrations are handled by the client and NOT cascaded

1. Synchronizing repositories
=============================

We propose that secondaries poll the Biomoby RSS feed to be notified
of changes to the registry. Currently the RSS feed is updated once a
day; for more rapid synchronization this would have to change.
The changes include registration, modification or deletion of a
service/object. Changes applied to the Biomoby Central registry are
then applied to the secondary as well. Each RSS item contains the
signature URL from which the secondary retrieves the service RDF,
which holds all the details required for registration via the
existing RDF agent.
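The synchronization step above could look roughly like this on the secondary's side. This is a sketch built on stated assumptions. `FeedItem` is a hypothetical in-memory form of one RSS item, and `fetch_rdf` is a placeholder for the existing RDF agent. Items are assumed to be applied oldest first.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class FeedItem:
    """Hypothetical in-memory form of one change notification from the feed."""
    change: str         # "new", "modified" or "deleted"
    authority: str
    name: str
    signature_url: str  # where the service RDF can be fetched


@dataclass
class Secondary:
    # Placeholder for the existing RDF agent: fetch and parse the RDF at a URL.
    fetch_rdf: Callable[[str], dict]
    services: Dict[Tuple[str, str], dict] = field(default_factory=dict)

    def apply(self, items: Iterable[FeedItem]) -> None:
        """Apply feed items to the local registry, oldest first."""
        for item in items:
            key = (item.authority, item.name)
            if item.change == "deleted":
                # Deleting something we never had (e.g. filtered out) is a no-op.
                self.services.pop(key, None)
            elif item.change in ("new", "modified"):
                # Registration and update are handled alike: re-read the signature RDF.
                self.services[key] = self.fetch_rdf(item.signature_url)
            else:
                raise ValueError(f"unknown change type: {item.change!r}")
```

Treating "new" and "modified" identically keeps the secondary stateless about history: it always re-reads the current signature RDF.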

i) Problems/changes required:

The main open question is whether unregistered services are deleted
completely from the central database or only marked as inactive. This
matters because the feed would also need to carry information about
deleted services, so that the secondaries receive it. Moby central
would therefore have to keep a full transaction log, including
deletions.
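Below is a minimal sketch of what keeping deletions in a transaction log could mean on the master. A deletion is recorded as an explicit entry (a "tombstone") instead of removing a row, so the feed can still publish it. The `Transaction` and `TransactionLog` names are hypothetical and do not refer to existing Moby Central code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Transaction:
    timestamp: datetime
    change: str                    # "new", "modified" or "deleted"
    authority: str
    name: str
    signature_url: Optional[str]   # None for deletions: the RDF may be gone


class TransactionLog:
    """Append-only change log; deletions are kept as explicit entries."""

    def __init__(self) -> None:
        self._entries: List[Transaction] = []

    def record(self, change: str, authority: str, name: str,
               signature_url: Optional[str] = None,
               when: Optional[datetime] = None) -> None:
        when = when or datetime.now(timezone.utc)
        self._entries.append(Transaction(when, change, authority, name, signature_url))

    def since(self, t: datetime) -> List[Transaction]:
        """All transactions strictly after t, oldest first."""
        return sorted((e for e in self._entries if e.timestamp > t),
                      key=lambda e: e.timestamp)
```

A log like this would also answer the resync question: `since(last_sync)` is exactly the "all changes since the secondary's time stamp" feed.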

2. Filtering
============

We propose that any secondary can apply filters to the RSS feed and
thus include only a subset of all services/objects. This can be
useful to make finding services in lists easier, to restrict
workflows to well-performing services, to use only local services, or
to exclude test services. Some information relevant to filtering,
such as authority and description, is already in the RSS; if more
turns out to be relevant, filtering may need to happen at the level
of the service RDF.
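Filtering could be as simple as a predicate applied to each feed item before the item reaches the RDF agent. This sketch assumes each item exposes an authority and a description. The "test" keyword rule and the example authorities are purely illustrative.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Iterator


@dataclass(frozen=True)
class FeedEntry:
    authority: str
    name: str
    description: str


def filter_entries(entries: Iterable[FeedEntry],
                   allowed_authorities: AbstractSet[str],
                   excluded_words: AbstractSet[str] = frozenset({"test"})) -> Iterator[FeedEntry]:
    """Yield entries from allowed authorities whose description contains
    none of the excluded words (compared case-insensitively)."""
    excluded = {w.lower() for w in excluded_words}
    for entry in entries:
        if entry.authority not in allowed_authorities:
            continue
        if excluded & set(entry.description.lower().split()):
            continue
        yield entry
```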

3. Private services
===================

We propose that any client can register services with a Moby central
secondary; these will then be available only to clients querying that
secondary. If the secondary is on a local network, this allows easy
access control to local services. Any secondary synchronizing to that
repository will of course inherit all those additional services,
allowing simple creation of local production Moby centrals and local
test Moby centrals.

4. Registration
===============

We propose NOT to cascade registration requests, i.e. not to pass them
on from secondary to master. This gives the client control over where
a registration is done, but also means the client has to make that
choice. Registration clients must therefore let the user choose the
Moby central where a service/object should be registered.
Registration always happens at the topmost Moby central node where
the service should be visible; all secondaries of this Moby central
will pick the service up through synchronization.

Why? Cascading registration is cumbersome: name duplications etc. can
only be resolved once a registration request has reached the topmost
node, and the result must then be passed back to the client.

Name conflicts can still occur with locally registered services.
E.g., Adam registers a private service AnalyseThis on a private
secondary. Later, Beth registers AnalyseThis with the same authority
on the Moby central master. The private secondary picks this up from
the RSS and runs into a name duplication. Proposed solution: local
registrations MUST ALWAYS use a local authority. E.g., Adam registers
AnalyseThis with authority InternalIP, and Beth registers AnalyseThis
with authority paul_vitti.com. Beyond that, we assume that whoever
registers a service at a more global Moby central knows what they are
doing, and give synchronization precedence over local registrations.
E.g., a test registry is a secondary of Moby central. Chris registers
AnalyseThat with authority paul_vitti.com in the test registry. Once
he's happy with testing, he registers AnalyseThat with authority
paul_vitti.com in Moby central. The test registry retrieves this from
the RSS, discards the local registration and replaces it with the
registration picked up through the RSS.
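The precedence rule in the last example can be expressed as a merge keyed on (authority, service name), where entries from the more global registry overwrite local ones. Below is a sketch using hypothetical names.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (authority, service name)


def merge_from_upstream(local: Dict[Key, dict],
                        upstream: Dict[Key, dict]) -> Dict[Key, dict]:
    """Combine local registrations with those synchronized from upstream.

    On an (authority, name) collision the upstream registration wins, so a
    test registration is replaced once the same service appears upstream.
    Registrations under a purely local authority are left untouched.
    The input dictionaries are not modified.
    """
    merged = dict(local)
    merged.update(upstream)
    return merged
```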

5. Moby central failure
=======================

If a master Moby central fails, the secondaries continue normal  
operation with no effect on service discovery for all clients keyed  
to a secondary. However, registration is no longer possible at the  
master node. Once the master node comes back up, all secondaries must  
resync.

6. Adaptations to the RSS
=========================

For this procedure the current RSS feed needs only marginal changes:
on the one hand to enable correct notification of the secondaries, on
the other to ensure that normal RSS readers still work as usual. The
current RSS feed mainly uses Dublin Core metadata to provide its
information, so adding further information to the feed only requires
adding more Dublin Core metadata.

Primarily, the feed has to state whether a service is new, modified
or deleted. Additionally, each item has to link to the service RDF so
that the local RDF agent can apply the changes described there to the
local secondary. Whether further information should be added to the
feed to allow more filtering is open for discussion.
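The sketch below illustrates how a secondary might read such an extended feed. It parses RSS 1.0 items with Python's standard XML library. This proposal does not decide which Dublin Core elements would carry the change type and the signature URL. Using `dc:type` and `dc:source` here is purely an assumption for illustration, and the sample feed is invented.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative feed; the dc:type / dc:source usage is an assumption.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/AnalyseThis">
    <title>AnalyseThis</title>
    <dc:date>2006-11-29T12:00:00Z</dc:date>
    <dc:type>deleted</dc:type>
    <dc:source>http://example.org/AnalyseThis.rdf</dc:source>
  </item>
</rdf:RDF>"""


def _text(item: ET.Element, tag: str) -> str:
    """Text of a child element, or "" if it is missing."""
    return (item.findtext(tag, default="", namespaces=NS) or "").strip()


def parse_changes(feed_xml: str) -> List[Dict[str, str]]:
    """Return one dict per RSS 1.0 item with its change metadata."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": _text(item, "rss:title"),
            "date": _text(item, "dc:date"),
            "change": _text(item, "dc:type"),           # assumed: new / modified / deleted
            "signature_url": _text(item, "dc:source"),  # assumed: link to the service RDF
        }
        for item in root.findall("rss:item", NS)
    ]
```

Ordinary RSS readers would simply ignore the extra Dublin Core elements, which is the point of extending the feed this way.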

7. Resync
=========

Another key issue is what happens when a repository falls out of sync
(e.g. due to a temporary failure of the master or a secondary). The
RSS feed has a limited length, i.e. it contains only a limited number
of transactions. It may therefore not contain all transactions since
a secondary's last sync.


i) Solution
We propose that each repository stores a time stamp of its last
synchronization. If, during the next synchronization, the oldest
change in the feed is newer than the repository's last-sync time
stamp, changes made in between may already have dropped off the feed,
and we run the risk of missing information about service changes. In
this case the secondary should be able to ask the primary to create
an RSS feed with all changes that have happened since the secondary's
time stamp.
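The check described above fits in a few lines. This sketch assumes the secondary knows the dates of the items currently in the feed (e.g. from their date metadata) and its own last-sync time stamp. It conservatively requires the oldest item to be strictly older than the last sync.

```python
from datetime import datetime
from typing import Iterable


def feed_covers_gap(last_sync: datetime, item_dates: Iterable[datetime]) -> bool:
    """Return True if the feed must contain every change since last_sync.

    The feed holds only the most recent transactions. If its oldest item
    is not strictly older than last_sync, earlier changes may already have
    dropped off, and the secondary should request a full
    "changes since last_sync" feed instead.
    """
    dates = list(item_dates)
    if not dates:
        # An empty feed proves nothing about the period since last_sync.
        return False
    return min(dates) < last_sync
```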

8. Initial load
===============

When populating a new secondary from scratch, all registered
services/objects need to be retrieved from the master Moby central.
We propose a new method in Moby central to request all registered
services/objects as RSS. The initialization then proceeds exactly like
a synchronization.


So, to kick off the discussion, here are some of our questions:

1. Is it reasonable to use the existing RSS feed for this procedure?
It seems very handy and avoids creating a completely new, parallel
structure.

2. Does any structure keep track of deleted services?

3. Resync: Is it reasonable to timestamp all transactions in Moby
central? Or should we solve the resync issue by enforcing a full
drop/emptying of the secondary and reloading all data as in the
initial load?


Thanks
Heiko & Andreas

-- 
Andreas Groscurth
Diplom Bioinformatik - PhD Student
Max Planck Institute for Plant Breeding Research
Carl-von-Linné-Weg 10
50829 Cologne
Germany
E-mail: groscurt at mpiz-koeln.mpg.de
Phone: +49(0)221-5062-447


From dag at sonsorol.org  Wed Nov 29 09:20:46 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 29 Nov 2006 09:20:46 -0500
Subject: [MOBY-dev] question for moby devs/architects regarding use of DNS
Message-ID: <C303F7E7-F6D2-4536-A08C-0E443D75405B@sonsorol.org>


Hi folks,

I just installed a new firewall (or, in fancy terms, a 'unified threat
management appliance') upstream of the main open-bio.org servers.

One of the more interesting reports so far is that a number of IP
addresses have been opening up very large numbers of TCP connections
to the main open-bio.org web/DNS/mail server. We are talking about
256+ simultaneous TCP sessions heading our way from the same remote IP
address.

Some of this is just web spidering and FTP mirroring but quite a bit  
of the traffic (oddly enough) is DNS related.

We have an open DNS server and it is quite likely that people have
found this out and are using us for recursive DNS queries. It is
actually pretty easy to constrain/lock this down, but that DNS server
is also the primary nameserver for biomoby.org and for the very special
LSID SRV record used for LSID discovery operations.

I guess I have the following questions/requests for the moby expert  
community:

(1) In the way that moby is architected is it expected that either  
clients or servers would generate lots of DNS traffic for  
biomoby.org? If what I am seeing is 'normal' then I just want to  
leave things alone.

(2) How popular is LSID? Could services making use of the 'lsid' SRV
record be responsible for lots of DNS traffic? Like 256+ sessions
from the same IP?

(3) I am going to reconfigure the DNS server so that we don't  
recursively answer DNS requests for other domains (like 'cnn.com'  
etc.) while still allowing anyone in the world to query the  
biomoby.org DNS zone.  Can the moby developers/leaders elect a point  
person that I can remain in contact with while we do this work? I  
want to make sure that we don't affect/break moby services while this  
work is done.

Thanks!

-Chris
OBF


From markw at illuminae.com  Wed Nov 29 12:41:40 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 09:41:40 -0800
Subject: [MOBY-dev] [moby] question for moby devs/architects
	regarding	use of DNS
In-Reply-To: <C303F7E7-F6D2-4536-A08C-0E443D75405B@sonsorol.org>
References: <C303F7E7-F6D2-4536-A08C-0E443D75405B@sonsorol.org>
Message-ID: <1164822100.20382.13.camel@bioinfo.icapture.ubc.ca>

On Wed, 2006-11-29 at 09:20 -0500, Chris Dagdigian wrote:

> to the main open-bio.org web/DNS/mail server. We are talking about 256+
> simultaneous TCP sessions heading our way from the same remote IP
> address.

I guess the first question is "which IP address?" :-)


> (1) In the way that moby is architected is it expected that either  
> clients or servers would generate lots of DNS traffic for  
> biomoby.org? If what I am seeing is 'normal' then I just want to  
> leave things alone.

We run a cron'd script from our server here that tests all services in
the registry every hour.  I don't know for certain if this is using LSID
resolution as part of that task (Eddie, can you confirm?), but it
wouldn't surprise me if that were the case.  


> (2) How popular is LSID? Could services making use of the 'lsid' SRV
> record be responsible for lots of DNS traffic? Like 256+ sessions
> from the same IP?


We are increasingly using the LSID to represent *all* entities in MOBY -
datatypes, service types, web service instances, etc.  A tool like
Taverna may well be resolving all LSIDs in the MOBY registry each time
it starts up (?), which could account for the traffic.  Other client
applications will likely use LSID resolution in the same way in the near
future, if they don't already.  Again, the IP address would fairly
quickly tell us whether these are "scientists" or "scriptkiddies".  

Regardless, the use of LSIDs in MOBY is only going to increase over
time, so if it is becoming an issue now we should think about how to
manage it before it becomes a real problem...
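One way clients could keep that traffic down is to resolve each LSID authority's DNS record once and reuse the result, instead of issuing a lookup per LSID. This is only a suggestion and is not something MOBY clients are known to do. The sketch assumes the conventional `_lsid._tcp.<authority>` SRV name used for LSID discovery. `lookup_srv` is a placeholder for a real DNS SRV query. For simplicity, the cache ignores record TTLs.

```python
from functools import lru_cache
from typing import Callable, Tuple


def lsid_authority(lsid: str) -> str:
    """Return the authority of urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[2].lower()


def srv_name(authority: str) -> str:
    """SRV owner name assumed to locate an LSID authority's resolution service."""
    return f"_lsid._tcp.{authority}"


class CachingResolver:
    """Performs at most one SRV lookup per authority for the life of the object."""

    def __init__(self, lookup_srv: Callable[[str], Tuple[str, int]]) -> None:
        # lookup_srv is a placeholder for a real DNS SRV query returning (host, port).
        self._lookup = lru_cache(maxsize=None)(lookup_srv)

    def endpoint_for(self, lsid: str) -> Tuple[str, int]:
        return self._lookup(srv_name(lsid_authority(lsid)))
```

A client that resolves thousands of biomoby.org LSIDs at start-up would, with this approach, issue a single DNS query.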


> (3) I am going to reconfigure the DNS server so that we don't  
> recursively answer DNS requests for other domains (like 'cnn.com'  
> etc.) while still allowing anyone in the world to query the  
> biomoby.org DNS zone.  Can the moby developers/leaders elect a point  
> person that I can remain in contact with while we do this work?

Eddie Kawas:  ed.kawas at gmail.com

I'm in the lab until tomorrow, and then away for about 10 days in
Germany, so he's the one who will answer your questions most rapidly.


>  I  
> want to make sure that we don't affect/break moby services while this  
> work is done.

:-)  thanks Chris!

Best wishes, 

M

-- 
Mark Wilkinson
Asst. Professor, Dept. of Medical Genetics
University of British Columbia
PI in Bioinformatics, iCAPTURE Centre
St. Paul's Hospital, Rm. 166, 1081 Burrard St.
Vancouver, BC, V6Z 1Y6
tel: 604 682 2344 x62129
fax: 604 806 9274

"Scientists would rather share their toothbrush than their data"
                                        - Carole Goble

                         ==================


***CONFIDENTIALITY NOTICE***
This electronic message is intended only for the use of the addressee
and may contain information that is privileged and confidential.  Any
dissemination, distribution or copying of this communication by
unauthorized individuals is strictly prohibited. If you have received
this communication in error, please notify the sender immediately by
reply e-mail and delete the original and all copies from your system.


From markw at illuminae.com  Wed Nov 29 12:31:11 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 09:31:11 -0800
Subject: [MOBY-dev] [moby] RFC - Synchronization of Biomoby
	secondary	repositories
In-Reply-To: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
Message-ID: <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>

Hi Andreas!

Thanks for taking the time to put this document together.  Using the RSS
feed is an interesting idea.  My first instinct is that it might not be
"robust" enough, but I suppose if we spent more time thinking about what
information is passed on that RSS feed it might work quite well!

Have you considered taking advantage of the recent move towards
distributed service signatures?  The RDF Agent is capable of consuming a
list of URLs, recovering the RDF signatures from those URLs, and
rebuilding the entire registry from those RDF documents. It is also a
simple API call to MOBY Central that generates the list of URLs
representing all of the service signatures.  As such, a full mirroring
operation should require nothing more than a single call to the primary
MOBY Central, and passing the result of that call to the RDF agent of
the mirror site and letting it run... Eddie, correct me if that isn't
true...
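The mirroring operation described above could be sketched as follows. The Moby Central call that returns the list of signature URLs is not named here, so the sketch simply takes the list as input. `fetch_rdf` and `parse_services` are placeholders for what the RDF agent does. Failures are collected instead of aborting the run, since some providers will inevitably be unreachable.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def mirror_registry(
    signature_urls: Iterable[str],
    fetch_rdf: Callable[[str], str],
    parse_services: Callable[[str], List[dict]],
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Rebuild a registry from service-signature RDF documents.

    Returns (registry, failures): registry maps each signature URL to the
    services parsed from it; failures maps unreachable or unparsable URLs
    to an error message, so one dead provider does not stop the run.
    """
    registry: Dict[str, List[dict]] = {}
    failures: Dict[str, str] = {}
    for url in signature_urls:
        try:
            registry[url] = parse_services(fetch_rdf(url))
        except Exception as exc:  # broad on purpose: record and move on
            failures[url] = str(exc)
    return registry, failures
```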

I'm going to be at your institute this time next week, so let's talk
about it more in person :-)

Best wishes!

Mark


On Wed, 2006-11-29 at 13:02 +0100, Andreas Groscurth wrote: 
> The following text describes a procedure for synchronizing Biomoby
> secondary repositories.
> 
> Aim: Replicate BioMoby central
> -to create mirrors
> -to have redundancy in case of failure
> -to create private sets of services, either filtered from the global
> set (fewer services) or extending the global set (more services)
> 
> Problems:
> -synchronizing repositories
> -cascading service/object registration requests
> -populating a Moby central from scratch
> 
> Solutions:
> -The existing RSS feed is used to notify secondaries of changes  
> (register service/delete service/update service) to the master
> -A complete RSS document is created by a new dump method for  
> initialization of Moby centrals from scratch
> -Registrations are handled by the client and NOT cascaded
> 
> 1. Synchronizing repositories
> =============================
> 
> We propose that secondaries poll the Biomoby RSS feed to be notified
> of changes to the registry. Currently the RSS feed is updated once a
> day; for more rapid synchronization this would have to change.
> The changes include registration, modification or deletion of a
> service/object. Changes applied to the Biomoby Central registry are
> then applied to the secondary as well. Each RSS item contains the
> signature URL from which the secondary retrieves the service RDF,
> which holds all the details required for registration via the
> existing RDF agent.
> 
> i) Problems/changes required:
> 
> The main open question is whether unregistered services are deleted
> completely from the central database or only marked as inactive. This
> matters because the feed would also need to carry information about
> deleted services, so that the secondaries receive it. Moby central
> would therefore have to keep a full transaction log, including
> deletions.
> 
> 2. Filtering
> ============
> 
> We propose that any secondary can apply filters to the RSS feed and
> thus include only a subset of all services/objects. This can be
> useful to make finding services in lists easier, to restrict
> workflows to well-performing services, to use only local services, or
> to exclude test services. Some information relevant to filtering,
> such as authority and description, is already in the RSS; if more
> turns out to be relevant, filtering may need to happen at the level
> of the service RDF.
> 
> 3. Private services
> ===================
> 
> We propose that any client can register services with a Moby central
> secondary; these will then be available only to clients querying that
> secondary. If the secondary is on a local network, this allows easy
> access control to local services. Any secondary synchronizing to that
> repository will of course inherit all those additional services,
> allowing simple creation of local production Moby centrals and local
> test Moby centrals.
> 
> 4. Registration
> ===============
> 
> We propose NOT to cascade registration requests, i.e. not to pass them
> on from secondary to master. This gives the client control over where
> a registration is done, but also means the client has to make that
> choice. Registration clients must therefore let the user choose the
> Moby central where a service/object should be registered.
> Registration always happens at the topmost Moby central node where
> the service should be visible; all secondaries of this Moby central
> will pick the service up through synchronization.
>
> Why? Cascading registration is cumbersome: name duplications etc. can
> only be resolved once a registration request has reached the topmost
> node, and the result must then be passed back to the client.
> 
> Name conflicts can still occur with locally registered services.
> E.g., Adam registers a private service AnalyseThis on a private
> secondary. Later, Beth registers AnalyseThis with the same authority
> on the Moby central master. The private secondary picks this up from
> the RSS and runs into a name duplication. Proposed solution: local
> registrations MUST ALWAYS use a local authority. E.g., Adam registers
> AnalyseThis with authority InternalIP, and Beth registers AnalyseThis
> with authority paul_vitti.com. Beyond that, we assume that whoever
> registers a service at a more global Moby central knows what they are
> doing, and give synchronization precedence over local registrations.
> E.g., a test registry is a secondary of Moby central. Chris registers
> AnalyseThat with authority paul_vitti.com in the test registry. Once
> he's happy with testing, he registers AnalyseThat with authority
> paul_vitti.com in Moby central. The test registry retrieves this from
> the RSS, discards the local registration and replaces it with the
> registration picked up through the RSS.
> 
> 5. Moby central failure
> =======================
> 
> If a master Moby central fails, the secondaries continue normal  
> operation with no effect on service discovery for all clients keyed  
> to a secondary. However, registration is no longer possible at the  
> master node. Once the master node comes back up, all secondaries must  
> resync.
> 
> 6. Adaptations to the RSS
> =========================
> 
> For this procedure the current RSS feed needs only marginal changes:
> on the one hand to notify the secondaries correctly, on the other
> hand to ensure that ordinary RSS readers still work as usual. The
> current RSS feed mainly uses Dublin Core metadata to provide its
> information, so adding information to the feed only requires adding
> more Dublin Core metadata.
> 
> Above all, the feed has to state whether a service is new, modified
> or deleted. Additionally, the service RDF has to be linked from the
> feed so that the local RDF agent can apply the changes described in
> that RDF to the local secondary. Whether further information should
> be added to the feed to allow finer filtering of services is open for
> discussion.
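One possible encoding of these feed additions is sketched below in Python, using the standard-library `xml.etree.ElementTree`. The RFC does not fix the element names, so the following choices are assumptions:

- the change kind goes in a Dublin Core `dc:type` element,
- the signature RDF URL goes in the item's `link`,
- the time of the change goes in `dc:date`.

The example.org URL is a placeholder. Because the additions live in the Dublin Core namespace, RSS readers that do not understand them would typically ignore them.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)

# Assumed vocabulary for the change kind; not defined by BioMOBY.
CHANGE_TYPES = {"new", "modified", "deleted"}


def make_item(title: str, signature_url: str, change: str, date: str) -> ET.Element:
    """Build an RSS <item> announcing one registry change (hypothetical layout)."""
    if change not in CHANGE_TYPES:
        raise ValueError(f"unknown change type: {change!r}")
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = signature_url      # where the service RDF lives
    ET.SubElement(item, f"{{{DC_NS}}}type").text = change  # new / modified / deleted
    ET.SubElement(item, f"{{{DC_NS}}}date").text = date    # ISO 8601 (W3C-DTF) timestamp
    return item


def read_item(item: ET.Element) -> dict:
    """Extract the fields a secondary needs from an <item> element."""
    return {
        "title": item.findtext("title"),
        "signature_url": item.findtext("link"),
        "change": item.findtext(f"{{{DC_NS}}}type"),
        "date": item.findtext(f"{{{DC_NS}}}date"),
    }


if __name__ == "__main__":
    item = make_item("AnalyseThis", "http://example.org/AnalyseThis.rdf",
                     "deleted", "2006-11-29T12:00:00Z")
    xml_text = ET.tostring(item, encoding="unicode")
    print(xml_text)
    print(read_item(ET.fromstring(xml_text)))
```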
> 
> 7. Resync
> =========
> 
> Another key problem arises when a repository falls out of sync
> (e.g. due to a temporary failure of the master or a secondary). The
> RSS feed has a limited length, i.e. it contains only a limited number
> of transactions, so it may not contain all transactions since a
> secondary's last sync.
> 
> 
> i) Solution
> We propose that each repository store a timestamp of its last
> synchronization. If, in the next synchronization, the oldest change
> in the feed is newer than the repository's last-sync timestamp,
> changes made in between may already have dropped off the feed, so we
> risk missing information about service changes. In this case the
> secondary should be able to ask the primary to create an RSS feed
> with all changes that have happened since the secondary's timestamp.
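The check a secondary would run at each poll can be sketched in Python as follows. The sketch assumes, hypothetically, that every feed entry carries the timestamp of its change.

The feed is only known to be complete when its oldest entry is no newer than the secondary's last-sync timestamp. If the oldest entry is newer, changes in the gap may have scrolled off the feed. In that case the secondary should request a catch-up feed of all changes since its timestamp, or fall back to a full reload.

```python
from datetime import datetime, timezone
from typing import List, Tuple

Entry = Tuple[datetime, str]  # (time of change, hypothetical change payload)


def needs_catch_up(feed: List[Entry], last_sync: datetime) -> bool:
    """True if the feed may be missing changes made since last_sync.

    The feed only holds the most recent transactions. If even its oldest
    entry is newer than our last sync, changes between last_sync and that
    entry may have scrolled off, so the feed cannot be trusted on its own.
    """
    if not feed:
        return False  # empty feed: nothing to apply and no evidence of a gap
    oldest = min(ts for ts, _ in feed)
    return oldest > last_sync


def pending_changes(feed: List[Entry], last_sync: datetime) -> List[Entry]:
    """Entries newer than last_sync, oldest first, ready to be applied."""
    return sorted((e for e in feed if e[0] > last_sync), key=lambda e: e[0])


if __name__ == "__main__":
    def at(hour: int) -> datetime:
        return datetime(2006, 11, 29, hour, tzinfo=timezone.utc)

    feed = [(at(13), "register A"), (at(15), "delete B")]
    print(needs_catch_up(feed, at(12)))   # True: oldest entry (13:00) is newer than 12:00
    print(needs_catch_up(feed, at(14)))   # False: feed reaches back past 14:00
    print(pending_changes(feed, at(14)))  # only the 15:00 entry remains to apply
```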
> 
> 8. Initial load
> ===============
> 
> When populating a new secondary from scratch, all registered
> services/objects must be retrieved from the master Moby central. We
> propose a new method in Moby central to request all registered
> services/objects as RSS. The initialization then proceeds exactly
> like a synchronization.
> 
> 
> 
> So to kick off the discussion here are some of our questions:
> 
> 1. Is it reasonable to use the existing RSS feed for this procedure?
> It seems very handy and avoids creating a completely new, parallel structure.
>
> 2. Does any structure keep track of deleted services?
>
> 3. Resync: Is it reasonable to timestamp all transactions in Moby
> central? Or should we solve the resync issue by completely emptying
> the secondary and reloading all data as in the initial load?
> 
> 
> Thanks
> Heiko & Andreas
> 
> -- 
> Andreas Groscurth
> Diplom Bioinformatik - PhD Student
> Max Planck Institute for Plant Breeding Research
> Carl-von-Linné-Weg 10
> 50829 Cologne
> Germany
> E-mail:    groscurt at mpiz-koeln.mpg.de
> Phone:    +49(0)221-5062-447
> 
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev
-- 
Mark Wilkinson
Asst. Professor, Dept. of Medical Genetics
University of British Columbia
PI in Bioinformatics, iCAPTURE Centre
St. Paul's Hospital, Rm. 166, 1081 Burrard St.
Vancouver, BC, V6Z 1Y6
tel: 604 682 2344 x62129
fax: 604 806 9274

"Scientists would rather share their toothbrush than their data"
                                        - Carole Goble

                         ==================


***CONFIDENTIALITY NOTICE***
This electronic message is intended only for the use of the addressee
and may contain information that is privileged and confidential.  Any
dissemination, distribution or copying of this communication by
unauthorized individuals is strictly prohibited. If you have received
this communication in error, please notify the sender immediately by
reply e-mail and delete the original and all copies from your system.


From ed.kawas at gmail.com  Wed Nov 29 13:03:48 2006
From: ed.kawas at gmail.com (Ed Kawas)
Date: Wed, 29 Nov 2006 10:03:48 -0800
Subject: [MOBY-dev] [moby] question for moby devs/architects
	regarding use of DNS
In-Reply-To: <1164822100.20382.13.camel@bioinfo.icapture.ubc.ca>
Message-ID: <002f01c713e0$b9e393d0$6900a8c0@notebook>


> We run a cron'd script from our server here that tests all 
> services in the registry every hour.  I don't know for 
> certain if this is using LSID resolution as part of that task 
> (Eddie, can you confirm?), but it wouldn't surprise me if 
> that were the case.  
It does not. It uses the pure API only (findService, etc.).

> 
> 
> > (2) How popular is LSID? Could services making use of the 'lsid' SRV
> > record be responsible for lots of DNS traffic? Like 256+ sessions
> > from the same IP?
> 
> 
> We are increasingly using the LSID to represent *all* 
> entities in MOBY - datatypes, service types, web service 
> instances, etc.  A tool like Taverna may well be resolving 
> all LSIDs in the MOBY registry each time it starts up (?),
> which could account for the traffic.  Other client 
> applications will likely use LSID resolution in the same way 
> in the near future, if they don't already.  Again, the IP 
> address would fairly quickly tell us whether these are 
> "scientists" or "scriptkiddies".  
> 
> Regardless, the use of LSIDs in MOBY is only going to 
> increase over time, so if it is becoming an issue now we 
> should think about how to manage it before it becomes a real 
> problem...
> 
Mark, your gbrowse_moby application uses LSIDs a lot. However, those requests
would all come from a single IP address.

Eddie


From markw at illuminae.com  Wed Nov 29 13:34:47 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 10:34:47 -0800
Subject: [MOBY-dev] [moby] question for moby devs/architects
	regarding use of DNS
In-Reply-To: <002f01c713e0$b9e393d0$6900a8c0@notebook>
References: <002f01c713e0$b9e393d0$6900a8c0@notebook>
Message-ID: <op.tjsi79nqnbznux@hoegaarden.mrl.ubc.ca>

On Wed, 29 Nov 2006 10:03:48 -0800, Ed Kawas <ed.kawas at gmail.com> wrote:

> Mark, your gbrowse_moby application uses lsids a lot. However, those
> requests would be from a single ip address.

Right... but I don't think it creates 256+ requests at a time, since it is  
a low-throughput interface...  I'd be surprised if gbrowse moby was the  
culprit here.

M


From edward.kawas at gmail.com  Wed Nov 29 10:22:14 2006
From: edward.kawas at gmail.com (Edward Kawas)
Date: Wed, 29 Nov 2006 07:22:14 -0800
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
Message-ID: <001f01c713ca$27188b70$6900a8c0@notebook>

Hi,

From reading *just* the 'aim' and 'problems' portion of this message, I was
wondering whether you thought about using the agent for mirroring.

Just throwing it out there,

Eddie

> -----Original Message-----
> From: moby-dev-bounces at lists.open-bio.org 
> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of 
> Andreas Groscurth
> Sent: Wednesday, November 29, 2006 4:02 AM
> To: moby-dev at lists.open-bio.org
> Subject: [MOBY-dev] RFC - Synchronization of Biomoby 
> secondary repositories
> 
> The following text describes the procedure of the 
> synchronization of Biomoby secondary repositories.
> 
> Aim: Replicate BioMoby central
> -to create mirrors
> -to have redundancy in case of failure
> -to create private sets of services, either filtered from the 
> global set (fewer services) or added to the global set (more services)
> 
> Problems:
> -synchronizing repositories
> -cascading service/object registration requests
> -populating a Moby central from scratch
>
> Solutions:
> -The existing RSS feed is used to notify secondaries of
> changes (register service/delete service/update service) to
> the master
> -A complete RSS document is created by a new dump method for
> initialization of Moby centrals from scratch
> -Registrations are handled by the client and NOT cascaded
> 
> 1. Synchronizing repositories
> =============================
> 
> We propose that secondaries check the Biomoby RSS feed to be
> notified of any changes to the registry.
> Currently the RSS feed is updated once a day; for more rapid
> synchronization this would have to be changed.
> The changes include registration, modification or deletion of
> a service/object. Whenever changes are applied to the Biomoby
> Central registry, the same changes are applied to the secondary.
> The RSS contains the signature URL where the secondary picks 
> up the service RDF to retrieve all details required for the 
> registration using the existing RDF agent.
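A single polling pass on a secondary could look like the minimal Python sketch below. It is written under stated assumptions, not against any BioMOBY API:

- feed items have already been parsed into records with the hypothetical fields `key`, `change` (new, modified or deleted) and `signature_url`,
- `fetch_rdf` is a placeholder for the existing RDF agent.

A deletion needs no RDF fetch. That is why the feed itself must carry deletions.

```python
from typing import Callable, Dict, Iterable, Mapping

Change = Mapping[str, str]  # hypothetical record: key, change, signature_url


def apply_changes(registry: Dict[str, dict],
                  changes: Iterable[Change],
                  fetch_rdf: Callable[[str], dict]) -> None:
    """Replay feed changes, oldest first, onto a secondary's registry.

    fetch_rdf stands in for the existing RDF agent: given a signature URL it
    returns the parsed service description. Its real interface may differ.
    """
    for change in changes:
        key, kind = change["key"], change["change"]
        if kind == "deleted":
            registry.pop(key, None)  # nothing to fetch; drop it if present
        elif kind in ("new", "modified"):
            registry[key] = fetch_rdf(change["signature_url"])
        else:
            raise ValueError(f"unknown change type: {kind!r}")


if __name__ == "__main__":
    def fake_fetch(url: str) -> dict:
        # Hypothetical stand-in for the RDF agent.
        return {"signature_url": url}

    reg = {"example.org:Old": {"signature_url": "http://example.org/old.rdf"}}
    apply_changes(reg, [
        {"key": "example.org:New", "change": "new",
         "signature_url": "http://example.org/new.rdf"},
        {"key": "example.org:Old", "change": "deleted"},
    ], fake_fetch)
    print(sorted(reg))  # ['example.org:New']
```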
> 
> i) Problems/changes required:
> 
> The main question here is whether deregistered services are
> deleted completely from the central database or are marked as
> inactive. The problem is that the feed would also need to
> report deleted services, so that the secondaries receive that
> information. So Moby central will have to keep a full
> transaction log, including deletions.
> 
> 2. Filtering
> ============
> 
> We propose that any secondary can apply filters to the RSS 
> feed and thus only include a subset of all services/objects. 
> This can be useful to make finding services from lists
> easier, to tune workflows to well-performing services, to use
> only local services or to exclude test services. Some
> information relevant to filtering, such as authority and
> description, is already in the RSS; if more turns out to be
> relevant, filtering may need to happen at the level of the
> service RDF.
> 
> 3. Private services
> ===================
> 
> We propose that any client can register services with a Moby
> central secondary; these will then be available only to
> clients querying the secondary. If the secondary is in a 
> local network, this allows easy access control to local 
> services. Any secondary synchronizing to that repository will 
> of course inherit all those additional services, allowing 
> simple creation of local production Moby centrals and local 
> test Moby centrals.
> 
> 4. Registration
> ===============
> 
> We propose to NOT cascade registration requests, i.e. pass 
> them on from secondary to master. That means that the client 
> has control over where a registration is done but also means 
> the client has to make that choice. Registration clients must 
> thus add an implementation that allows a user to choose the 
> Moby central where a service/object should be registered. 
> Registration always happens at the topmost Moby central node 
> where the service should be visible; all secondaries of this
> Moby central will pick that service up by synchronization.
> 
> Why? Cascading registration is cumbersome, as only once a 
> registration request has reached the topmost node can name 
> duplications etc. be resolved, which must then be passed to 
> the client.
> 
> Name conflicts can still occur with locally registered services.  
> E.g., Adam registers a private service AnalyseThis on a 
> private secondary. Later, Beth registers AnalyseThis with
> the same authority on the Moby central master. The private
> secondary picks this up from the RSS and runs into a name
> duplication. Proposed solution: Local registrations MUST
> ALWAYS use a local authority. E.g., Adam registers
> AnalyseThis with authority InternalIP, and Beth registers
> AnalyseThis with authority paul_vitti.com. Then, we assume
> whoever registers a service at a more global Moby central
> knows what they are doing, and we give synchronization
> precedence over local registrations. E.g., a test registry is a
> secondary of Moby central. Chris registers AnalyseThat with 
> authority paul_vitti.com in the test registry. Once he's 
> happy with testing, he registers AnalyseThat with authority 
> paul_vitti.com in Moby central. The test registry retrieves 
> this from the RSS, discards the local registration and 
> overwrites it with the registration picked up through the RSS.
> 
> 5. Moby central failure
> =======================
> 
> If a master Moby central fails, the secondaries continue 
> normal operation with no effect on service discovery for all 
> clients keyed to a secondary. However, registration is no 
> longer possible at the master node. Once the master node 
> comes back up, all secondaries must resync.
> 
> 6. Adaptations to the RSS
> =========================
> 
> For this procedure the current RSS feed needs only
> marginal changes: on the one hand to notify the
> secondaries correctly, on the other hand to ensure that
> ordinary RSS readers still work as usual. The current RSS
> feed mainly uses Dublin Core metadata to provide its
> information, so adding information to the feed only
> requires adding more Dublin Core metadata.
> 
> Above all, the feed has to state whether a service is new,
> modified or deleted. Additionally, the service RDF has to be
> linked from the feed so that the local RDF agent can apply
> the changes described in that RDF to the local secondary.
> Whether further information should be added to the feed to
> allow finer filtering of services is open for discussion.
> 
> 7. Resync
> =========
> 
> Another key problem arises when a repository falls out of
> sync (e.g. due to a temporary failure of the master or a
> secondary). The RSS feed has a limited length, i.e. it
> contains only a limited number of transactions, so it may
> not contain all transactions since a secondary's last sync.
> 
> 
> i) Solution
> We propose that each repository store a timestamp of its
> last synchronization. If, in the next synchronization, the
> oldest change in the feed is newer than the repository's
> last-sync timestamp, changes made in between may already
> have dropped off the feed, so we risk missing information
> about service changes. In this case the secondary should be
> able to ask the primary to create an RSS feed with all
> changes that have happened since the secondary's timestamp.
> 
> 8. Initial load
> ===============
> 
> When populating a new secondary from scratch, all registered
> services/objects must be retrieved from the master Moby
> central. We propose a new method in Moby central to request
> all registered services/objects as RSS. The initialization
> then proceeds exactly like a synchronization.
> 
> 
> 
> So to kick off the discussion here are some of our questions:
> 
> 1. Is it reasonable to use the existing RSS feed for this procedure?
> It seems very handy and avoids creating a completely new,
> parallel structure.
>
> 2. Does any structure keep track of deleted services?
>
> 3. Resync: Is it reasonable to timestamp all transactions in
> Moby central? Or should we solve the resync issue by
> completely emptying the secondary and reloading all data as
> in the initial load?
> 
> 
> Thanks
> Heiko & Andreas
> 
> --
> Andreas Groscurth
> Diplom Bioinformatik - PhD Student
> Max Planck Institute for Plant Breeding Research
> Carl-von-Linné-Weg 10
> 50829 Cologne
> Germany
> E-mail:    groscurt at mpiz-koeln.mpg.de
> Phone:    +49(0)221-5062-447
> 


From markw at illuminae.com  Wed Nov 29 18:52:23 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 15:52:23 -0800
Subject: [MOBY-dev] Holy cow!  Lotsa hits!
Message-ID: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>

852,000 hits on MOBY Central in November.  That's a new record :-)

M


-- 
--
Mark Wilkinson
Assistant Professor, Dept. Medical Genetics
University of British Columbia
PI Bioinformatics
iCAPTURE Centre, St. Paul's Hospital

From schoof at mpiz-koeln.mpg.de  Thu Nov 30 04:39:46 2006
From: schoof at mpiz-koeln.mpg.de (Heiko Schoof)
Date: Thu, 30 Nov 2006 10:39:46 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
	<1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
Message-ID: <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>

Hi Mark, Eddie,

We already use the RDF agent. From the RSS we intend to pull mainly
the signature URLs, and then use the RDF agent to retrieve all the
data.
---quote---
The RSS contains the signature URL where the secondary picks up
the service RDF to retrieve all details required for the
registration using the existing RDF agent.
---/quote---

The advantage of the RSS versus the API call to retrieve ALL  
signature URLs is:
-scalability: if there are 1000s of signature URLs, with the RSS we
only retrieve the changes
-filtering: we can filter on the data in the RSS alone, with no need
to actually retrieve the service RDF; this should improve filtering
performance, as it is one request instead of potentially hundreds,
and avoids parsing all those RDF documents.
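The filtering advantage can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each RSS item exposes the service's authority, name, description and signature URL. Selection uses only those feed fields, so a filtered-out service costs no RDF download or parse. The field names, the example authorities and the keyword-based rule for excluding test services are illustrative only. None of them comes from any BioMOBY specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class FeedItem:
    authority: str
    name: str
    description: str
    signature_url: str


def select_items(items: Iterable[FeedItem],
                 allowed_authorities: Optional[Set[str]] = None,
                 exclude_words: Iterable[str] = ()) -> List[FeedItem]:
    """Keep only the feed items a secondary wants, using feed fields alone.

    No service RDF is fetched here; only the survivors' signature URLs
    would later be handed to the RDF agent.
    """
    words = [w.lower() for w in exclude_words]
    selected = []
    for item in items:
        if allowed_authorities is not None and item.authority not in allowed_authorities:
            continue  # e.g. keep only local services
        text = item.description.lower()
        if any(w in text for w in words):
            continue  # e.g. drop services described as tests
        selected.append(item)
    return selected


if __name__ == "__main__":
    feed = [
        FeedItem("local.example", "getGene", "Retrieve a gene record", "http://example.org/1.rdf"),
        FeedItem("local.example", "tmpService", "TEST service, do not use", "http://example.org/2.rdf"),
        FeedItem("example.com", "blastIt", "Run BLAST", "http://example.org/3.rdf"),
    ]
    kept = select_items(feed, {"local.example"}, ["test"])
    print([i.signature_url for i in kept])  # ['http://example.org/1.rdf']
```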

However, for initialization from scratch, your method (the API call)
does make the most sense; we'll modify the RFC accordingly. Is there a
BioMoby wiki where we can post it?

Do you intend to come "work" at the MPIZ next week? If yes, when? I'm  
free Thursday afternoon and most of Friday.

Best, Heiko

On 29. Nov 2006, at 18:31 Uhr, Mark Wilkinson wrote:

> Hi Andreas!
>
> Thanks for taking the time to put this document together. Using the
> RSS feed is an interesting idea. My first instinct is that it might
> not be "robust" enough, but I suppose if we spent more time thinking
> about what information is passed on that RSS feed it might work quite
> well!
>
> Have you considered taking advantage of the recent move towards
> distributed service signatures? The RDF Agent is capable of consuming
> a list of URLs, recovering the RDF signatures from those URLs, and
> rebuilding the entire registry from those RDF documents. It is also a
> simple API call to MOBY Central that generates the list of URLs
> representing all of the service signatures. As such, a full mirroring
> operation should require nothing more than a single call to the
> primary MOBY Central, and passing the result of that call to the RDF
> agent of the mirror site and letting it run... Eddie, correct me if
> that isn't true...
>
> I'm going to be at your institute this time next week, so let's talk
> about it more in person :-)
>
> Best wishes!
>
> Mark
>
>
>
> On Wed, 2006-11-29 at 13:02 +0100, Andreas Groscurth wrote:
>> The following text describes the procedure of the synchronization of
>> Biomoby secondary repositories.
>>
>> Aim: Replicate BioMoby central
>> -to create mirrors
>> -to have redundancy in case of failure
>> -to create private sets of services, either filtered from the global
>> set (fewer services) or added to the global set (more services)
>>
>> Problems:
>> -synchronizing repositories
>> -cascading service/object registration requests
>> -populating a Moby central from scratch
>>
>> Solutions:
>> -The existing RSS feed is used to notify secondaries of changes
>> (register service/delete service/update service) to the master
>> -A complete RSS document is created by a new dump method for
>> initialization of Moby centrals from scratch
>> -Registrations are handled by the client and NOT cascaded
>>
>> 1. Synchronizing repositories
>> =============================
>>
>> We propose that secondaries check the Biomoby RSS feed to be
>> notified of any changes to the registry.
>> Currently the RSS feed is updated once a day; for more rapid
>> synchronization this would have to be changed.
>> The changes include registration, modification or deletion of a
>> service/object. Whenever changes are applied to the Biomoby Central
>> registry, the same changes are applied to the secondary.
>> The RSS contains the signature URL where the secondary picks up
>> the service RDF to retrieve all details required for the
>> registration using the existing RDF agent.
>>
>> i) Problems/changes required:
>>
>> The main question here is whether deregistered services are deleted
>> completely from the central database or are marked as inactive. The
>> problem is that the feed would also need to report deleted services,
>> so that the secondaries receive that information. So Moby central
>> will have to keep a full transaction log, including deletions.
>>
>> 2. Filtering
>> ============
>>
>> We propose that any secondary can apply filters to the RSS feed and
>> thus only include a subset of all services/objects. This can be
>> useful to make finding services from lists easier, to tune workflows
>> to well-performing services, to use only local services or to exclude
>> test services. Some information relevant to filtering, such as
>> authority and description, is already in the RSS; if more turns out
>> to be relevant, filtering may need to happen at the level of the
>> service RDF.
>>
>> 3. Private services
>> ===================
>>
>> We propose that any client can register services with a Moby central
>> secondary; these will then be available only to clients querying the
>> secondary. If the secondary is in a local network, this allows easy
>> access control to local services. Any secondary synchronizing to that
>> repository will of course inherit all those additional services,
>> allowing simple creation of local production Moby centrals and local
>> test Moby centrals.
>>
>> 4. Registration
>> ===============
>>
>> We propose to NOT cascade registration requests, i.e. pass them on
>> from secondary to master. That means that the client has control over
>> where a registration is done but also means the client has to make
>> that choice. Registration clients must thus add an implementation
>> that allows a user to choose the Moby central where a service/object
>> should be registered. Registration always happens at the topmost Moby
>> central node where the service should be visible; all secondaries of
>> this Moby central will pick that service up by synchronization.
>>
>> Why? Cascading registration is cumbersome, as only once a
>> registration request has reached the topmost node can name
>> duplications etc. be resolved, which must then be passed to the client.
>>
>> Name conflicts can still occur with locally registered services.
>> E.g., Adam registers a private service AnalyseThis on a private
>> secondary. Later, Beth registers AnalyseThis with the same authority on
>> the Moby central master. The private secondary picks this up from the
>> RSS and runs into a name duplication. Proposed solution: Local
>> registrations MUST ALWAYS use a local authority. E.g., Adam registers
>> AnalyseThis with authority InternalIP, and Beth registers AnalyseThis
>> with authority paul_vitti.com. Then, we assume whoever registers a
>> service at a more global Moby central knows what they are doing, and we
>> give synchronization precedence over local registrations. E.g., a test
>> registry is a secondary of Moby central. Chris registers AnalyseThat
>> with authority paul_vitti.com in the test registry. Once he's happy
>> with testing, he registers AnalyseThat with authority paul_vitti.com
>> in Moby central. The test registry retrieves this from the RSS,
>> discards the local registration and overwrites it with the
>> registration picked up through the RSS.
>>
>> 5. Moby central failure
>> =======================
>>
>> If a master Moby central fails, the secondaries continue normal
>> operation with no effect on service discovery for all clients keyed
>> to a secondary. However, registration is no longer possible at the
>> master node. Once the master node comes back up, all secondaries must
>> resync.
>>
>> 6. Adaptations to the RSS
>> =========================
>>
>> For this procedure the current RSS feed needs only marginal changes:
>> on the one hand to notify the secondaries correctly, on the other
>> hand to ensure that ordinary RSS readers still work as usual. The
>> current RSS feed mainly uses Dublin Core metadata to provide its
>> information, so adding information to the feed only requires adding
>> more Dublin Core metadata.
>>
>> Above all, the feed has to state whether a service is new, modified
>> or deleted. Additionally, the service RDF has to be linked from the
>> feed so that the local RDF agent can apply the changes described in
>> that RDF to the local secondary. Whether further information should
>> be added to the feed to allow finer filtering of services is open for
>> discussion.
>>
>> 7. Resync
>> =========
>>
>> Another key problem arises when a repository falls out of sync
>> (e.g. due to a temporary failure of the master or a secondary). The
>> RSS feed has a limited length, i.e. it contains only a limited number
>> of transactions, so it may not contain all transactions since a
>> secondary's last sync.
>>
>>
>> i) Solution
>> We propose that each repository store a timestamp of its last
>> synchronization. If, in the next synchronization, the oldest change
>> in the feed is newer than the repository's last-sync timestamp,
>> changes made in between may already have dropped off the feed, so we
>> risk missing information about service changes. In this case the
>> secondary should be able to ask the primary to create an RSS feed
>> with all changes that have happened since the secondary's timestamp.
>>
>> 8. Initial load
>> ===============
>>
>> When populating a new secondary from scratch, all registered
>> services/objects must be retrieved from the master Moby central. We
>> propose a new method in Moby central to request all registered
>> services/objects as RSS. The initialization then proceeds exactly
>> like a synchronization.
>>
>>
>>
>> So to kick off the discussion here are some of our questions:
>>
>> 1. Is it reasonable to use the existing RSS feed for this procedure?
>> It seems very handy and avoids creating a completely new, parallel
>> structure.
>>
>> 2. Does any structure keep track of deleted services?
>>
>> 3. Resync: Is it reasonable to timestamp all transactions in Moby
>> central? Or should we solve the resync issue by completely emptying
>> the secondary and reloading all data as in the initial load?
>>
>>
>> Thanks
>> Heiko & Andreas
>>
>> -- 
>> Andreas Groscurth
>> Diplom Bioinformatik - PhD Student
>> Max Planck Institute for Plant Breeding Research
>> Carl-von-Linné-Weg 10
>> 50829 Cologne
>> Germany
>> E-mail:    groscurt at mpiz-koeln.mpg.de
>> Phone:    +49(0)221-5062-447
>>


From dgonzalez at cnio.es  Thu Nov 30 09:43:20 2006
From: dgonzalez at cnio.es (David G. Pisano)
Date: Thu, 30 Nov 2006 15:43:20 +0100
Subject: [MOBY-dev] Holy cow!  Lotsa hits!
In-Reply-To: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>
References: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>
Message-ID: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>

Mark,

Do you have historic records? It would be nice for presentations ;)
(and grats to everybody, by the way)

David

On  30 Nov, 2006, at 12:52 AM, Mark Wilkinson wrote:

> 852,000 hits on MOBY Central in November.  That's a new record :-)
>
> M
>
>


**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.


From markw at illuminae.com  Thu Nov 30 11:40:28 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 30 Nov 2006 08:40:28 -0800
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
	<1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
	<6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
Message-ID: <op.tjt8lqcynbznux@hoegaarden.mrl.ubc.ca>


> we already use the RDF agent, from the RSS we intend to pull mainly the  
> signature URLs, then we propose to use the RDF agent to get all data.

Ah!  I see.


> The advantage of the RSS versus the API call to retrieve ALL signature  
> URLs is:
> -scalability: If there are 1000s of signature URLs...with the RSS, we  
> only retrieve changes

I would need to modify the RSS feed such that it reports additions *and*  
removals (right now it is just additions), and we would have to come up  
with a formal way of representing these...  and then the RSS functionality  
and features would need to become a formal part of the MOBY API (I can  
just hear Dr. Senger yelling at us right now that we're considering  
building core functionality on parts of MOBY that are entirely  
undocumented ;-)  ;-) ).  I guess that's why I am so hesitant to use RSS.

However, I guess so long as this is not a "recommended" practice, only a  
short-cut; and as long as it is *always* possible to use a true API call  
to mirror the registry, and we formally say what the recommended  
best-practice is, then it's reasonable to have a non-guaranteed  
alternative that is more lightweight.


> However, for the initialization/from scratch, this method indeed makes  
> most sense, we'll modify the RFC accordingly. Is there a Biomoby WIKI  
> where we can post that?

There currently is no Wiki running on BioMoby.  The last Wiki we had was  
hacked, and we just never brought it back up again.  We tried using  
Bugzilla as a way of tracking RFCs, but that didn't make many people very  
happy, so we're simply using the mailing list, with some formal write-up  
in an attachment.


> Do you intend to come "work" at the MPIZ next week?

Do I have to answer that publicly?  ;-)

The answer is "Yes".  We should get a variety of things - MOBY-wise and  
otherwise - sorted out between us while I am there.  I'm free all day  
Thursday and Friday, so that should give us plenty of time.

Cheers!

M

 
From schoof at mpiz-koeln.mpg.de  Thu Nov 30 13:01:48 2006
From: schoof at mpiz-koeln.mpg.de (Heiko Schoof)
Date: Thu, 30 Nov 2006 19:01:48 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <op.tjt8lqcynbznux@hoegaarden.mrl.ubc.ca>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
	<1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
	<6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
	<op.tjt8lqcynbznux@hoegaarden.mrl.ubc.ca>
Message-ID: <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>


On 30. Nov 2006, at 17:40 Uhr, Mark Wilkinson wrote:

> and then the RSS functionality and features would need to become a  
> formal part of the MOBY API (I can just hear Dr. Senger yelling at  
> us right now that we're considering building core functionality on  
> parts of MOBY that are entirely undocumented ;-)  ;-) ).  I guess  
> that's why I am so hesitant to use RSS.
>
> However, I guess so long as this is not a "recommended" practice,  
> only a short-cut; and as long as it is *always* possible to use a  
> true API call to mirror the registry, and we formally say what the  
> recommended best-practice is, then it's reasonable to have a  
> non-guaranteed alternative that is more lightweight.
>
What we are proposing is to make new RSS functionality that will be  
part of the core API. It just so happens that RSS and the surrounding  
toolkit is well suited to the purpose, and more fitting to Moby than  
other solutions we've looked at. Why should we make a new API call  
that spews out some custom XML if we can use RSS perfectly well within its  
specs and get a core RSS feed for "human"/aggregator consumption at  
the same time for free? We stated that we will need to modify the  
RSS, though not breaking anything as far as we can see.

We were not proposing to use RSS just because there's existing  
functionality ;-) we're not quite THAT lazy...though almost. And...  
isn't it *cool* to use RSS for some real work?

What is a true API call? Why is a call to the RSS feed not a true API  
call, if we make it part of the API? RSS is a tested, scalable  
technology, which is why we propose to use it, as we envision  
hundreds of Moby clients maintaining their local cache (like  
Dashboard) through that functionality. One thing we haven't  
mentioned yet: caching of Moby Central for clients could easily build  
on the Moby secondary functionality, we think. But maybe Martin or  
others with experience on Moby Central caching should comment on that.

Heiko

From markw at illuminae.com  Thu Nov 30 14:10:08 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 30 Nov 2006 11:10:08 -0800
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
	<1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
	<6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
	<op.tjt8lqcynbznux@hoegaarden.mrl.ubc.ca>
	<93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>
Message-ID: <op.tjufi6hunbznux@hoegaarden.mrl.ubc.ca>

Hi Heiko,

> Why should we make a new API call that spews out some custom XML if we  
> can perfectly use RSS within its specs and get a core RSS feed for  
> "human"/aggregator consumption at the same time for free? We stated that  
> we will need to modify the RSS, though not breaking anything as far as  
> we can see.

I think we are effectively saying the same thing; I am not suggesting that  
we make a new API call; I'm suggesting that there are API calls that  
exist already that could be used for this purpose, albeit not so  
conveniently as your RSS suggestion.


> We were not proposing to use RSS just because there's existing  
> functionality ;-) we're not quite THAT lazy...though almost. And...  
> isn't it *cool* to use RSS for some real work?

Well... I guess this is the issue.  You're proposing to use RSS for a  
purpose for which it was not (IMO) designed.  As such, we would have to  
create new conventions around the RSS feed (hereafter called MOBY-RSS)  
that may or may not be more widely accepted in the world.  I agree 100%  
that it would be VERY cool to use RSS in this way, but I'm not entirely  
convinced of it as a robust solution to the problem.  The amount of RSS-RDF  
we would have to maintain on MOBY Central in order to have a complete  
history that would allow a mirror to reliably re-construct the current  
state of the database is... well... large!  At the moment, I keep only the  
last... 100?... changes.  If you don't pick up the feed for a day, or if  
someone registers 1000 new services, you won't see them in the feed.  To be  
safe, we would have to keep *all* changes in the RDF document at MOBY  
Central, in which case the overhead of calling the feed versus using the  
MOBY Central API would be about the same.
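A quick way to see the retention problem is a small Python sketch, assuming a feed capped at 100 entries (the figure quoted above, itself given with a question mark) and a mirror that reads the feed only once, after a burst of registrations. `collections.deque(maxlen=...)` discards the oldest items automatically, which mimics a capped feed; the function name is hypothetical.

```python
from collections import deque

FEED_CAPACITY = 100  # assumption: the feed retains roughly the last 100 changes


def changes_visible_to_mirror(total_new_registrations: int) -> int:
    """Count how many registrations a mirror still sees if it reads the capped
    feed only once, after all registrations have happened (hypothetical)."""
    feed = deque(maxlen=FEED_CAPACITY)  # oldest entries fall off automatically
    for change_id in range(total_new_registrations):
        feed.append(change_id)
    return len(feed)


if __name__ == "__main__":
    seen = changes_visible_to_mirror(1000)
    print(f"registered 1000, mirror sees {seen}, misses {1000 - seen}")
```

With 1000 registrations between two reads, the mirror sees only the last 100 and silently misses the other 900, which is why a complete history (or a fallback to the API) would be needed.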

I'm not *opposed* to the idea of using RSS, and I agree that it is a novel  
and "cool" use for it, but I am concerned that we will perpetuate the MOBY  
reputation of making ad hoc decisions around other standards... (which  
isn't necessarily BAD, it just gives us a reputation for being mavericks,  
which angers the reviewers :-) )

Let's talk about it over a Koelsch (or two) next week!

M


-- 
--
Mark Wilkinson
Assistant Professor, Dept. Medical Genetics
University of British Columbia
PI Bioinformatics
iCAPTURE Centre, St. Paul's Hospital

***CONFIDENTIALITY NOTICE***
This electronic message is intended only for the use of the addressee and  
may contain information that is privileged and confidential.  Any  
dissemination, distribution or copying of this communication by  
unauthorized individuals is strictly prohibited. If you have received this  
communication in error, please notify the sender immediately by reply  
e-mail and delete the original and all copies from your system.
 

From martin.senger at gmail.com  Thu Nov 30 10:08:37 2006
From: martin.senger at gmail.com (Martin Senger)
Date: Thu, 30 Nov 2006 15:08:37 +0000
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
References: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>
	<9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
Message-ID: <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>

Well, let me be more realistic. I do not like so-called "Potemkin
villages" (google it if it does not ring a bell). Of course, I am glad
that BioMoby is growing, and I will support it in any possible way. And I
also understand that some facts are good for PR and for funding agencies.

But... For us, we should be more precise about what these hits actually mean. For
example, I assume (please correct me if I am wrong) that every time somebody
updates her local cache from Dashboard, it increases the hit numbers. Also,
all these automated tools may influence how many times the registry is
accessed, which gives us a distorted picture.

Better, possibly, would be to agree on an HTTP agent name (or names) that we
can use in these automated tools, and to separate in the statistics *all*
hits (good for funding agencies) from the other hits, where we do not include
requests from this (or these) HTTP agent(s).
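A minimal sketch of such a statistics split, assuming Apache's "combined" log format (where the User-Agent is the last quoted field on each line) and a purely hypothetical agreed prefix `MOBY-Automated` for automated tools; no such naming convention exists yet. It keeps the *all* total alongside the automated/other breakdown.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [date] "request" status bytes
# "referer" "user-agent". The User-Agent is captured as the last quoted field.
COMBINED_LOG = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical agent-name prefixes the tool developers might agree on.
AUTOMATED_AGENT_PREFIXES = ("MOBY-Automated",)


def split_hits(log_lines):
    """Return a Counter with 'all', 'automated' and 'other' hit totals."""
    counts = Counter()
    for line in log_lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        counts["all"] += 1
        if match.group("agent").startswith(AUTOMATED_AGENT_PREFIXES):
            counts["automated"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [30/Nov/2006:10:00:00 -0800] "POST / HTTP/1.1" 200 512 "-" "MOBY-Automated/Dashboard-1.0"',
        '5.6.7.8 - - [30/Nov/2006:10:01:00 -0800] "POST / HTTP/1.1" 200 256 "-" "Mozilla/5.0"',
    ]
    print(dict(split_hits(sample)))
```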

Just my 2c,
Martin

-- 
Martin Senger
   email: martin.senger at gmail.com
   skype: martinsenger

From jmfernandez at cnio.es  Thu Nov 30 16:06:53 2006
From: jmfernandez at cnio.es (José María Fernández González)
Date: Thu, 30 Nov 2006 22:06:53 +0100
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
References: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>	<9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
	<4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
Message-ID: <456F47ED.1060800@cnio.es>

You should have some way to distinguish entries from automated tools from the others
(with some help from the tool developers, of course). As every request is made
through HTTP, you could use the User Agent signature recorded in the Apache logs. For
instance, each tool could use a different 'User Agent' variant, so they could
be distinguished; or, if a program/tool can issue requests related to
maintenance, it would be advisable for it to alter its 'User Agent' signature in
some way based on its mode.
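On the tool side, this amounts to setting the User-Agent header per tool and per mode. A minimal Python sketch using the standard library's `urllib.request.Request`; the `MOBY-Automated/<tool>-<version> (<mode>)` format and the function name are hypothetical, not an agreed BioMoby convention, and the request is only built here, not sent.

```python
import urllib.request


def build_registry_request(url: str, tool: str, version: str, mode: str,
                           body: bytes = b"") -> urllib.request.Request:
    """Build a POST request whose User-Agent identifies the tool and its mode.

    The agent string format is a hypothetical convention, e.g.
    "MOBY-Automated/Dashboard-1.0 (cache-update)".
    """
    agent = f"MOBY-Automated/{tool}-{version} ({mode})"
    return urllib.request.Request(
        url,
        data=body,
        headers={"User-Agent": agent},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL; nothing is sent over the network.
    req = build_registry_request("http://example.org/central",
                                 "Dashboard", "1.0", "cache-update")
    # urllib normalizes header names with str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))  # MOBY-Automated/Dashboard-1.0 (cache-update)
```

Encoding the mode in the agent string lets the log analysis separate routine cache refreshes from interactive use without any change on the server.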

	Just my 2 euro-cents.
		José María

Martin Senger wrote:
> Well, let me be more realistic. I do not like so called "Potemkin's"
> villages (see google if it does not ring any bell). Of course, I am glad
> that BioMoby is growing, and I will support it in any possible way. And I
> also understand that some facts are good for PR and for funding agencies.
> 
> But... For us, we should be more precise what these hits actually mean. For
> example, I assume (please correct me if I am wrong) that every time somebody
> updates her local cash from Dashboard, it increases the hit numbers. Also,
> all these automated tools may influence how many times the registry is
> accessed. Which gives us a distorted picture.
> 
> Better, possibly, would be to agree to an HTTP agent name (or names) that we
> can use in this automatic tools - and to separate in the statistics *all*
> hits (good for funding agencies) from the other hits where we do not include
> requests from this (or these) HTTP agent(s).
> 
> Just my "c's,
> Martin
> 

-- 
José María Fernández González
Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 2256)
e-mail: jmfernandez at cnio.es		Fax: (+34) 91 224 69 76
Biología Estructural y Bioinformática	Structural Biology and Bioinformatics
Centro Nacional de Investigaciones Oncológicas
C.P.: 28029				Zip Code: 28029
C/. Melchor Fernández Almagro, 3	Madrid (Spain)



From martin.senger at gmail.com  Thu Nov 30 16:48:08 2006
From: martin.senger at gmail.com (Martin Senger)
Date: Thu, 30 Nov 2006 21:48:08 +0000
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <456F47ED.1060800@cnio.es>
References: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>
	<9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
	<4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
	<456F47ED.1060800@cnio.es>
Message-ID: <4d93f07c0611301348v1289a6a3u13139a49a6d55678@mail.gmail.com>

> you could use User Agent signature


Well, that's what I said :-)
Martin


-- 
Martin Senger
   email: martin.senger at gmail.com
   skype: martinsenger

From gordonp at ucalgary.ca  Wed Nov  8 14:56:02 2006
From: gordonp at ucalgary.ca (Paul Gordon)
Date: Wed, 08 Nov 2006 07:56:02 -0700
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <003601c70341$5b74fe60$6d00a8c0@notebook>
References: <003601c70341$5b74fe60$6d00a8c0@notebook>
Message-ID: <4551F002.2040909@ucalgary.ca>

It's not an immediate solution, but I would suggest that people 
providing public SSL services get a real certificate if they can rustle 
up the money (only about $100/year).  I know most MOBY clients won't 
connect to unauthenticated services either, so having it signed 
by a recognized certificate authority will make it so much more useful...
> Hi Pieter, 
>
>   
>> I'm having a problem with the BioMOBY ping thing. As far as I 
>> know the services I had registered in the central BioMOBY 
>> Central respond correctly to a BioMOBY ping request. They are 
>> listed as dead though on the BioMOBY website and I'm 
>> wondering why. The current suspects are:
>>
>> * Base64 encoded output. Does the agent decode base64 content 
>> correctly?
>> * HTTPS. My services require an https connection. If the 
>> agent is using Perl code it will probably complain about not 
>> being able to validate the certificate, but execute anyway. 
>> If the agent was written in Java it will refuse to execute 
>> the service if the SSL certificates can not be validated. Our 
>> certificates are self-signed, so you'd have to add them to 
>> your keystore to be able to execute our services with a Java client.
>>
>>     
> I bet that its listed as dead because of authentication. What can I do to get
> around this?
>
> Thanks,
>
> Eddie


From Pieter.Neerincx at wur.nl  Fri Nov 10 10:59:57 2006
From: Pieter.Neerincx at wur.nl (Pieter Neerincx)
Date: Fri, 10 Nov 2006 11:59:57 +0100
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <4551F002.2040909@ucalgary.ca>
References: <003601c70341$5b74fe60$6d00a8c0@notebook>
	<4551F002.2040909@ucalgary.ca>
Message-ID: <C524ABBA-D44F-4C1A-A65E-DE6145AF0413@wur.nl>

Hi Eddie and Paul,

On 8-Nov-2006, at 3:56 PM, Paul Gordon wrote:

> It;'s not an immediate solution, but I would suggest to that people
> providing public SSL services get a real certificate if they can  
> rustle
> up the money (only about $100/year).  I know most MOBY clients won't
> connect to unauthenticated services either, so making it really signed
> by an authority will make it so much more useful...

Ok, I know that a certificate signed by one of the "big" certificate  
authorities would make life a little easier, but our self-signed  
certificates are just as real and valid :). The problem is the  
distribution of the certificates. I would have to drop by your  
office in person with my passport to prove I am who I claim to be, with  
the certificate on, for example, a USB stick. If I sent you the  
certificate in a plain e-mail, you could not verify whether it's really  
my certificate or a fake one. Anyway, that distribution problem can  
also be solved without $100.
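On the client side, trusting a self-signed certificate does not require switching verification off. A minimal Python sketch, assuming the certificate has been obtained through some trusted channel: it is added as an extra trust anchor (roughly the Python analogue of importing it into a Java keystore), and a fingerprint helper lets both parties compare a short string in person or over the phone instead of handing over the whole certificate. The helper names and the certificate path are hypothetical.

```python
import hashlib
import ssl
from typing import Optional


def make_client_context(trusted_cert_pem: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying TLS context, optionally trusting one extra certificate.

    trusted_cert_pem is a hypothetical path to a self-signed certificate in PEM
    format. Certificate and hostname verification both stay enabled.
    """
    context = ssl.create_default_context()  # CERT_REQUIRED + hostname checking
    if trusted_cert_pem is not None:
        context.load_verify_locations(cafile=trusted_cert_pem)
    return context


def fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint as colon-separated upper-case hex pairs."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

To compare against a live server, one could fetch its certificate with `ssl.get_server_certificate((host, 443))`, convert it with `ssl.PEM_cert_to_DER_cert`, and check `fingerprint(...)` against the value the provider read out through a trusted channel.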

I'll add some documentation to the site for people who want to use  
HTTPS for their services and/or BioMOBY Central...

Cheers,

Pi


>> Hi Pieter,
>>
>>
>>> I'm having a problem with the BioMOBY ping thing. As far as I
>>> know the services I had registered in the central BioMOBY
>>> Central respond correctly to a BioMOBY ping request. They are
>>> listed as dead though on the BioMOBY website and I'm
>>> wondering why. The current suspects are:
>>>
>>> * Base64 encoded output. Does the agent decode base64 content
>>> correctly?
>>> * HTTPS. My services require an https connection. If the
>>> agent is using Perl code it will probably complain about not
>>> being able to validate the certificate, but execute anyway.
>>> If the agent was written in Java it will refuse to execute
>>> the service if the SSL certificates can not be validated. Our
>>> certificates are self-signed, so you'd have to add them to
>>> your keystore to be able to execute our services with a Java client.
>>>
>>>
>> I bet that its listed as dead because of authentication. What can  
>> I do to get
>> around this?
>>
>> Thanks,
>>
>> Eddie


Wageningen University and Research centre (WUR)
Laboratory of Bioinformatics
Transitorium (building 312) room 1034
Dreijenlaan 3
6703 HA Wageningen
The Netherlands
phone: 0317-483 060
fax: 0317-483 584
mobile: 06-143 66 783
pieter.neerincx at wur.nl


From groscurt at mpiz-koeln.mpg.de  Wed Nov 29 12:02:05 2006
From: groscurt at mpiz-koeln.mpg.de (Andreas Groscurth)
Date: Wed, 29 Nov 2006 13:02:05 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
Message-ID: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>

The following text describes the procedure of the synchronization of  
Biomoby secondary repositories.

Aim: Replicate BioMoby central
-to create mirrors
-to have redundancy in case of failure
-to create private sets of services, either filtered from the global  
set (fewer services) or added to the global set (more services)

Problems:
-synchronizing repositories
-cascading service/object registration requests
-populating a Moby central from scratch

Solutions:
-The existing RSS feed is used to notify secondaries of changes  
(register service/delete service/update service) to the master
-A complete RSS document is created by a new dump method for  
initialization of Moby centrals from scratch
-Registrations are handled by the client and NOT cascaded

1. Synchronizing repositories
=============================

We propose that secondaries check the Biomoby RSS feed to be
notified of changes in the registry.  
Currently the RSS feed is updated once a day; for more rapid  
synchronization this would have to be changed.
The changes include registration, modification or deletion of a  
service/object. If changes were applied to the Biomoby Central
registry, the same changes are applied to the secondary. 
The RSS contains the signature URL where the secondary picks up
the service RDF to retrieve all details required for the
registration using the existing RDF agent.
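The synchronization step can be sketched as a replay loop. This is a minimal Python sketch under stated assumptions: each feed entry is reduced to a hypothetical `Change` record carrying an action, a service key and the signature URL, and the existing RDF agent is stood in for by a `fetch_rdf` callable. None of these names come from the BioMoby code base. Deletions are handled too, since the feed would have to announce them as well.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Change:
    """One change announced in the feed (field names are hypothetical)."""
    action: str         # "new", "modified" or "deleted"
    service_key: str    # e.g. "authority/serviceName"
    signature_url: str  # where the service RDF can be picked up


def apply_changes(registry: Dict[str, str],
                  changes: Iterable[Change],
                  fetch_rdf: Callable[[str], str]) -> Dict[str, str]:
    """Replay feed changes, oldest first, against a local secondary registry.

    fetch_rdf stands in for the RDF agent: given a signature URL it returns
    whatever the secondary stores for that service. The registry dict is
    modified in place and also returned for convenience.
    """
    for change in changes:
        if change.action in ("new", "modified"):
            registry[change.service_key] = fetch_rdf(change.signature_url)
        elif change.action == "deleted":
            registry.pop(change.service_key, None)  # tolerate unknown keys
        else:
            raise ValueError(f"unknown action: {change.action!r}")
    return registry
```

Applied to an empty registry with a complete dump as input, the same loop would also cover the initial load described in section 8.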

i) Problems/changes required:

The main question here is whether unregistered services are deleted  
completely from the central database or are marked as inactive. The
problem is that the feed would also need to contain the  
information about a deleted service, so that the secondaries will  
retrieve that information. So Moby central will have to keep a full  
transaction log, including deletions.

2. Filtering
============

We propose that any secondary can apply filters to the RSS feed and  
thus only include a subset of all services/objects. This can be  
useful to make finding services in lists easier, to tune workflows  
to well-performing services, to use only local services, or to exclude test  
services. Some information relevant to filtering, such as the  
authority and description, is in the RSS, but more may turn out to be  
relevant; then filtering may need to happen at the level of the service RDF.
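Filtering of this kind could be expressed as composable predicates over feed items. A minimal Python sketch, assuming each item has already been reduced to a dict with `authority` and `description` keys; the rule that a description mentioning "test" marks a test service is purely a hypothetical example.

```python
from typing import Callable, Iterable, List, Mapping

FeedItem = Mapping[str, str]
Predicate = Callable[[FeedItem], bool]


def exclude_test_services(item: FeedItem) -> bool:
    """Hypothetical rule: drop services whose description mentions 'test'."""
    return "test" not in item.get("description", "").lower()


def only_authorities(*authorities: str) -> Predicate:
    """Keep only items registered under one of the given authorities."""
    allowed = set(authorities)
    return lambda item: item.get("authority") in allowed


def filter_feed(items: Iterable[FeedItem], *predicates: Predicate) -> List[FeedItem]:
    """Keep an item only if every predicate accepts it."""
    return [item for item in items if all(p(item) for p in predicates)]
```

A secondary restricted to local, non-test services would then call `filter_feed(items, only_authorities("mpiz-koeln.mpg.de"), exclude_test_services)`; filters needing fields beyond the RSS would have to run on the service RDF instead.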

3. Private services
===================

We propose that any client can register services with a Moby central  
secondary, these will then be available only to clients querying the  
secondary. If the secondary is in a local network, this allows easy  
access control to local services. Any secondary synchronizing to that  
repository will of course inherit all those additional services,  
allowing simple creation of local production Moby centrals and local  
test Moby centrals.

4. Registration
===============

We propose to NOT cascade registration requests, i.e. pass them on  
from secondary to master. That means that the client has control over  
where a registration is done but also means the client has to make  
that choice. Registration clients must thus add an implementation  
that allows a user to choose the Moby central where a service/object  
should be registered. Registration always happens at the topmost Moby  
central node where the service should be visible, all secondaries of  
this Moby central will pick that service up by synchronization.

Why? Cascading registration is cumbersome, as name duplications etc.  
can only be resolved once a registration request has reached the  
topmost node, and the result must then be passed back to the client.

Name conflicts can still occur with locally registered services.  
E.g., Adam registers a private service AnalyseThis on a private  
secondary. Later, Beth registers AnalyseThis with the same authority on  
the Moby central master. The private secondary picks this up from the  
RSS and runs into a name duplication. Proposed solution: local  
registrations MUST ALWAYS use a local authority. E.g., Adam registers  
AnalyseThis with authority InternalIP, and Beth registers AnalyseThis  
with authority paul_vitti.com. Beyond that, we assume that whoever registers a  
service at a more global Moby central knows what they are doing, and we give  
synchronization precedence over local registrations. E.g., a test  
registry is a secondary of Moby central. Chris registers AnalyseThat  
with authority paul_vitti.com in the test registry. Once he's happy  
with testing, he registers AnalyseThat with authority paul_vitti.com  
in Moby central. The test registry retrieves this from the RSS,  
discards the local registration and overwrites it with the  
registration picked up through the RSS.
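The precedence rule can be sketched as a merge keyed on (authority, service name). A minimal Python sketch with hypothetical names: upstream registrations overwrite local entries with the same key, the overridden keys are reported so the secondary's operator can be told, and entries under a local authority (like Adam's) survive untouched.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (authority, service name)


def merge_synchronized(local: Dict[Key, dict],
                       synced: Dict[Key, dict]) -> Tuple[Dict[Key, dict], List[Key]]:
    """Merge entries picked up from the upstream feed into a secondary.

    Registrations from the more global registry take precedence: for the same
    key they replace the local entry. Returns the merged registry and the list
    of local keys that were overridden.
    """
    merged = dict(local)
    overridden: List[Key] = []
    for key, entry in synced.items():
        if key in local and local[key] != entry:
            overridden.append(key)  # a local registration is being discarded
        merged[key] = entry
    return merged, overridden
```

Because the key includes the authority, Adam's (InternalIP, AnalyseThis) and Beth's (paul_vitti.com, AnalyseThis) coexist, while Chris's test-registry copy of (paul_vitti.com, AnalyseThat) is replaced once the Moby central registration arrives.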

5. Moby central failure
=======================

If a master Moby central fails, the secondaries continue normal  
operation with no effect on service discovery for all clients keyed  
to a secondary. However, registration is no longer possible at the  
master node. Once the master node comes back up, all secondaries must  
resync.

6. Adaptations to the RSS
=========================

For this procedure the current RSS feed has to be changed marginally, on
the one hand to enable correct notification of the secondaries,  
on the other hand to ensure that normal RSS readers still work the
usual way. The current RSS feed mainly uses Dublin Core metadata 
to provide the information, so adding information to the 
feed only requires adding more Dublin Core metadata.

Primarily, the feed has to indicate whether the service  
is new, modified or deleted. Additionally, the service RDF has to be 
linked in the feed to enable the local RDF agent to apply the changes
to the local secondary using the information in the service RDF. 
Whether further information should be added to the feed to provide 
more options for filtering the services is open for discussion.
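To illustrate how extra Dublin Core elements could carry the change information without breaking ordinary readers, here is a minimal Python sketch using `xml.etree.ElementTree`. The namespaces are the standard RDF, RSS 1.0 and Dublin Core ones, but the convention itself (`dc:type` for new/modified/deleted, `dc:source` for the signature URL) is a hypothetical choice, not what the MOBY Central feed currently emits, and the sample feed is trimmed (no `channel` element).

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed, hypothetical sample of an RSS 1.0 item with the extra DC fields.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/services/AnalyseThis">
    <title>AnalyseThis</title>
    <dc:type>deleted</dc:type>
    <dc:source>http://example.org/AnalyseThis.rdf</dc:source>
    <dc:date>2006-11-29T13:02:05Z</dc:date>
  </item>
</rdf:RDF>"""


def parse_changes(feed_xml: str):
    """Extract title, action, signature URL and date from each RSS 1.0 item.

    Readers that do not know the dc:type/dc:source convention simply ignore
    those elements, so ordinary aggregators keep working.
    """
    root = ET.fromstring(feed_xml)
    changes = []
    for item in root.findall("rss:item", NS):
        changes.append({
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "action": item.findtext("dc:type", default="", namespaces=NS),
            "signature_url": item.findtext("dc:source", default="", namespaces=NS),
            "date": item.findtext("dc:date", default="", namespaces=NS),
        })
    return changes


if __name__ == "__main__":
    print(parse_changes(SAMPLE))
```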

7. Resync
=========

Another main aspect is what happens if a repository gets out of sync  
(e.g. due to a temporary failure of the master or a secondary). The RSS  
feed has a limited length, which means it contains only a limited number of  
transactions. Possibly, it will then not contain all transactions since  
the last sync of a secondary.


i) Solution
We propose that each repository store a time stamp of  
the last synchronization. If, in the next synchronization process,  
the oldest change in the feed is newer than the current sync time stamp  
of the repository, we run the risk of not receiving all information  
about service changes. In this case the secondary should be able to  
ask the primary to create an RSS feed with all changes that have 
happened since the current time stamp of the secondary.

8. Initial load
===============

When populating a new secondary from scratch, all registered services/ 
objects need to be received from the master Moby central. We propose  
a new method in Moby central to request all registered services/ 
objects as RSS. Then, the initialization proceeds exactly like a  
synchronization.


So to kick off the discussion here are some of our questions:

1. Is it reasonable to use the existing RSS feed for this procedure?  
It sounds very handy and avoids creating a completely new, similar structure.

2. Does any structure keep track of deleted services?

3. Resync: Is it reasonable to timestamp all transactions in Moby  
central? Or should we solve the resync issue by enforcing a full drop/ 
emptying of the secondary and reloading all data as in the initial load?


Thanks
Heiko & Andreas

-- 
Andreas Groscurth
Diplom Bioinformatik - PhD Student
Max Planck Institute for Plant Breeding Research
Carl-von-Linné-Weg 10
50829 Cologne
Germany
E-mail:   groscurt at mpiz-koeln.mpg.de
Phone:    +49(0)221-5062-447


From dag at sonsorol.org  Wed Nov 29 14:20:46 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 29 Nov 2006 09:20:46 -0500
Subject: [MOBY-dev] question for moby devs/architects regarding use of DNS
Message-ID: <C303F7E7-F6D2-4536-A08C-0E443D75405B@sonsorol.org>


Hi folks,

I just installed a new firewall (or in fancy terms 'unified threat  
management appliance' ) upstream of the main open-bio.org servers.

One of the more interesting reports so far is that a number of IP  
addresses have been opening up very large numbers of TCP connections  
to the main open-bio.org web/DNS/mailserver. We are talking about 256+  
simultaneous TCP sessions heading our way from the same remote IP  
address.

Some of this is just web spidering and FTP mirroring but quite a bit  
of the traffic (oddly enough) is DNS related.

We have an open DNS server and it is quite likely that people have  
found this out and are using us for recursive DNS queries. It is  
actually pretty easy to constrain/lock this down, but that DNS server  
is also the primary nameserver for biomoby.org and the very special  
LSID SRV record used for LSID discovery operations.

I guess I have the following questions/requests for the moby expert  
community:

(1) In the way that moby is architected is it expected that either  
clients or servers would generate lots of DNS traffic for  
biomoby.org? If what I am seeing is 'normal' then I just want to  
leave things alone.

(2) How popular is LSID? Could services making use of the 'lsid' SRV  
record be responsible for lots of DNS traffic? Like 256+ sessions  
from the same IP?

(3) I am going to reconfigure the DNS server so that we don't  
recursively answer DNS requests for other domains (like 'cnn.com'  
etc.) while still allowing anyone in the world to query the  
biomoby.org DNS zone.  Can the moby developers/leaders elect a point  
person that I can remain in contact with while we do this work? I  
want to make sure that we don't affect/break moby services while this  
work is done.

Thanks!

-Chris
OBF


From markw at illuminae.com  Wed Nov 29 17:41:40 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 09:41:40 -0800
Subject: [MOBY-dev] [moby] question for moby devs/architects
	regarding	use of DNS
In-Reply-To: <C303F7E7-F6D2-4536-A08C-0E443D75405B@sonsorol.org>
References: <C303F7E7-F6D2-4536-A08C-0E443D75405B@sonsorol.org>
Message-ID: <1164822100.20382.13.camel@bioinfo.icapture.ubc.ca>

On Wed, 2006-11-29 at 09:20 -0500, Chris Dagdigian wrote:

> to the main open-bio.org web/DNS/mailserver. We are talking about 256 
> + simultaneous TCP sessions heading our way from the same remote IP  
> address.

I guess the first question is "which IP address?" :-)


> (1) In the way that moby is architected is it expected that either  
> clients or servers would generate lots of DNS traffic for  
> biomoby.org? If what I am seeing is 'normal' then I just want to  
> leave things alone.

We run a cron'd script from our server here that tests all services in
the registry every hour.  I don't know for certain if this is using LSID
resolution as part of that task (Eddie, can you confirm?), but it
wouldn't surprise me if that were the case.  


> (2) How popular is LSID? Could services making use of the 'lsid' SVR
> record be responsible for lots of DNS traffic, like 256+ sessions
> from the same IP?


We are increasingly using the LSID to represent *all* entities in MOBY -
datatypes, service types, web service instances, etc.  A tool like
Taverna may well be resolving all LSIDs in the MOBY registry each time
it starts up (?), which could account for the traffic.  Other client
applications will likely use LSID resolution in the same way in the near
future, if they don't already.  Again, the IP address would fairly
quickly tell us whether these are "scientists" or "script kiddies".

Regardless, the use of LSIDs in MOBY is only going to increase over
time, so if it is becoming an issue now we should think about how to
manage it before it becomes a real problem...
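One way to keep growing LSID use from turning into DNS load is for clients to cache what they look up, instead of repeating the query for every LSID. Below is a minimal Python sketch of a time-limited cache. `lookup` stands in for whatever actually performs the DNS/LSID resolution. The 300-second TTL is an arbitrary example; a real client should prefer the TTL carried by the DNS record itself.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache lookup results for a fixed time so repeated names cost one query."""

    def __init__(self, lookup: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry behaviour can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}  # name -> (expiry, value)

    def get(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no network traffic
        value = self._lookup(name)
        self._entries[name] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    calls = []

    def fake_lookup(name: str) -> str:
        calls.append(name)
        return "lsid.example.org:80"  # hypothetical answer

    cache = TTLCache(fake_lookup, ttl_seconds=300)
    for _ in range(256):
        cache.get("_lsid._tcp.biomoby.org")
    print(len(calls))  # 1
```

With caching like this, a client that resolves hundreds of biomoby.org LSIDs at start-up would send one SRV query instead of hundreds.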


> (3) I am going to reconfigure the DNS server so that we don't  
> recursively answer DNS requests for other domains (like 'cnn.com'  
> etc.) while still allowing anyone in the world to query the  
> biomoby.org DNS zone.  Can the moby developers/leaders elect a point  
> person that I can remain in contact with while we do this work?

Eddie Kawas:  ed.kawas at gmail.com

I'm in the lab until tomorrow, and then away for about 10 days in
Germany, so he's the one who will answer your questions most rapidly.


>  I  
> want to make sure that we don't affect/break moby services while this  
> work is done.

:-)  thanks Chris!

Best wishes, 

M

-- 
Mark Wilkinson
Asst. Professor, Dept. of Medical Genetics
University of British Columbia
PI in Bioinformatics, iCAPTURE Centre
St. Paul's Hospital, Rm. 166, 1081 Burrard St.
Vancouver, BC, V6Z 1Y6
tel: 604 682 2344 x62129
fax: 604 806 9274

"Scientists would rather share their toothbrush than their data"
                                        - Carole Goble

                         ==================


***CONFIDENTIALITY NOTICE***
This electronic message is intended only for the use of the addressee
and may contain information that is privileged and confidential.  Any
dissemination, distribution or copying of this communication by
unauthorized individuals is strictly prohibited. If you have received
this communication in error, please notify the sender immediately by
reply e-mail and delete the original and all copies from your system.


From markw at illuminae.com  Wed Nov 29 17:31:11 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 09:31:11 -0800
Subject: [MOBY-dev] [moby] RFC - Synchronization of Biomoby
	secondary	repositories
In-Reply-To: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
Message-ID: <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>

Hi Andreas!

Thanks for taking the time to put this document together.  Using the RSS
feed is an interesting idea.  My first instinct is that it might not be
"robust" enough, but I suppose if we spent more time thinking about what
information is passed on that RSS feed it might work quite well!

Have you considered taking advantage of the recent move towards
distributed service signatures?  The RDF Agent is capable of consuming a
list of URLs, recovering the RDF signatures from those URLs, and
rebuilding the entire registry from those RDF documents. It is also a
simple API call to MOBY Central that generates the list of URLs
representing all of the service signatures.  As such, a full mirroring
operation should require nothing more than a single call to the primary
MOBY Central, and passing the result of that call to the RDF agent of
the mirror site and letting it run... Eddie, correct me if that isn't
true...
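The full-mirror idea reduces to three steps: get the list of signature URLs, fetch each RDF document, and hand it to the RDF agent. Below is a Python sketch under those assumptions. How the URL list is obtained is left open, because the actual MOBY Central call is not named here. `fetch_rdf` is a placeholder for whatever the RDF agent uses. Failed fetches are collected rather than aborting the run, so one unreachable provider does not stop a mirror rebuild.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.request import urlopen


def http_fetch(url: str) -> str:
    """Plain HTTP GET; assumes the RDF is served as UTF-8 text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def mirror_registry(signature_urls: Iterable[str],
                    fetch_rdf: Callable[[str], str] = http_fetch
                    ) -> Tuple[Dict[str, str], List[str]]:
    """Fetch every signature RDF; return (url -> RDF text, urls that failed)."""
    registry: Dict[str, str] = {}
    failed: List[str] = []
    for url in dict.fromkeys(signature_urls):  # drop duplicate URLs, keep order
        try:
            registry[url] = fetch_rdf(url)
        except OSError:  # network errors, including urllib's URLError, subclass OSError
            failed.append(url)
    return registry, failed
```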

I'm going to be at your institute this time next week, so let's talk
about it more in person :-)

Best wishes!

Mark


On Wed, 2006-11-29 at 13:02 +0100, Andreas Groscurth wrote: 
> The following text describes a procedure for synchronizing
> Biomoby secondary repositories.
> 
> Aim: Replicate BioMoby central
> -to create mirrors
> -to have redundancy in case of failure
> -to create private sets of services, either filtered from the global
> set (fewer services) or added to the global set (more services)
> 
> Problems:
> -synchronizing repositories
> -cascading service/object registration requests
> -populating a Moby central from scratch
> 
> Solutions:
> -The existing RSS feed is used to notify secondaries of changes  
> (register service/delete service/update service) to the master
> -A complete RSS document is created by a new dump method for  
> initialization of Moby centrals from scratch
> -Registrations are handled by the client and NOT cascaded
> 
> 1. Synchronizing repositories
> =============================
> 
> We propose that secondaries check the Biomoby RSS feed to be
> notified of changes in registration. Currently the RSS feed is
> updated once a day; for more rapid synchronization this would have
> to be changed. The changes include registration, modification or
> deletion of a service/object. If changes were applied to the Biomoby
> Central registry, the same changes are applied to the secondary.
> The RSS contains the signature URL where the secondary picks up
> the service RDF to retrieve all details required for the
> registration using the existing RDF agent.
> 
> i) Problems/changes required:
> 
> The main question here is whether unregistered services are deleted
> completely from the central database or marked as inactive. The
> problem is that the feed would also need to contain information
> about deleted services so that the secondaries receive it. So Moby
> central will have to keep a full transaction log, including deletions.
> 
> 2. Filtering
> ============
> 
> We propose that any secondary can apply filters to the RSS feed and
> thus include only a subset of all services/objects. This can be
> useful to make finding services in lists easier, to tune workflows
> to well-performing services, to use only local services, or to exclude
> test services. Information relevant to filtering, like authority and
> description, is in the RSS; if more turns out to be relevant,
> filtering may need to happen at the level of the service RDF.
> 
> 3. Private services
> ===================
> 
> We propose that any client can register services with a Moby central
> secondary; these will then be available only to clients querying the
> secondary. If the secondary is in a local network, this allows easy
> access control to local services. Any secondary synchronizing to that
> repository will of course inherit all those additional services,
> allowing simple creation of local production Moby centrals and local
> test Moby centrals.
> 
> 4. Registration
> ===============
> 
> We propose NOT to cascade registration requests, i.e. not to pass them
> on from secondary to master. That means that the client has control
> over where a registration is done, but also that the client has to
> make that choice. Registration clients must thus add an implementation
> that allows a user to choose the Moby central where a service/object
> should be registered. Registration always happens at the topmost Moby
> central node where the service should be visible; all secondaries of
> this Moby central will pick that service up by synchronization.
> 
> Why? Cascading registration is cumbersome, as only once a  
> registration request has reached the topmost node can name  
> duplications etc. be resolved, which must then be passed to the client.
> 
> Name conflicts can still occur with locally registered services.  
> E.g., Adam registers a private service AnalyseThis on a private  
> secondary. Later, Beth registers AnalyseThis with same authority on  
> the Moby central master. The private secondary picks this up from the  
> RSS and runs into a name duplication. Proposed solution: Local  
> registrations MUST ALWAYS use a local authority. E.g., Adam registers  
> AnalyseThis with authority InternalIP, and Beth registers AnalyseThis  
> with authority paul_vitti.com. Then, we assume whoever registers a
> service at a more global Moby central knows what they are doing, and
> give synchronization precedence over local registrations. E.g., a test
> registry is a secondary of Moby central. Chris registers AnalyseThat  
> with authority paul_vitti.com in the test registry. Once he's happy  
> with testing, he registers AnalyseThat with authority paul_vitti.com  
> in Moby central. The test registry retrieves this from the RSS,  
> discards the local registration and overwrites it with the  
> registration picked up through the RSS.
> 
> 5. Moby central failure
> =======================
> 
> If a master Moby central fails, the secondaries continue normal  
> operation with no effect on service discovery for all clients keyed  
> to a secondary. However, registration is no longer possible at the  
> master node. Once the master node comes back up, all secondaries must  
> resync.
> 
> 6. Adaptations to the RSS
> =========================
> 
> For this procedure the current RSS feed has to be changed only
> slightly: on the one hand to enable correct notification of the
> secondaries, on the other hand to ensure that normal RSS readers
> still work as usual. The current RSS feed mainly uses Dublin Core
> metadata to provide the information, so adding information to the
> feed only requires adding more Dublin Core metadata.
> 
> Primarily, the feed has to state whether the service is new,
> modified or deleted. Additionally, the service RDF has to be linked
> in the feed so that the local RDF agent can apply the changes
> described by the service RDF to the local secondary. Whether further
> information should be added to the feed to allow more ways of
> filtering the services is open for discussion.
> 
> 7. Resync
> =========
> 
> Another main issue is a repository falling out of sync
> (e.g. due to a temporary failure of master or secondary). The RSS
> feed has a limited length, which means only a limited number of
> transactions are contained. Possibly, this will mean it does not
> contain all transactions since the last sync of a secondary.
> 
> 
> i) Solution
> We propose that each repository stores a time stamp of the last
> synchronization. If, in the next synchronization, the oldest change
> in the feed is newer than the repository's current sync time stamp,
> we risk not receiving all information about service changes. In
> this case the secondary should be able to ask the primary to create
> an RSS feed with all changes that have happened since the current
> time stamp of the secondary.
> 
> 8. Initial load
> ===============
> 
> When populating a new secondary from scratch, all registered services/ 
> objects need to be received from the master Moby central. We propose  
> a new method in Moby central to request all registered services/ 
> objects as RSS. Then, the initialization proceeds exactly like a  
> synchronization.
> 
> 
> 
> So to kick off the discussion here are some of our questions:
> 
> 1. Is it reasonable to use the existing RSS feed for this procedure?
> It sounds very handy and avoids creating a similar, completely new structure.
> 
> 2. Does any structure keep track of deleted services?
> 
> 3. Resync: Is it reasonable to timestamp all transactions in Moby
> central? Or should we solve the resync issue by enforcing a full drop/
> emptying of the secondary and reloading all data as in the initial load?
> 
> 
> Thanks
> Heiko & Andreas
> 
> -- 
> Andreas Groscurth
> Diplom Bioinformatik - PhD Student
> Max Planck Institute for Plant Breeding Research
> Carl-von-Linné-Weg 10
> 50829 Cologne
> Germany
> E-mail:    groscurt at mpiz-koeln.mpg.de
> Phone:    +49(0)221-5062-447
> 
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev
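The precedence rule in section 4 of the proposal quoted above can be stated compactly if registrations are keyed by (authority, service name). Entries arriving through synchronization replace local ones with the same key, while local services under a local authority are untouched. Below is a Python sketch. The `origin` field is a hypothetical marker, not part of MOBY Central's schema.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (authority, service name)


def merge_synced(local: Dict[Key, dict],
                 synced: Dict[Key, dict]) -> Tuple[Dict[Key, dict], List[Key]]:
    """Overlay synchronized entries on a secondary; report local entries replaced."""
    merged = dict(local)
    replaced: List[Key] = []
    for key, entry in synced.items():
        if merged.get(key, {}).get("origin") == "local":
            replaced.append(key)
        merged[key] = {**entry, "origin": "synced"}  # the more global registry wins
    return merged, replaced
```

With the proposal's examples, Adam's AnalyseThis (authority InternalIP) and Beth's AnalyseThis (authority paul_vitti.com) never collide, because their keys differ. Only Chris's local test registration of AnalyseThat is replaced once the master's version arrives.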
-- 
Mark Wilkinson
Asst. Professor, Dept. of Medical Genetics
University of British Columbia
PI in Bioinformatics, iCAPTURE Centre
St. Paul's Hospital, Rm. 166, 1081 Burrard St.
Vancouver, BC, V6Z 1Y6
tel: 604 682 2344 x62129
fax: 604 806 9274

"Scientists would rather share their toothbrush than their data"
                                        - Carole Goble

                         ==================

From ed.kawas at gmail.com  Wed Nov 29 18:03:48 2006
From: ed.kawas at gmail.com (Ed Kawas)
Date: Wed, 29 Nov 2006 10:03:48 -0800
Subject: [MOBY-dev] [moby] question for moby devs/architects
	regarding use of DNS
In-Reply-To: <1164822100.20382.13.camel@bioinfo.icapture.ubc.ca>
Message-ID: <002f01c713e0$b9e393d0$6900a8c0@notebook>


> We run a cron'd script from our server here that tests all 
> services in the registry every hour.  I don't know for 
> certain if this is using LSID resolution as part of that task 
> (Eddie, can you confirm?), but it wouldn't surprise me if 
> that were the case.  
It does not. It uses only the plain API (findservice, etc.).

> 
> 
> > (2) How popular is LSID? Could services making use of the
> > 'lsid' SVR record be responsible for lots of DNS traffic, like
> > 256+ sessions from the same IP?
> 
> 
> We are increasingly using the LSID to represent *all* 
> entities in MOBY - datatypes, service types, web service 
> instances, etc.  A tool like Taverna may well be resolving 
> all LSIDs in the MOBY registry each time it starts-up (?), 
> which could account for the traffic.  Other client 
> applications will likely use LSID resolution in the same way 
> in the near future, if they don't already.  Again, the IP 
> address would fairly quickly tell us whether these are 
> "scientists" or "scriptkiddies".  
> 
> Regardless, the use of LSIDs in MOBY is only going to 
> increase over time, so if it is becoming an issue now we 
> should think about how to manage it before it becomes a real 
> problem...
> 
Mark, your gbrowse_moby application uses LSIDs a lot. However, those requests
would all come from a single IP address.

Eddie


From markw at illuminae.com  Wed Nov 29 18:34:47 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 10:34:47 -0800
Subject: [MOBY-dev] [moby] question for moby devs/architects
	regarding use of DNS
In-Reply-To: <002f01c713e0$b9e393d0$6900a8c0@notebook>
References: <002f01c713e0$b9e393d0$6900a8c0@notebook>
Message-ID: <op.tjsi79nqnbznux@hoegaarden.mrl.ubc.ca>

On Wed, 29 Nov 2006 10:03:48 -0800, Ed Kawas <ed.kawas at gmail.com> wrote:

> Mark, your gbrowse_moby application uses lsids a lot. However, those
> requests would be from a single ip address.

Right... but I don't think it creates 256+ requests at a time, since it is  
a low-throughput interface...  I'd be surprised if gbrowse moby was the  
culprit here.

M


From edward.kawas at gmail.com  Wed Nov 29 15:22:14 2006
From: edward.kawas at gmail.com (Edward Kawas)
Date: Wed, 29 Nov 2006 07:22:14 -0800
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
Message-ID: <001f01c713ca$27188b70$6900a8c0@notebook>

Hi,

From reading *just* the 'aim' and 'problems' portion of this message, I was
wondering whether you thought about using the agent for mirroring.

Just throwing it out there,

Eddie

> -----Original Message-----
> From: moby-dev-bounces at lists.open-bio.org 
> [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of 
> Andreas Groscurth
> Sent: Wednesday, November 29, 2006 4:02 AM
> To: moby-dev at lists.open-bio.org
> Subject: [MOBY-dev] RFC - Synchronization of Biomoby 
> secondary repositories
> 
> The following text describes a procedure for
> synchronizing Biomoby secondary repositories.
> 
> Aim: Replicate BioMoby central
> -to create mirrors
> -to have redundancy in case of failure
> -to create private sets of services, either filtered from the
> global set (fewer services) or added to the global set (more services)
> 
> Problems:
> -synchronizing repositories
> -cascading service/object registration requests
> -populating a Moby central from scratch
> 
> Solutions:
> -The existing RSS feed is used to notify secondaries of
> changes (register service/delete service/update service) to
> the master
> -A complete RSS document is created by a new dump method for
> initialization of Moby centrals from scratch
> -Registrations are handled by the client and NOT cascaded
> 
> 1. Synchronizing repositories
> =============================
> 
> We propose that secondaries check the Biomoby RSS feed to be
> notified of changes in registration. Currently the RSS feed is
> updated once a day; for more rapid synchronization this would
> have to be changed. The changes include registration,
> modification or deletion of a service/object. If changes were
> applied to the Biomoby Central registry, the same changes are
> applied to the secondary. The RSS contains the signature URL
> where the secondary picks up the service RDF to retrieve all
> details required for the registration using the existing RDF agent.
> 
> i) Problems/changes required:
> 
> The main question here is whether unregistered services are
> deleted completely from the central database or marked as
> inactive. The problem is that the feed would also need to
> contain information about deleted services so that the
> secondaries receive it. So Moby central will have to keep a
> full transaction log, including deletions.
> 
> 2. Filtering
> ============
> 
> We propose that any secondary can apply filters to the RSS
> feed and thus include only a subset of all services/objects.
> This can be useful to make finding services in lists easier,
> to tune workflows to well-performing services, to use only
> local services, or to exclude test services. Information
> relevant to filtering, like authority and description, is in
> the RSS; if more turns out to be relevant, filtering may need
> to happen at the level of the service RDF.
> 
> 3. Private services
> ===================
> 
> We propose that any client can register services with a Moby
> central secondary; these will then be available only to
> clients querying the secondary. If the secondary is in a
> local network, this allows easy access control to local
> services. Any secondary synchronizing to that repository will
> of course inherit all those additional services, allowing
> simple creation of local production Moby centrals and local
> test Moby centrals.
> 
> 4. Registration
> ===============
> 
> We propose NOT to cascade registration requests, i.e. not to
> pass them on from secondary to master. That means that the
> client has control over where a registration is done, but also
> that the client has to make that choice. Registration clients
> must thus add an implementation that allows a user to choose
> the Moby central where a service/object should be registered.
> Registration always happens at the topmost Moby central node
> where the service should be visible; all secondaries of this
> Moby central will pick that service up by synchronization.
> 
> Why? Cascading registration is cumbersome, as only once a 
> registration request has reached the topmost node can name 
> duplications etc. be resolved, which must then be passed to 
> the client.
> 
> Name conflicts can still occur with locally registered services.  
> E.g., Adam registers a private service AnalyseThis on a 
> private secondary. Later, Beth registers AnalyseThis with 
> same authority on the Moby central master. The private 
> secondary picks this up from the RSS and runs into a name 
> duplication. Proposed solution: Local registrations MUST 
> ALWAYS use a local authority. E.g., Adam registers 
> AnalyseThis with authority InternalIP, and Beth registers 
> AnalyseThis with authority paul_vitti.com. Then, we assume 
> whoever registers a service at a more global Moby central
> knows what they are doing, and give synchronization precedence
> over local registrations. E.g., a test registry is a 
> secondary of Moby central. Chris registers AnalyseThat with 
> authority paul_vitti.com in the test registry. Once he's 
> happy with testing, he registers AnalyseThat with authority 
> paul_vitti.com in Moby central. The test registry retrieves 
> this from the RSS, discards the local registration and 
> overwrites it with the registration picked up through the RSS.
> 
> 5. Moby central failure
> =======================
> 
> If a master Moby central fails, the secondaries continue 
> normal operation with no effect on service discovery for all 
> clients keyed to a secondary. However, registration is no 
> longer possible at the master node. Once the master node 
> comes back up, all secondaries must resync.
> 
> 6. Adaptations to the RSS
> =========================
> 
> For this procedure the current RSS feed has to be changed
> only slightly: on the one hand to enable correct notification
> of the secondaries, on the other hand to ensure that normal
> RSS readers still work as usual. The current RSS feed mainly
> uses Dublin Core metadata to provide the information, so
> adding information to the feed only requires adding more
> Dublin Core metadata.
> 
> Primarily, the feed has to state whether the service is new,
> modified or deleted. Additionally, the service RDF has to be
> linked in the feed so that the local RDF agent can apply the
> changes described by the service RDF to the local secondary.
> Whether further information should be added to the feed to
> allow more ways of filtering the services is open for discussion.
> 
> 7. Resync
> =========
> 
> Another main issue is a repository falling out of sync
> (e.g. due to a temporary failure of master or secondary). The
> RSS feed has a limited length, which means only a limited
> number of transactions are contained. Possibly, this will mean
> it does not contain all transactions since the last sync of a
> secondary.
> 
> 
> i) Solution
> We propose that each repository stores a time stamp of the
> last synchronization. If, in the next synchronization, the
> oldest change in the feed is newer than the repository's
> current sync time stamp, we risk not receiving all information
> about service changes. In this case the secondary should be
> able to ask the primary to create an RSS feed with all changes
> that have happened since the current time stamp of the
> secondary.
> 
> 8. Initial load
> ===============
> 
> When populating a new secondary from scratch, all registered
> services/objects need to be received from the master Moby
> central. We propose a new method in Moby central to request
> all registered services/objects as RSS. Then, the
> initialization proceeds exactly like a synchronization.
> 
> 
> 
> So to kick off the discussion here are some of our questions:
> 
> 1. Is it reasonable to use the existing RSS feed for this procedure?
> It sounds very handy and avoids creating a similar, completely
> new structure.
> 
> 2. Does any structure keep track of deleted services?
> 
> 3. Resync: Is it reasonable to timestamp all transactions in
> Moby central? Or should we solve the resync issue by
> enforcing a full drop/emptying of the secondary and reloading
> all data as in the initial load?
> 
> 
> Thanks
> Heiko & Andreas
> 
> --
> Andreas Groscurth
> Diplom Bioinformatik - PhD Student
> Max Planck Institute for Plant Breeding Research
> Carl-von-Linné-Weg 10
> 50829 Cologne
> Germany
> E-mail:    groscurt at mpiz-koeln.mpg.de
> Phone:    +49(0)221-5062-447
> 
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev
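Question 2 in the proposal quoted above asks whether anything tracks deleted services. The Python sketch below shows why that matters: a secondary replays feed entries oldest-first, and it can only drop a service if the feed tells it about the deletion. The `Change` record and its key format are hypothetical illustrations, not the current feed's structure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    key: str                             # hypothetical key, e.g. "authority/serviceName"
    action: str                          # "new", "modified" or "deleted"
    timestamp: float                     # when the change happened on the master
    signature_url: Optional[str] = None  # not needed for deletions


def apply_changes(registry: Dict[str, str], changes: Iterable[Change]) -> Dict[str, str]:
    """Replay feed entries oldest-first onto a registry of key -> signature URL."""
    for change in sorted(changes, key=lambda c: c.timestamp):
        if change.action in ("new", "modified"):
            if change.signature_url is None:
                raise ValueError(f"{change.key}: {change.action} without a signature URL")
            registry[change.key] = change.signature_url
        elif change.action == "deleted":
            registry.pop(change.key, None)  # only possible if the feed reports deletions
        else:
            raise ValueError(f"unknown action {change.action!r}")
    return registry
```

Sorting by timestamp makes a modification followed by a deletion end with the service removed, regardless of the order in which items appear in the feed.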


From markw at illuminae.com  Wed Nov 29 23:52:23 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 15:52:23 -0800
Subject: [MOBY-dev] Holy cow!  Lotsa hits!
Message-ID: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>

852,000 hits on MOBY Central in November.  That's a new record :-)

M


-- 
--
Mark Wilkinson
Assistant Professor, Dept. Medical Genetics
University of British Columbia
PI Bioinformatics
iCAPTURE Centre, St. Paul's Hospital


From schoof at mpiz-koeln.mpg.de  Thu Nov 30 09:39:46 2006
From: schoof at mpiz-koeln.mpg.de (Heiko Schoof)
Date: Thu, 30 Nov 2006 10:39:46 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
	<1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
Message-ID: <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>

Hi Mark, Eddie,

we already use the RDF agent: from the RSS we intend to pull mainly
the signature URLs, and then we propose to use the RDF agent to get all
the data.
---quote---
The RSS contains the signature URL where the secondary picks up
the service RDF to retrieve all details required for the
registration using the existing RDF agent.
---/quote---
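Heiko's first step, pulling the signature URLs out of the feed, could look roughly like this in Python. It is a minimal sketch, assuming an RSS 1.0 (RDF) feed in which each item's `<link>` holds the service's signature URL; which element actually carries that URL in the MOBY Central feed is an assumption here, and the example.org URLs are made up.

```python
import xml.etree.ElementTree as ET

# Namespace of RSS 1.0 ("RDF Site Summary") elements such as <item> and <link>.
NS = {"rss": "http://purl.org/rss/1.0/"}


def signature_urls(feed_xml: str) -> list[str]:
    """Return the signature URL of every item in a MOBY-RSS document.

    Assumption: each <item>'s <link> points at the service's RDF signature,
    which the local RDF agent can then fetch and register.
    """
    root = ET.fromstring(feed_xml)
    urls = []
    # In RSS 1.0, <item> elements are direct children of <rdf:RDF>.
    for item in root.findall("rss:item", NS):
        link = item.find("rss:link", NS)
        if link is not None and link.text:
            urls.append(link.text.strip())
    return urls


# A trimmed, hypothetical feed with a single item.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/moby-rss">
    <title>MOBY Central changes</title>
  </channel>
  <item rdf:about="http://example.org/services/getFoo">
    <title>getFoo</title>
    <link>http://example.org/sig/getFoo.rdf</link>
    <dc:creator>example.org</dc:creator>
  </item>
</rdf:RDF>"""

print(signature_urls(SAMPLE_FEED))
```

Each returned URL would then go to the existing RDF agent, as the quoted RFC text describes.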

The advantages of the RSS over the API call that retrieves ALL
signature URLs are:
- scalability: if there are thousands of signature URLs, with the RSS
we only retrieve the changes
- filtering: the ability to filter on data already in the RSS, with
no need to retrieve the service RDF; this should improve filtering
performance, as it is one request instead of potentially hundreds,
plus the cost of parsing all those RDF documents.
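The filtering advantage amounts to deciding, from RSS fields alone, which signature URLs are worth fetching at all. A minimal sketch, assuming each feed item exposes a name, an authority, a description and a signature URL; the `FeedItem` fields and the `select_items` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FeedItem:
    """Fields a secondary might read from one MOBY-RSS item (hypothetical)."""
    name: str
    authority: str
    description: str
    signature_url: str


def select_items(
    items: Iterable[FeedItem],
    allowed_authorities: Optional[set[str]] = None,
    exclude_words: tuple[str, ...] = ("test",),
) -> list[FeedItem]:
    """Filter feed items on RSS data alone, before any service RDF is fetched."""
    selected = []
    for item in items:
        # Keep only the chosen authorities, e.g. local services.
        if allowed_authorities is not None and item.authority not in allowed_authorities:
            continue
        # Crude substring match to drop test services ("latest" would match too).
        text = f"{item.name} {item.description}".lower()
        if any(word in text for word in exclude_words):
            continue
        selected.append(item)
    return selected


items = [
    FeedItem("getFoo", "example.org", "Retrieves Foo records", "http://example.org/sig/getFoo.rdf"),
    FeedItem("getFooTest", "example.org", "Test copy of getFoo", "http://example.org/sig/getFooTest.rdf"),
    FeedItem("getBar", "other.example", "Retrieves Bar records", "http://other.example/sig/getBar.rdf"),
]

# Only the surviving signature URLs are handed to the RDF agent.
to_fetch = [i.signature_url for i in select_items(items, allowed_authorities={"example.org"})]
print(to_fetch)
```

Only one RSS request is needed to make this decision; the RDF is fetched just for the items that survive.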

However, for initialization from scratch, your method indeed makes
the most sense; we'll modify the RFC accordingly. Is there a
Biomoby wiki where we can post it?
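For initialization from scratch, Mark's suggestion comes down to one loop: take the full list of signature URLs that MOBY Central returns and hand each one to the RDF agent. The sketch below assumes nothing about the real API; `signature_urls` and `load_rdf` are hypothetical placeholders for that call and for the RDF agent.

```python
from typing import Callable, Iterable


def initial_load(signature_urls: Iterable[str], load_rdf: Callable[[str], None]) -> list[str]:
    """Populate an empty secondary from the full list of signature URLs.

    `signature_urls` stands in for the result of the MOBY Central call that
    lists every service signature; `load_rdf` stands in for handing one URL
    to the local RDF agent. Both are placeholders for this sketch.
    Returns the URLs that could not be loaded, so they can be retried later
    instead of aborting the whole load.
    """
    failed = []
    for url in signature_urls:
        try:
            load_rdf(url)
        except OSError:  # network errors such as urllib.error.URLError subclass OSError
            failed.append(url)
    return failed


# Usage with a fake agent in place of the real RDF agent.
loaded = []


def fake_agent(url: str) -> None:
    if "down.example" in url:
        raise OSError("provider unreachable")
    loaded.append(url)


failed = initial_load(
    ["http://up.example/sig/s1.rdf", "http://down.example/sig/s2.rdf"],
    fake_agent,
)
print(loaded, failed)
```

Collecting failures rather than stopping matters here because signatures are distributed: one provider being down should not block the rest of the mirror.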

Do you intend to come "work" at the MPIZ next week? If yes, when? I'm  
free Thursday afternoon and most of Friday.

Best, Heiko

On 29 Nov 2006, at 18:31, Mark Wilkinson wrote:

> Hi Andreas!
>
> Thanks for taking the time to put this document together.  Using the
> RSS feed is an interesting idea.  My first instinct is that it might
> not be "robust" enough, but I suppose if we spent more time thinking
> about what information is passed on that RSS feed it might work quite
> well!
>
> Have you considered taking advantage of the recent move towards
> distributed service signatures?  The RDF Agent is capable of consuming
> a list of URLs, recovering the RDF signatures from those URLs, and
> rebuilding the entire registry from those RDF documents. It is also a
> simple API call to MOBY Central that generates the list of URLs
> representing all of the service signatures.  As such, a full mirroring
> operation should require nothing more than a single call to the primary
> MOBY Central, and passing the result of that call to the RDF agent of
> the mirror site and letting it run... Eddie, correct me if that isn't
> true...
>
> I'm going to be at your institute this time next week, so let's talk
> about it more in person :-)
>
> Best wishes!
>
> Mark
>
>
>
> On Wed, 2006-11-29 at 13:02 +0100, Andreas Groscurth wrote:
>> The following text describes the procedure of the synchronization of
>> Biomoby secondary repositories.
>>
>> Aim: Replicate BioMoby central
>> -to create mirrors
>> -to have redundancy in case of failure
>> -to create private sets of services, either filtered from the global
>> set (fewer services) or extending the global set (more services)
>>
>> Problems:
>> -synchronizing repositories
>> -cascading service/object registration requests
>> -populating a Moby central from scratch
>>
>> Solutions:
>> -The existing RSS feed is used to notify secondaries of changes
>> (register service/delete service/update service) to the master
>> -A complete RSS document is created by a new dump method for
>> initialization of Moby centrals from scratch
>> -Registrations are handled by the client and NOT cascaded
>>
>> 1. Synchronizing repositories
>> =============================
>>
>> We propose that secondaries check the Biomoby RSS feed to be
>> notified of changes in the registry.
>> Currently the RSS feed is updated once a day; for more rapid
>> synchronization this would have to be changed.
>> The changes include registration, modification or deletion of a
>> service/object. If changes were applied to the Biomoby Central
>> registry, the same changes are applied to the secondary.
>> The RSS contains the signature URL where the secondary picks up
>> the service RDF to retrieve all details required for the
>> registration using the existing RDF agent.
>>
>> i) Problems/changes required:
>>
>> The main question here is whether unregistered services are deleted
>> completely from the central database or are marked as inactive. The
>> problem is that the feed would also need to contain information
>> about deleted services, so that the secondaries receive it. So Moby
>> central will have to keep a full transaction log, including
>> deletions.
>>
>> 2. Filtering
>> ============
>>
>> We propose that any secondary can apply filters to the RSS feed and
>> thus include only a subset of all services/objects. This can be
>> useful to make finding services from lists easier, to restrict
>> workflows to well-performing services, to use only local services,
>> or to exclude test services. Information relevant to filtering, such
>> as authority and description, is in the RSS; if more information
>> turns out to be relevant, filtering may need to happen at the level
>> of the service RDF.
>>
>> 3. Private services
>> ===================
>>
>> We propose that any client can register services with a Moby central
>> secondary, these will then be available only to clients querying the
>> secondary. If the secondary is in a local network, this allows easy
>> access control to local services. Any secondary synchronizing to that
>> repository will of course inherit all those additional services,
>> allowing simple creation of local production Moby centrals and local
>> test Moby centrals.
>>
>> 4. Registration
>> ===============
>>
>> We propose NOT to cascade registration requests, i.e. not to pass them
>> on from secondary to master. That means that the client has control over
>> where a registration is done but also means the client has to make
>> that choice. Registration clients must thus add an implementation
>> that allows a user to choose the Moby central where a service/object
>> should be registered. Registration always happens at the topmost Moby
>> central node where the service should be visible, all secondaries of
>> this Moby central will pick that service up by synchronization.
>>
>> Why? Cascading registration is cumbersome, as name duplications etc.
>> can only be resolved once a registration request has reached the
>> topmost node, and the result must then be passed back to the client.
>>
>> Name conflicts can still occur with locally registered services.
>> E.g., Adam registers a private service AnalyseThis on a private
>> secondary. Later, Beth registers AnalyseThis with the same authority on
>> the Moby central master. The private secondary picks this up from the
>> RSS and runs into a name duplication. Proposed solution: Local
>> registrations MUST ALWAYS use a local authority. E.g., Adam registers
>> AnalyseThis with authority InternalIP, and Beth registers AnalyseThis
>> with authority paul_vitti.com. Then, we assume whoever registers a
>> service at a more global Moby central knows what they are doing, and
>> give synchronization precedence over local registrations. E.g., a test
>> registry is a secondary of Moby central. Chris registers AnalyseThat
>> with authority paul_vitti.com in the test registry. Once he's happy
>> with testing, he registers AnalyseThat with authority paul_vitti.com
>> in Moby central. The test registry retrieves this from the RSS,
>> discards the local registration and overwrites it with the
>> registration picked up through the RSS.
>>
>> 5. Moby central failure
>> =======================
>>
>> If a master Moby central fails, the secondaries continue normal
>> operation with no effect on service discovery for all clients keyed
>> to a secondary. However, registration is no longer possible at the
>> master node. Once the master node comes back up, all secondaries must
>> resync.
>>
>> 6. Adaptations to the RSS
>> =========================
>>
>> For this procedure the current RSS feed has to be changed marginally,
>> on the one hand to enable the correct notification of the secondaries,
>> and on the other hand to ensure that normal RSS readers still work the
>> usual way. The current RSS feed mainly uses Dublin Core metadata to
>> provide the information, so adding information to the feed only
>> requires adding more Dublin Core metadata.
>>
>> Primarily, the feed has to indicate whether the service is new,
>> modified or deleted. Additionally, the service RDF has to be linked in
>> the feed, so that the local RDF agent can apply the changes described
>> in the service RDF to the local secondary. Whether further information
>> should be added to the feed to provide more filtering options can be
>> discussed.
>>
>> 7. Resync
>> =========
>>
>> Another main issue is a repository that is out of sync
>> (e.g. due to a temporary failure of master or secondary). The RSS
>> feed has a limited length, which means it contains a limited number
>> of transactions. Possibly, it will not contain all transactions since
>> the last sync of a secondary.
>>
>>
>> i) Solution
>> We propose that each repository stores a time stamp of its last
>> synchronization. If, at the next synchronization, the oldest change
>> in the feed is newer than the repository's last-sync time stamp,
>> changes made in between may have dropped out of the feed, and we run
>> the risk of not receiving all information about service changes. In
>> this case the secondary should be able to ask the primary to create
>> an RSS feed with all changes that have happened since the
>> secondary's time stamp.
>>
>> 8. Initial load
>> ===============
>>
>> When populating a new secondary from scratch, all registered
>> services/objects need to be retrieved from the master Moby central.
>> We propose a new method in Moby central to request all registered
>> services/objects as RSS. Then, the initialization proceeds exactly
>> like a synchronization.
>>
>>
>>
>> So to kick off the discussion, here are some of our questions:
>>
>> 1. Is it reasonable to use the existing RSS feed for this procedure?
>> It sounds very handy and avoids creating a completely new structure
>> for the same purpose.
>>
>> 2. Does any structure keep track of deleted services?
>>
>> 3. Resync: Is it reasonable to timestamp all transactions in Moby
>> central? Or should we solve the resync issue by enforcing a full
>> drop/emptying of the secondary and reloading all data as in the
>> initial load?
>>
>>
>> Thanks
>> Heiko & Andreas
>>
>> -- 
>> Andreas Groscurth
>> Diplom Bioinformatik - PhD Student
>> Max Planck Institute for Plant Breeding Research
>> Carl-von-Linné-Weg 10
>> 50829 Cologne
>> Germany
>> E-mail:    groscurt at mpiz-koeln.mpg.de
>> Phone:    +49(0)221-5062-447
> -- 
> Mark Wilkinson
> Asst. Professor, Dept. of Medical Genetics
> University of British Columbia
> PI in Bioinformatics, iCAPTURE Centre
> St. Paul's Hospital, Rm. 166, 1081 Burrard St.
> Vancouver, BC, V6Z 1Y6
> tel: 604 682 2344 x62129
> fax: 604 806 9274
>
> "Scientists would rather share their toothbrush than their data"
>                                         - Carole Goble
>
>                          ==================
>
>
> ***CONFIDENTIALITY NOTICE***
> This electronic message is intended only for the use of the addressee
> and may contain information that is privileged and confidential.  Any
> dissemination, distribution or copying of this communication by
> unauthorized individuals is strictly prohibited. If you have received
> this communication in error, please notify the sender immediately by
> reply e-mail and delete the original and all copies from your system.
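The name-conflict rule in section 4 of the RFC quoted above can be read as a merge keyed on (authority, name) in which synchronization wins. A minimal sketch, assuming a secondary keeps its registry as a dictionary; the `Registration` type, the "local"/"upstream" labels and `apply_upstream` are hypothetical, while the service names and authorities are the RFC's own examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    authority: str
    name: str
    origin: str  # "local" or "upstream"; labels invented for this sketch


def apply_upstream(registry: dict, upstream: list[Registration]) -> list[tuple[str, str]]:
    """Merge registrations picked up from the upstream feed into `registry`.

    Services are keyed by (authority, name), so the same service name under
    different authorities never collides. When the keys do match, the
    upstream registration wins and the local one is discarded.
    Returns the keys of local registrations that were overwritten.
    """
    replaced = []
    for reg in upstream:
        key = (reg.authority, reg.name)
        current = registry.get(key)
        if current is not None and current.origin == "local":
            replaced.append(key)
        registry[key] = reg
    return replaced


# The RFC's examples: Adam's and Chris's local registrations, then upstream changes.
registry = {
    ("InternalIP", "AnalyseThis"): Registration("InternalIP", "AnalyseThis", "local"),
    ("paul_vitti.com", "AnalyseThat"): Registration("paul_vitti.com", "AnalyseThat", "local"),
}
upstream = [
    Registration("paul_vitti.com", "AnalyseThis", "upstream"),  # Beth, on the master
    Registration("paul_vitti.com", "AnalyseThat", "upstream"),  # Chris, after testing
]
replaced = apply_upstream(registry, upstream)
print(replaced)
```

Adam's InternalIP registration survives next to Beth's, because the authorities differ; only Chris's test registration is overwritten.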


From dgonzalez at cnio.es  Thu Nov 30 14:43:20 2006
From: dgonzalez at cnio.es (David G. Pisano)
Date: Thu, 30 Nov 2006 15:43:20 +0100
Subject: [MOBY-dev] Holy cow!  Lotsa hits!
In-Reply-To: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>
References: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>
Message-ID: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>

Mark,

Do you have historic records? It would be nice for presentations ;)
(and grats to everybody, by the way)

David

On  30 Nov, 2006, at 12:52 AM, Mark Wilkinson wrote:

> 852,000 hits on MOBY Central in November.  That's a new record :-)
>
> M
>
>
> -- 
> --
> Mark Wilkinson
> Assistant Professor, Dept. Medical Genetics
> University of British Columbia
> PI Bioinformatics
> iCAPTURE Centre, St. Paul's Hospital


**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.


From markw at illuminae.com  Thu Nov 30 16:40:28 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 30 Nov 2006 08:40:28 -0800
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
	<1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
	<6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
Message-ID: <op.tjt8lqcynbznux@hoegaarden.mrl.ubc.ca>


> we already use the RDF agent, from the RSS we intend to pull mainly the  
> signature URLs, then we propose to use the RDF agent to get all data.

Ah!  I see.


> The advantage of the RSS versus the API call to retrieve ALL signature  
> URLs is:
> -scalability: If there are 1000s of signature URLs...with the RSS, we  
> only retrieve changes

I would need to modify the RSS feed such that it reports additions *and*  
removals (right now it is just additions), and we would have to come up  
with a formal way of representing these...  and then the RSS functionality  
and features would need to become a formal part of the MOBY API (I can  
just hear Dr. Senger yelling at us right now that we're considering  
building core functionality on parts of MOBY that are entirely  
undocumented ;-)  ;-) ).  I guess that's why I am so hesitant to use RSS.
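One possible shape for a formal representation of additions and removals: each feed entry carries a change type, and a mirror replays the entries oldest first. This is only a sketch; the three change types follow the RFC's new/modified/deleted, but `FeedEntry`, the `service_id` format and `apply_entries` are hypothetical, not an agreed MOBY-RSS convention.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Change(Enum):
    ADDED = "added"
    MODIFIED = "modified"
    REMOVED = "removed"


@dataclass
class FeedEntry:
    change: Change
    service_id: str               # e.g. "authority/serviceName" (hypothetical key)
    signature_url: Optional[str]  # not needed for removals


def apply_entries(mirror: dict[str, str], entries: list[FeedEntry]) -> list[str]:
    """Apply feed entries, oldest first, to `mirror` (service_id -> signature URL).

    Returns the signature URLs the RDF agent should (re)load. A later
    removal cancels any pending reload of the same service.
    """
    to_load: dict[str, str] = {}
    for entry in entries:
        if entry.change is Change.REMOVED:
            mirror.pop(entry.service_id, None)
            to_load.pop(entry.service_id, None)
        else:
            # Additions and modifications must say where the signature lives.
            assert entry.signature_url is not None
            mirror[entry.service_id] = entry.signature_url
            to_load[entry.service_id] = entry.signature_url
    return list(to_load.values())


mirror = {"example.org/getFoo": "http://example.org/sig/getFoo.rdf"}
entries = [
    FeedEntry(Change.ADDED, "example.org/getBar", "http://example.org/sig/getBar.rdf"),
    FeedEntry(Change.REMOVED, "example.org/getFoo", None),
    FeedEntry(Change.MODIFIED, "example.org/getBar", "http://example.org/sig/getBar-v2.rdf"),
]
reload_urls = apply_entries(mirror, entries)
print(reload_urls)
print(mirror)
```

Without the REMOVED entries, a mirror replaying an additions-only feed would never drop getFoo, which is exactly the gap described above.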

However, I guess so long as this is not a "recommended" practice, only a  
short-cut; and as long as it is *always* possible to use a true API call  
to mirror the registry, and we formally say what the recommended  
best-practice is, then it's reasonable to have a non-guaranteed  
alternative that is more lightweight.


> However, for the initialization/from scratch, this method indeed makes  
> most sense, we'll modify the RFC accordingly. Is there a Biomoby WIKI  
> where we can post that?

There currently is no Wiki running on BioMoby.  The last Wiki we had was  
hacked, and we just never brought it back up again.  We tried using  
Bugzilla as a way of tracking RFCs, but that didn't make many people very
happy, so we're simply using the mailing list, with some formal write-up  
in an attachment.


> Do you intend to come "work" at the MPIZ next week?

Do I have to answer that publicly?  ;-)

The answer is "Yes".  We should get a variety of things - MOBY-wise and  
otherwise - sorted out between us while I am there.  I'm free all day  
Thursday and Friday, so that should give us plenty of time.

Cheers!

M

 
From schoof at mpiz-koeln.mpg.de  Thu Nov 30 18:01:48 2006
From: schoof at mpiz-koeln.mpg.de (Heiko Schoof)
Date: Thu, 30 Nov 2006 19:01:48 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <op.tjt8lqcynbznux@hoegaarden.mrl.ubc.ca>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
	<1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
	<6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
	<op.tjt8lqcynbznux@hoegaarden.mrl.ubc.ca>
Message-ID: <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>


On 30 Nov 2006, at 17:40, Mark Wilkinson wrote:

> and then the RSS functionality and features would need to become a  
> formal part of the MOBY API (I can just hear Dr. Senger yelling at  
> us right now that we're considering building core functionality on  
> parts of MOBY that are entirely undocumented ;-)  ;-) ).  I guess  
> that's why I am so hesitant to use RSS.
>
> However, I guess so long as this is not a "recommended" practice,  
> only a short-cut; and as long as it is *always* possible to use a  
> true API call to mirror the registry, and we formally say what the  
> recommended best-practice is, then it's reasonable to have a
> non-guaranteed alternative that is more lightweight.
>
What we are proposing is to make new RSS functionality that will be
part of the core API. It just so happens that RSS and the surrounding
toolkit are well suited to the purpose, and more fitting to Moby than
other solutions we've looked at. Why should we make a new API call
that spews out some custom XML when we can use RSS perfectly well
within its specs and get a core RSS feed for "human"/aggregator
consumption at the same time for free? We stated that we will need to
modify the RSS, though without breaking anything as far as we can see.

We were not proposing to use RSS just because there's existing  
functionality ;-) we're not quite THAT lazy...though almost. And...  
isn't it *cool* to use RSS for some real work?

What is a true API call? Why is a call to the RSS feed not a true API
call, if we make it part of the API? RSS is a tested, scalable
technology, which is why we propose to use it, as we envision
hundreds of Moby clients maintaining their local cache (like
Dashboard) through that functionality. That brings up one thing we
haven't mentioned yet: caching of Moby central for clients could
easily build on the Moby secondary functionality, we think. But maybe
Martin or others with experience in Moby central caching should
comment on that.

Heiko


From markw at illuminae.com  Thu Nov 30 19:10:08 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 30 Nov 2006 11:10:08 -0800
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary
	repositories
In-Reply-To: <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
	<1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
	<6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
	<op.tjt8lqcynbznux@hoegaarden.mrl.ubc.ca>
	<93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>
Message-ID: <op.tjufi6hunbznux@hoegaarden.mrl.ubc.ca>

Hi Heiko,

> Why should we make a new API call that spews out some custom XML if we  
> can perfectly use RSS within its specs and get a core RSS feed for  
> "human"/aggregator consumption at the same time for free? We stated that  
> we will need to modify the RSS, though not breaking anything as far as  
> we can see.

I think we are effectively saying the same thing; I am not suggesting that
we make a new API call, I'm suggesting that there are API calls that
already exist that could be used for this purpose, albeit not as
conveniently as your RSS suggestion.


> We were not proposing to use RSS just because there's existing  
> functionality ;-) we're not quite THAT lazy...though almost. And...  
> isn't it *cool* to use RSS for some real work?

Well... I guess this is the issue.  You're proposing to use RSS for a
purpose for which it was not (IMO) designed.  As such, we would have to
create new conventions around the RSS feed (hereafter called MOBY-RSS)
that may or may not be more widely accepted in the world.  I agree 100%
that it would be VERY cool to use RSS in this way, but vis-à-vis a robust
solution to the problem, I'm not entirely convinced.  The amount of RSS-RDF
we would have to maintain on MOBY Central in order to have a complete
history that would allow a mirror to reliably reconstruct the current
state of the database is... well... large!  At the moment, I keep only the
last... 100?... changes.  If you don't pick up the feed for a day, or if
someone registers 1000 new services, you won't see them in the feed.  To be
safe, we would have to keep *all* changes in the RDF document at MOBY
Central, in which case the overhead of calling the feed versus using the
MOBY Central API would be about the same.
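Mark's concern can be turned into an explicit check, which is also what the time-stamp proposal in section 7 of the RFC relies on: if even the oldest entry in a bounded feed is newer than the secondary's last sync, the feed cannot be trusted to cover the gap, and the secondary should fall back to a full mirror through the regular API. A minimal sketch, assuming one timestamp per feed entry and a feed that holds the most recent N changes; `plan_sync` and its return labels are hypothetical.

```python
from datetime import datetime, timezone


def plan_sync(last_sync: datetime, entry_times: list[datetime]) -> tuple[str, list[datetime]]:
    """Decide whether the feed alone can bring a secondary up to date.

    Assumes the feed holds only the most recent N changes, one timestamp
    per entry. If even the oldest entry is newer than our last sync,
    changes made in between may already have dropped out of the feed, so
    the safe option is a full reload through the MOBY Central API.
    Otherwise, the entries newer than `last_sync` are the ones to apply.
    """
    if entry_times and min(entry_times) > last_sync:
        return "full_resync", []
    pending = sorted(t for t in entry_times if t > last_sync)
    return "incremental", pending


def at(hour: int) -> datetime:
    return datetime(2006, 11, 30, hour, tzinfo=timezone.utc)


# Feed reaches back past our last sync at 10:00: apply the 11:00 and 12:00 changes.
print(plan_sync(at(10), [at(12), at(9), at(11)]))
# Feed starts at 11:00: anything between 10:00 and 11:00 may be lost.
print(plan_sync(at(10), [at(11), at(12)]))
```

An empty feed is treated as "nothing changed", which only holds under the count-based-window assumption; a feed that keeps only the last day of changes would need a different rule.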

I'm not *opposed* to the idea of using RSS, and I agree that it is a novel  
and "cool" use for it, but I am concerned that we will perpetuate the MOBY  
reputation of making ad hoc decisions around other standards... (which  
isn't necessarily BAD, it just gives us a reputation for being mavericks,
which angers the reviewers :-) )

Let's talk about it over a Koelsch (or two) next week!

M


-- 
--
Mark Wilkinson
Assistant Professor, Dept. Medical Genetics
University of British Columbia
PI Bioinformatics
iCAPTURE Centre, St. Paul's Hospital


From martin.senger at gmail.com  Thu Nov 30 15:08:37 2006
From: martin.senger at gmail.com (Martin Senger)
Date: Thu, 30 Nov 2006 15:08:37 +0000
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
References: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>
	<9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
Message-ID: <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>

Well, let me be more realistic. I do not like so-called "Potemkin
villages" (search Google if that does not ring a bell). Of course, I am glad
that BioMoby is growing, and I will support it in any possible way. And I
also understand that some facts are good for PR and for funding agencies.

But... among ourselves, we should be more precise about what these hits
actually mean. For example, I assume (please correct me if I am wrong) that
every time somebody updates her local cache from Dashboard, it increases the
hit count. Also, all these automated tools may influence how many times the
registry is accessed, which gives us a distorted picture.

It would possibly be better to agree on an HTTP agent name (or names) to use
in these automatic tools, and then to separate in the statistics *all* hits
(good for funding agencies) from the hits that exclude requests made by this
(or these) HTTP agent(s).
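Martin's proposal only needs the registry's web server to record each request's User-Agent, which Apache's "combined" log format already does as its last quoted field. A minimal sketch of the two counts, assuming that log format; the agent names in `AUTOMATED_AGENTS` and the MOBY-Central.pl path in the sample lines are hypothetical.

```python
import re

# In Apache's "combined" log format the User-Agent is the last quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

# Hypothetical agent names that automated MOBY tools could agree to send.
AUTOMATED_AGENTS = ("MOBY-Dashboard-cache", "MOBY-mirror")


def count_hits(log_lines):
    """Return (all hits, hits excluding the agreed automated agents)."""
    total = other = 0
    for line in log_lines:
        total += 1
        match = LAST_QUOTED.search(line)
        agent = match.group(1) if match else ""
        # str.startswith accepts a tuple, so any agreed prefix matches.
        if not agent.startswith(AUTOMATED_AGENTS):
            other += 1
    return total, other


sample = [
    '10.0.0.1 - - [30/Nov/2006:10:00:00 -0800] "POST /cgi-bin/MOBY-Central.pl HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [30/Nov/2006:10:00:05 -0800] "POST /cgi-bin/MOBY-Central.pl HTTP/1.1" 200 9000 "-" "MOBY-Dashboard-cache/1.0"',
    '10.0.0.3 - - [30/Nov/2006:10:00:09 -0800] "POST /cgi-bin/MOBY-Central.pl HTTP/1.1" 200 9000 "-" "MOBY-mirror/0.1 (resync)"',
]
print(count_hits(sample))
```

The first number is the figure for funding agencies; the second is the one the developers would look at.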

Just my "c's,
Martin

-- 
Martin Senger
   email: martin.senger at gmail.com
   skype: martinsenger


From jmfernandez at cnio.es  Thu Nov 30 21:06:53 2006
From: jmfernandez at cnio.es (José María Fernández González)
Date: Thu, 30 Nov 2006 22:06:53 +0100
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
References: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>	<9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
	<4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
Message-ID: <456F47ED.1060800@cnio.es>

You should have some way to distinguish entries from automated tools from the
others (with some help from the tool developers, of course). As every request
is made over HTTP, you could use the User-Agent signature recorded in the
Apache logs. For instance, each tool could use a different 'User Agent' variant
so that they can be distinguished, and if a program/tool can issue requests
related to maintenance, it would be advisable for it to alter its 'User Agent'
signature depending on its mode.
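On the client side, this suggestion means each tool sends its own User-Agent, varied by mode. A sketch using only Python's standard `urllib.request`, assuming a hypothetical tool name and a `mode=` suffix convention that the tool developers would still have to agree on; the request is built but not sent.

```python
import urllib.request

# Hypothetical tool identity; each tool would pick its own agreed name.
TOOL_NAME = "MobyExampleClient"
TOOL_VERSION = "0.1"


def user_agent(mode: str) -> str:
    """Build a User-Agent such as 'MobyExampleClient/0.1 (mode=maintenance)'."""
    return f"{TOOL_NAME}/{TOOL_VERSION} (mode={mode})"


def central_request(url: str, body: bytes, mode: str = "interactive") -> urllib.request.Request:
    """Prepare, but do not send, a SOAP POST tagged with the tool's current mode."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"User-Agent": user_agent(mode), "Content-Type": "text/xml"},
        method="POST",
    )


req = central_request("http://moby.example.org/central", b"<soap/>", mode="maintenance")
# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
# urllib.request.urlopen(req) would actually send it.
```

A cache refresh would then use `mode="maintenance"` and an interactive lookup the default, so the server logs can tell the two apart.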

	Just my 2 euro-cents.
		José María

Martin Senger wrote:
> Well, let me be more realistic. I do not like so called "Potemkin's"
> villages (see google if it does not ring any bell). Of course, I am glad
> that BioMoby is growing, and I will support it in any possible way. And I
> also understand that some facts are good for PR and for funding agencies.
> 
> But... For us, we should be more precise what these hits actually mean. For
> example, I assume (please correct me if I am wrong) that every time somebody
> updates her local cash from Dashboard, it increases the hit numbers. Also,
> all these automated tools may influence how many times the registry is
> accessed. Which gives us a distorted picture.
> 
> Better, possibly, would be to agree to an HTTP agent name (or names) that we
> can use in this automatic tools - and to separate in the statistics *all*
> hits (good for funding agencies) from the other hits where we do not include
> requests from this (or these) HTTP agent(s).
> 
> Just my "c's,
> Martin
> 

-- 
José María Fernández González
Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 2256)
e-mail: jmfernandez at cnio.es		Fax: (+34) 91 224 69 76
Biología Estructural y Bioinformática	Structural Biology and Bioinformatics
Centro Nacional de Investigaciones Oncológicas
C.P.: 28029				Zip Code: 28029
C/. Melchor Fernández Almagro, 3	Madrid (Spain)



From martin.senger at gmail.com  Thu Nov 30 21:48:08 2006
From: martin.senger at gmail.com (Martin Senger)
Date: Thu, 30 Nov 2006 21:48:08 +0000
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <456F47ED.1060800@cnio.es>
References: <op.tjsxxlxznbznux@hoegaarden.mrl.ubc.ca>
	<9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
	<4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
	<456F47ED.1060800@cnio.es>
Message-ID: <4d93f07c0611301348v1289a6a3u13139a49a6d55678@mail.gmail.com>

> you could use User Agent signature


Well, that's what I said :-)
Martin


-- 
Martin Senger
   email: martin.senger at gmail.com
   skype: martinsenger