From Pieter.Neerincx at wur.nl  Wed Nov  8 05:56:05 2006
From: Pieter.Neerincx at wur.nl (Pieter Neerincx)
Date: Wed, 8 Nov 2006 11:56:05 +0100
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <44E387CB.2080905@ucalgary.ca>
References: <4d93f07c0608160850w68eeb88l185365d679c2edbe@mail.gmail.com>
	<44E37AC5.8080105@ucalgary.ca>
	<1155760984.6594.23.camel@bioinfo.icapture.ubc.ca>
	<44E387CB.2080905@ucalgary.ca>
Message-ID: <5673901C-7386-4598-B6F7-7C588AE7F1CB@wur.nl>

Hi all,

I'm having a problem with the BioMOBY ping thing. As far as I know the services I had registered in BioMOBY Central respond correctly to a BioMOBY ping request. They are listed as dead though on the BioMOBY website and I'm wondering why. The current suspects are:

* Base64 encoded output. Does the agent decode base64 content correctly?
* HTTPS. My services require an https connection. If the agent is using Perl code it will probably complain about not being able to validate the certificate, but execute anyway. If the agent was written in Java it will refuse to execute the service if the SSL certificates cannot be validated. Our certificates are self-signed, so you'd have to add them to your keystore to be able to execute our services with a Java client.

My services might need an update to take advantage of LSID resolution and the asynchronous one needs to be rewritten for our new BioMOBY async services standard, but they are not dead!

Something else: I plan on resuming my SOAP::Lite testing with the latest and greatest version. Is there anybody out there who is currently successfully running a (patched) S::L version > 0.60?

Cheers,

Pi

From edward.kawas at gmail.com  Wed Nov  8 09:22:42 2006
From: edward.kawas at gmail.com (Edward Kawas)
Date: Wed, 8 Nov 2006 06:22:42 -0800
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <5673901C-7386-4598-B6F7-7C588AE7F1CB@wur.nl>
Message-ID: <003601c70341$5b74fe60$6d00a8c0@notebook>

Hi Pieter,

> I'm having a problem with the BioMOBY ping thing. As far as I
> know the services I had registered in the central BioMOBY
> Central respond correctly to a BioMOBY ping request. They are
> listed as dead though on the BioMOBY website and I'm
> wondering why. The current suspects are:
>
> * Base64 encoded output. Does the agent decode base64 content
> correctly?
> * HTTPS. My services require an https connection. If the
> agent is using Perl code it will probably complain about not
> being able to validate the certificate, but execute anyway.
> If the agent was written in Java it will refuse to execute
> the service if the SSL certificates can not be validated. Our
> certificates are self-signed, so you'd have to add them to
> your keystore to be able to execute our services with a Java client.
>

I bet that it's listed as dead because of authentication. What can I do to get around this?

Thanks,

Eddie

From gordonp at ucalgary.ca  Wed Nov  8 09:56:02 2006
From: gordonp at ucalgary.ca (Paul Gordon)
Date: Wed, 08 Nov 2006 07:56:02 -0700
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <003601c70341$5b74fe60$6d00a8c0@notebook>
References: <003601c70341$5b74fe60$6d00a8c0@notebook>
Message-ID: <4551F002.2040909@ucalgary.ca>

It's not an immediate solution, but I would suggest that people providing public SSL services get a real certificate if they can rustle up the money (only about $100/year). I know most MOBY clients won't connect to unauthenticated services either, so having the certificate really signed by an authority will make the services so much more useful...

> Hi Pieter,
>
>> I'm having a problem with the BioMOBY ping thing. As far as I
>> know the services I had registered in the central BioMOBY
>> Central respond correctly to a BioMOBY ping request. They are
>> listed as dead though on the BioMOBY website and I'm
>> wondering why. The current suspects are:
>>
>> * Base64 encoded output. Does the agent decode base64 content
>> correctly?
>> * HTTPS. My services require an https connection. If the
>> agent is using Perl code it will probably complain about not
>> being able to validate the certificate, but execute anyway.
>> If the agent was written in Java it will refuse to execute
>> the service if the SSL certificates can not be validated. Our
>> certificates are self-signed, so you'd have to add them to
>> your keystore to be able to execute our services with a Java client.
>>
> I bet that it's listed as dead because of authentication. What can I do to get
> around this?
>
> Thanks,
>
> Eddie
>
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev

From Pieter.Neerincx at wur.nl  Fri Nov 10 05:59:57 2006
From: Pieter.Neerincx at wur.nl (Pieter Neerincx)
Date: Fri, 10 Nov 2006 11:59:57 +0100
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <4551F002.2040909@ucalgary.ca>
References: <003601c70341$5b74fe60$6d00a8c0@notebook>
	<4551F002.2040909@ucalgary.ca>
Message-ID:

Hi Eddie and Paul,

On 8-Nov-2006, at 3:56 PM, Paul Gordon wrote:

> It's not an immediate solution, but I would suggest that people
> providing public SSL services get a real certificate if they can
> rustle up the money (only about $100/year). I know most MOBY clients won't
> connect to unauthenticated services either, so making it really signed
> by an authority will make it so much more useful...

Ok, I know that a certificate signed by one of the "big" certificate authorities would make life a little easier, but our self-signed certificates are just as real and valid :). The problem is the distribution of the certificates. I would have to drop by your office in person with my passport, to prove I am who I claim to be, and with the certificate on, for example, a USB stick. If I sent you the certificate in a plain e-mail, you could not verify whether it's really my certificate or a fake one. Anyway, that distribution problem can also be solved without $100.
I'll add some documentation to the site for people who want to use HTTPS for their services and/or BioMOBY Central...

Cheers,

Pi

>> Hi Pieter,
>>
>>> I'm having a problem with the BioMOBY ping thing. As far as I
>>> know the services I had registered in the central BioMOBY
>>> Central respond correctly to a BioMOBY ping request. They are
>>> listed as dead though on the BioMOBY website and I'm
>>> wondering why. The current suspects are:
>>>
>>> * Base64 encoded output. Does the agent decode base64 content
>>> correctly?
>>> * HTTPS. My services require an https connection. If the
>>> agent is using Perl code it will probably complain about not
>>> being able to validate the certificate, but execute anyway.
>>> If the agent was written in Java it will refuse to execute
>>> the service if the SSL certificates can not be validated. Our
>>> certificates are self-signed, so you'd have to add them to
>>> your keystore to be able to execute our services with a Java client.
>>>
>> I bet that it's listed as dead because of authentication. What can
>> I do to get
>> around this?
>>
>> Thanks,
>>
>> Eddie

Wageningen University and Research centre (WUR)
Laboratory of Bioinformatics
Transitorium (building 312) room 1034
Dreijenlaan 3
6703 HA Wageningen
The Netherlands
phone: 0317-483 060
fax: 0317-483 584
mobile: 06-143 66 783
pieter.neerincx at wur.nl

From groscurt at mpiz-koeln.mpg.de  Wed Nov 29 07:02:05 2006
From: groscurt at mpiz-koeln.mpg.de (Andreas Groscurth)
Date: Wed, 29 Nov 2006 13:02:05 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
Message-ID: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>

The following text describes the procedure for synchronizing Biomoby secondary repositories.

Aim: Replicate BioMoby central
-to create mirrors
-to have redundancy in case of failure
-to create private sets of services, either filtered from the global set (fewer services) or added to the global set (more services)

Problems:
-synchronizing repositories
-cascading service/object registration requests
-populating a Moby central from scratch

Solutions:
-The existing RSS feed is used to notify secondaries of changes (register service/delete service/update service) to the master
-A complete RSS document is created by a new dump method for initialization of Moby centrals from scratch
-Registrations are handled by the client and NOT cascaded

1. Synchronizing repositories
=============================

We propose that secondaries check the Biomoby RSS feed to learn whether registrations have changed. Currently the RSS feed is updated once a day; for more rapid synchronization this would have to be changed.
The changes include registration, modification or deletion of a service/object. If changes were applied to the Biomoby Central registry, they are propagated to the secondary. The RSS contains the signature URL where the secondary picks up the service RDF to retrieve all details required for the registration using the existing RDF agent.

i) Problems/changes required:

The main question here is whether unregistered services are deleted completely from the central database or are marked as inactive. The problem is that the feed would also need to contain the information about a deleted service, so that the secondaries can retrieve it. So Moby central will have to keep a full transaction log that also covers deletions.

2. Filtering
============

We propose that any secondary can apply filters to the RSS feed and thus only include a subset of all services/objects. This can be useful to make finding services from lists easier, to tune workflows to performant services, to use only local services or to exclude test services. Information relevant to filtering, like authority and description, is in the RSS; if more turns out to be relevant, filtering may need to happen at the level of the service RDF.

3. Private services
===================

We propose that any client can register services with a Moby central secondary; these will then be available only to clients querying the secondary. If the secondary is in a local network, this allows easy access control to local services. Any secondary synchronizing to that repository will of course inherit all those additional services, allowing simple creation of local production Moby centrals and local test Moby centrals.

4. Registration
===============

We propose to NOT cascade registration requests, i.e. pass them on from secondary to master. That means that the client has control over where a registration is done but also means the client has to make that choice.
Registration clients must thus add an implementation that allows a user to choose the Moby central where a service/object should be registered. Registration always happens at the topmost Moby central node where the service should be visible; all secondaries of this Moby central will pick that service up by synchronization.

Why? Cascading registration is cumbersome, as name duplications etc. can only be resolved once a registration request has reached the topmost node, and the result must then be passed back to the client.

Name conflicts can still occur with locally registered services. E.g., Adam registers a private service AnalyseThis on a private secondary. Later, Beth registers AnalyseThis with the same authority on the Moby central master. The private secondary picks this up from the RSS and runs into a name duplication. Proposed solution: Local registrations MUST ALWAYS use a local authority. E.g., Adam registers AnalyseThis with authority InternalIP, and Beth registers AnalyseThis with authority paul_vitti.com. Then, we assume whoever registers a service at a more global Moby central knows what they're doing and give synchronization precedence over local registrations. E.g., a test registry is a secondary of Moby central. Chris registers AnalyseThat with authority paul_vitti.com in the test registry. Once he's happy with testing, he registers AnalyseThat with authority paul_vitti.com in Moby central. The test registry retrieves this from the RSS, discards the local registration and overwrites it with the registration picked up through the RSS.

5. Moby central failure
=======================

If a master Moby central fails, the secondaries continue normal operation with no effect on service discovery for all clients keyed to a secondary. However, registration is no longer possible at the master node. Once the master node comes back up, all secondaries must resync.

6. Adaptations to the RSS
=========================

For this procedure the current RSS feed has to be changed marginally, on the one hand to enable correct notification of the secondaries, on the other hand to ensure that normal RSS readers still work the usual way. The current RSS feed mainly uses Dublin Core metadata to provide the information, so adding information to the feed only requires adding more Dublin Core metadata.

Primarily the feed has to contain the information whether the service is new, modified or deleted. Additionally the service RDF has to be linked in the feed to enable the local RDF agent to apply the changes to the local secondary using the information in the service RDF. Whether other information should be added to the feed to provide more possibilities for filtering the services can be discussed.

7. Resync
=========

Another main aspect is the problem of a repository being out of sync (e.g. due to a temporary failure of master or secondary). The RSS feed has a limited length, which means it contains a limited number of transactions. Possibly, this will mean it does not contain all transactions since the last sync of a secondary.

i) Solution

We propose that each repository will store a time stamp of the last synchronization. If, in the next synchronization process, the oldest change in the feed is newer than the repository's last-sync time stamp, we run the risk of not receiving all information about service changes, because transactions made between the two may already have dropped out of the feed. In this case the secondary should be able to ask the primary to create an RSS feed with all changes that have happened since the secondary's time stamp.

8. Initial load
===============

When populating a new secondary from scratch, all registered services/objects need to be received from the master Moby central. We propose a new method in Moby central to request all registered services/objects as RSS. Then, the initialization proceeds exactly like a synchronization.
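
The resync check from section 7 could be sketched roughly as follows. This is only an illustration under assumptions: it flags a possible gap whenever the oldest entry still in the feed is newer than the secondary's last-sync time stamp, and the names FeedEntry, needs_full_resync and pending_changes are hypothetical, not part of Moby central or the RDF agent.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class FeedEntry:
    """One transaction in the change feed (hypothetical structure)."""
    service: str         # e.g. "paul_vitti.com,AnalyseThat"
    action: str          # "new", "modified" or "deleted"
    timestamp: datetime  # when the master recorded the change


def needs_full_resync(entries: List[FeedEntry], last_sync: datetime) -> bool:
    """Return True if the feed may no longer reach back to last_sync."""
    if not entries:
        # Assumption: an empty feed means nothing has changed.
        return False
    oldest = min(e.timestamp for e in entries)
    # If even the oldest entry is newer than our last sync, older
    # transactions may already have been pushed out of the feed.
    return oldest > last_sync


def pending_changes(entries: List[FeedEntry], last_sync: datetime) -> List[FeedEntry]:
    """Entries recorded after last_sync, oldest first."""
    newer = [e for e in entries if e.timestamp > last_sync]
    return sorted(newer, key=lambda e: e.timestamp)
```

If needs_full_resync returns True, the secondary would ask the master for all changes since its time stamp (or fall back to a full reload as in section 8); otherwise it applies pending_changes in order.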

So to kick off the discussion here are some of our questions:

1. Is it reasonable to use the existing RSS feed for this procedure? It sounds very handy and avoids creating a similar, completely new structure.

2. Does any structure keep track of deleted services?

3. Resync: Is it reasonable to timestamp all transactions in Moby central? Or should we solve the resync issue by enforcing a full drop/emptying of the secondary and reloading all data as in the initial load?

Thanks
Heiko & Andreas

--
Andreas Groscurth
Diplom Bioinformatik - PhD Student
Max Planck Institute for Plant Breeding Research
Carl-von-Linné-Weg 10
50829 Cologne
Germany
E-mail: groscurt at mpiz-koeln.mpg.de
Phone: +49(0)221-5062-447

From dag at sonsorol.org  Wed Nov 29 09:20:46 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 29 Nov 2006 09:20:46 -0500
Subject: [MOBY-dev] question for moby devs/architects regarding use of DNS
Message-ID:

Hi folks,

I just installed a new firewall (or in fancy terms 'unified threat management appliance') upstream of the main open-bio.org servers.

One of the more interesting reports so far is that a number of IP addresses have been opening up very large numbers of TCP connections to the main open-bio.org web/DNS/mailserver. We are talking about 256+ simultaneous TCP sessions heading our way from the same remote IP address.

Some of this is just web spidering and FTP mirroring but quite a bit of the traffic (oddly enough) is DNS related.

We have an open DNS server and it is quite likely that people have found this out and are using us for recursive DNS queries. It is actually pretty easy to constrain/lock this down but that DNS server is also the primary nameserver for biomoby.org and the very special LSID SRV record used for LSID discovery operations.

I guess I have the following questions/requests for the moby expert community:

(1) In the way that moby is architected is it expected that either clients or servers would generate lots of DNS traffic for biomoby.org? If what I am seeing is 'normal' then I just want to leave things alone.

(2) How popular is LSID? Could services making use of the 'lsid' SRV record be responsible for lots of DNS traffic? Like 256+ sessions from the same IP?

(3) I am going to reconfigure the DNS server so that we don't recursively answer DNS requests for other domains (like 'cnn.com' etc.) while still allowing anyone in the world to query the biomoby.org DNS zone. Can the moby developers/leaders elect a point person that I can remain in contact with while we do this work? I want to make sure that we don't affect/break moby services while this work is done.

Thanks!
-Chris
OBF

From markw at illuminae.com  Wed Nov 29 12:41:40 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 09:41:40 -0800
Subject: [MOBY-dev] [moby] question for moby devs/architects regarding use of DNS
In-Reply-To:
References:
Message-ID: <1164822100.20382.13.camel@bioinfo.icapture.ubc.ca>

On Wed, 2006-11-29 at 09:20 -0500, Chris Dagdigian wrote:

> to the main open-bio.org web/DNS/mailserver. We are talking about 256
> + simultaneous TCP sessions heading our way from the same remote IP
> address.

I guess the first question is "which IP address?" :-)

> (1) In the way that moby is architected is it expected that either
> clients or servers would generate lots of DNS traffic for
> biomoby.org? If what I am seeing is 'normal' then I just want to
> leave things alone.

We run a cron'd script from our server here that tests all services in the registry every hour. I don't know for certain if this is using LSID resolution as part of that task (Eddie, can you confirm?), but it wouldn't surprise me if that were the case.

> (2) How popular is LSID? Could services making use of the 'lsid' SRV
> record be responsible for lots of DNS traffic? Like 256+ sessions
> from the same IP?

We are increasingly using the LSID to represent *all* entities in MOBY - datatypes, service types, web service instances, etc. A tool like Taverna may well be resolving all LSIDs in the MOBY registry each time it starts up (?), which could account for the traffic. Other client applications will likely use LSID resolution in the same way in the near future, if they don't already. Again, the IP address would fairly quickly tell us whether these are "scientists" or "scriptkiddies".

Regardless, the use of LSIDs in MOBY is only going to increase over time, so if it is becoming an issue now we should think about how to manage it before it becomes a real problem...

> (3) I am going to reconfigure the DNS server so that we don't
> recursively answer DNS requests for other domains (like 'cnn.com'
> etc.) while still allowing anyone in the world to query the
> biomoby.org DNS zone. Can the moby developers/leaders elect a point
> person that I can remain in contact with while we do this work?

Eddie Kawas: ed.kawas at gmail.com

I'm in the lab until tomorrow, and then away for about 10 days in Germany, so he's the one who will answer your questions most rapidly.

> I
> want to make sure that we don't affect/break moby services while this
> work is done.

:-) thanks Chris!

Best wishes,

M

--
Mark Wilkinson
Asst. Professor, Dept. of Medical Genetics
University of British Columbia
PI in Bioinformatics, iCAPTURE Centre
St. Paul's Hospital, Rm. 166, 1081 Burrard St.
Vancouver, BC, V6Z 1Y6
tel: 604 682 2344 x62129
fax: 604 806 9274

"Scientists would rather share their toothbrush than their data" - Carole Goble

==================
***CONFIDENTIALITY NOTICE*** This electronic message is intended only for the use of the addressee and may contain information that is privileged and confidential.
Any dissemination, distribution or copying of this communication by unauthorized individuals is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and delete the original and all copies from your system.

From markw at illuminae.com  Wed Nov 29 12:31:11 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Wed, 29 Nov 2006 09:31:11 -0800
Subject: [MOBY-dev] [moby] RFC - Synchronization of Biomoby secondary repositories
In-Reply-To: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
Message-ID: <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>

Hi Andreas!

Thanks for taking the time to put this document together.

Using the RSS feed is an interesting idea. My first instinct is that it might not be "robust" enough, but I suppose if we spent more time thinking about what information is passed on that RSS feed it might work quite well!

Have you considered taking advantage of the recent move towards distributed service signatures? The RDF Agent is capable of consuming a list of URLs, recovering the RDF signatures from those URLs, and rebuilding the entire registry from those RDF documents. It is also a simple API call to MOBY Central that generates the list of URLs representing all of the service signatures. As such, a full mirroring operation should require nothing more than a single call to the primary MOBY Central, and passing the result of that call to the RDF agent of the mirror site and letting it run... Eddie, correct me if that isn't true...

I'm going to be at your institute this time next week, so let's talk about it more in person :-)

Best wishes!

Mark

On Wed, 2006-11-29 at 13:02 +0100, Andreas Groscurth wrote:

> The following text describes the procedure of the synchronization of
> Biomoby secondary repositories.
> [...]
> > > Thanks > Heiko & Andreas > > -- > Andreas Groscurth > Diplom Bioinformatik - PhD Student > Max Planck Institute for Plant Breeding Research > Carl-von-Linn?-Weg 10 > 50829 Cologne > Germany > E-mail: groscurt at mpiz-koeln.mpg.de > Phone: +49(0)221-5062-447 > > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev -- Mark Wilkinson Asst. Professor, Dept. of Medical Genetics University of British Columbia PI in Bioinformatics, iCAPTURE Centre St. Paul's Hospital, Rm. 166, 1081 Burrard St. Vancouver, BC, V6Z 1Y6 tel: 604 682 2344 x62129 fax: 604 806 9274 "Scientists would rather share their toothbrush than their data" - Carole Goble ================== ***CONFIDENTIALITY NOTICE*** This electronic message is intended only for the use of the addressee and may contain information that is privileged and confidential. Any dissemination, distribution or copying of this communication by unauthorized individuals is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and delete the original and all copies from your system. From ed.kawas at gmail.com Wed Nov 29 13:03:48 2006 From: ed.kawas at gmail.com (Ed Kawas) Date: Wed, 29 Nov 2006 10:03:48 -0800 Subject: [MOBY-dev] [moby] question for moby devs/architects regardinguse of DNS In-Reply-To: <1164822100.20382.13.camel@bioinfo.icapture.ubc.ca> Message-ID: <002f01c713e0$b9e393d0$6900a8c0@notebook> > We run a cron'd script from our server here that tests all > services in the registry every hour. I don't know for > certain if this is using LSID resolution as part of that task > (Eddie, can you confirm?), but it wouldn't surprise me if > that were the case. It does not. Pure api (findservice, etc). > > > > (2) How popular is LSID? Could services making use of the > 'lsid' SVR > > record be responsible for lots of DNS traffic? 
LIke 256+ sessions > > from the same IP? > > > We are increasingly using the LSID to represent *all* > entities in MOBY - datatypes, service types, web service > instances, etc. A tool like Taverna may well be resolving > all LSIDs in the MOBY registry each time it starts-up (?), > which could account for the traffic. Other client > applications will likely use LSID resolution in the same way > in the near future, if they don't already. Again, the IP > address would fairly quickly tell us whether these are > "scientists" or "scriptkiddies". > > Regardless, the use of LSIDs in MOBY is only going to > increase over time, so if it is becoming an issue now we > should think about how to manage it before it becomes a real > problem... > Mark, your gbrowse_moby application uses lsids a lot. However, those requests would be from a single ip address. Eddie From markw at illuminae.com Wed Nov 29 13:34:47 2006 From: markw at illuminae.com (Mark Wilkinson) Date: Wed, 29 Nov 2006 10:34:47 -0800 Subject: [MOBY-dev] [moby] question for moby devs/architects regardinguse of DNS In-Reply-To: <002f01c713e0$b9e393d0$6900a8c0@notebook> References: <002f01c713e0$b9e393d0$6900a8c0@notebook> Message-ID: On Wed, 29 Nov 2006 10:03:48 -0800, Ed Kawas wrote: > Mark, your gbrowse_moby application uses lsids a lot. However, those > requests > would be from a single ip address. Right... but I don't think it creates 256+ requests at a time, since it is a low-throughput interface... I'd be surprised if gbrowse moby was the culprit here. M From edward.kawas at gmail.com Wed Nov 29 10:22:14 2006 From: edward.kawas at gmail.com (Edward Kawas) Date: Wed, 29 Nov 2006 07:22:14 -0800 Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories In-Reply-To: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> Message-ID: <001f01c713ca$27188b70$6900a8c0@notebook> Hi, From reading *just* the 'aim' and 'problems' portion of this message, I was wondering whether you thought about using the agent for mirroring. Just throwing it out there, Eddie > -----Original Message----- > From: moby-dev-bounces at lists.open-bio.org > [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of > Andreas Groscurth > Sent: Wednesday, November 29, 2006 4:02 AM > To: moby-dev at lists.open-bio.org > Subject: [MOBY-dev] RFC - Synchronization of Biomoby > secondary repositories > > The following text describes the procedure of the > synchronization of Biomoby secondary repositories. > > Aim: Replicate BioMoby central > -to create mirrors > -to have redundancy in case of failure > -to create private sets of services, either filtered from the > global set (less services) or added to the global set (more services) > > Problems: > -synchronizing repositories > -cascading service/object registration requests -populating a > Moby central from scratch > > Solutions: > -The existing RSS feed is used to notify secondaries of > changes (register service/delete service/update service) to > the master -A complete RSS document is created by a new dump > method for initialization of Moby centrals from scratch > -Registrations are handled by the client and NOT cascaded > > 1. Synchronizing repositories > ============================= > > We propose that secondaries check the Biomoby RSS feed to be > notified whether changes in the registration have been done. > Currently the RSS feed is updated once a day, for more rapid > synchronization this would have to be changed.
> The changes include registration, modification or deletion of > a service/object. If changes were applied to the Biomoby > Central registry the changes are adopted to the secondary. > The RSS contains the signature URL where the secondary picks > up the service RDF to retrieve all details required for the > registration using the existing RDF agent. > > i) Problems/changes required: > > The main question here is if unregistered services are > deleted completely from the central database or are marked as > inactive. The problem about that is, that the feed would need > to contain also the information of a deleted service, so that > the secondaries will retrieve that information. So Moby > central will have to keep a full transaction log also of deletions. > > 2. Filtering > ============ > > We propose that any secondary can apply filters to the RSS > feed and thus only include a subset of all services/objects. > This can be useful to make finding services from lists > easier, to tune workflows to performant services, only use > local services or to exclude test services. Information > relevant to filtering is in the RSS, like authority, > description, but maybe more will be relevant, then filtering > may need to happen at the level of service RDF. > > 3. Private services > =================== > > We propose that any client can register services with a Moby > central secondary, these will then be available only to > clients querying the secondary. If the secondary is in a > local network, this allows easy access control to local > services. Any secondary synchronizing to that repository will > of course inherit all those additional services, allowing > simple creation of local production Moby centrals and local > test Moby centrals. > > 4. Registration > =============== > > We propose to NOT cascade registration requests, i.e. pass > them on from secondary to master. 
That means that the client > has control over where a registration is done but also means > the client has to make that choice. Registration clients must > thus add an implementation that allows a user to choose the > Moby central where a service/object should be registered. > Registration always happens at the topmost Moby central node > where the service should be visible, all secondaries of this > Moby central will pick that service up by synchronization. > > Why? Cascading registration is cumbersome, as only once a > registration request has reached the topmost node can name > duplications etc. be resolved, which must then be passed to > the client. > > Name conflicts can still occur with locally registered services. > E.g., Adam registers a private service AnalyseThis on a > private secondary. Later, Beth registers AnalyseThis with > same authority on the Moby central master. The private > secondary picks this up from the RSS and runs into a name > duplication. Proposed solution: Local registrations MUST > ALWAYS use a local authority. E.g., Adam registers > AnalyseThis with authority InternalIP, and Beth registers > AnalyseThis with authority paul_vitti.com. Then, we assume > whoever registers a service at a more global Moby central > knows what we're doing and give synchronization precedence > over local registrations. E.g., a test registry is a > secondary of Moby central. Chris registers AnalyseThat with > authority paul_vitti.com in the test registry. Once he's > happy with testing, he registers AnalyseThat with authority > paul_vitti.com in Moby central. The test registry retrieves > this from the RSS, discards the local registration and > overwrites it with the registration picked up through the RSS. > > 5. Moby central failure > ======================= > > If a master Moby central fails, the secondaries continue > normal operation with no effect on service discovery for all > clients keyed to a secondary. 
However, registration is no > longer possible at the master node. Once the master node > comes back up, all secondaries must resync. > > 6. Adaptations to the RSS > ========================= > > For this procedure the current RSS feed has to be changed > marginally, to enable on the one hand the correct > notification of the secondaries, on the other hand to ensure > that the normal RSS reader still work the usual way. The > current RSS feed mainly uses the Dublin Core Metadata to > provide the information, so to add additional information to > the feed it is only needed to add more Dublin Core Metadata. > > Primarily the feed has to contain the information whether the > service is new, modified or deleted. Additionally the service > rdf has to be linked in the feed to enable the local RDF > agent to apply the changes with the information of the > service rdf to the local secondary. > If other additional information shall be added to the feed to > provide more possibilities to filter the services can be discussed. > > 7. Resync > ========= > > Another main aspect is the problem if a repository is out of > sync (e.g. due to a temporary failure of master or > secondary). The RSS feed has a limited length, which means a > limited number of transactions are contained. Possibly, this > will mean it does not contain all transactions since the last > sync of a secondary. > > > i) Solution > We propose that each repository will store a time stamp of > the last synchronization. In case that in the next > synchronization process the oldest changes in the feed are > newer than the current sync time stamp of the repository, we > run the risk of not receiving all information about service > changes. In this case the secondary should be able to ask the > primary to create an RSS feed with all changes which have > happened since the current time stamp of the secondary. > > 8.
Initial load > =============== > > When populating a new secondary from scratch, all registered > services/ objects need to be received from the master Moby > central. We propose a new method in Moby central to request > all registered services/ objects as RSS. Then, the > initialization proceeds exactly like a synchronization. > > > > So to kick off the discussion here are some of our questions: > > 1.Is it reasonable to use the existing RSS feed for this procedure ? > It sounds very handy and avoids creating a similar complete > new structure > > 2.Does any structure keep track of deleted services ? > > 3.Resync: Is it reasonable to timestamp all transactions in > Moby central? Or should we solve the resync issue by > enforcing a full drop/ emptying of the secondary and reload > all data as in initial load? > > > Thanks > Heiko & Andreas > > -- > Andreas Groscurth > Diplom Bioinformatik - PhD Student > Max Planck Institute for Plant Breeding Research Carl-von-Linné-Weg 10 > 50829 Cologne > Germany > E-mail: groscurt at mpiz-koeln.mpg.de > Phone: +49(0)221-5062-447 > > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev From markw at illuminae.com Wed Nov 29 18:52:23 2006 From: markw at illuminae.com (Mark Wilkinson) Date: Wed, 29 Nov 2006 15:52:23 -0800 Subject: [MOBY-dev] Holy cow! Lotsa hits! Message-ID: 852,000 hits on MOBY Central in November. That's a new record :-) M -- -- Mark Wilkinson Assistant Professor, Dept. Medical Genetics University of British Columbia PI Bioinformatics iCAPTURE Centre, St.
Paul's Hospital From schoof at mpiz-koeln.mpg.de Thu Nov 30 04:39:46 2006 From: schoof at mpiz-koeln.mpg.de (Heiko Schoof) Date: Thu, 30 Nov 2006 10:39:46 +0100 Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories In-Reply-To: <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca> References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca> Message-ID: <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de> Hi Mark, Eddie, we already use the RDF agent, from the RSS we intend to pull mainly the signature URLs, then we propose to use the RDF agent to get all data. ---quote--- The RSS contains the signature URL where the secondary picks up the service RDF to retrieve all details required for the registration using the existing RDF agent. ---/quote--- The advantage of the RSS versus the API call to retrieve ALL signature URLs is: -scalability: If there are 1000s of signature URLs...with the RSS, we only retrieve changes -filtering: ability to filter already based on data in the RSS with no need to actually retrieve the service RDF; should improve filtering performance as it's one request instead of potentially hundreds plus the need to parse all those RDF. However, for the initialization/from scratch, this method indeed makes most sense, we'll modify the RFC accordingly. Is there a Biomoby WIKI where we can post that? Do you intend to come "work" at the MPIZ next week? If yes, when? I'm free Thursday afternoon and most of Friday. Best, Heiko On 29. Nov 2006, at 18:31 Uhr, Mark Wilkinson wrote: > Hi Andreas! > > Thanks for taking the time to put this document together. Using > the RSS > feed is an interesting idea. My first instinct is that it might > not be > "robust" enough, but I suppose if we spent more time thinking about > what > information is passed on that RSS feed it might work quite well! > > Have you considered taking advantage of the recent move towards > distributed service signatures? 
The RDF Agent is capable of > consuming a > list of URLs, recovering the RDF signatures from those URLs, and > rebuilding the entire registry from those RDF documents. It is also a > simple API call to MOBY Central that generates the list of URLs > representing all of the service signatures. As such, a full mirroring > operation should require nothing more than a single call to the > primary > MOBY Central, and passing the result of that call to the RDF agent of > the mirror site and letting it run... Eddie, correct me if that isn't > true... > > I'm going to be at your institute this time next week, so let's talk > about it more in person :-) > > Best wishes! > > Mark > > > > On Wed, 2006-11-29 at 13:02 +0100, Andreas Groscurth wrote: >> The following text describes the procedure of the synchronization of >> Biomoby secondary repositories. >> >> Aim: Replicate BioMoby central >> -to create mirrors >> -to have redundancy in case of failure >> -to create private sets of services, either filtered from the global >> set (less services) or added to the global set (more services) >> >> Problems: >> -synchronizing repositories >> -cascading service/object registration requests >> -populating a Moby central from scratch >> >> Solutions: >> -The existing RSS feed is used to notify secondaries of changes >> (register service/delete service/update service) to the master >> -A complete RSS document is created by a new dump method for >> initialization of Moby centrals from scratch >> -Registrations are handled by the client and NOT cascaded >> >> 1. Synchronizing repositories >> ============================= >> >> We propose that secondaries check the Biomoby RSS feed to be >> notified whether changes in the registration have been done. >> Currently the RSS feed is updated once a day, for more rapid >> synchronization this would have to be changed. >> The changes include registration, modification or deletion of a >> service/object. 
If changes were applied to the Biomoby Central >> registry the changes are adopted to the secondary. >> The RSS contains the signature URL where the secondary picks up >> the service RDF to retrieve all details required for the >> registration using the existing RDF agent. >> >> i) Problems/changes required: >> >> The main question here is if unregistered services are deleted >> completely from the central database or are marked as inactive. The >> problem about that is, that the feed would need to contain also the >> information of a deleted service, so that the secondaries will >> retrieve that information. So Moby central will have to keep a full >> transaction log also of deletions. >> >> 2. Filtering >> ============ >> >> We propose that any secondary can apply filters to the RSS feed and >> thus only include a subset of all services/objects. This can be >> useful to make finding services from lists easier, to tune workflows >> to performant services, only use local services or to exclude test >> services. Information relevant to filtering is in the RSS, like >> authority, description, but maybe more will be relevant, then >> filtering may need to happen at the level of service RDF. >> >> 3. Private services >> =================== >> >> We propose that any client can register services with a Moby central >> secondary, these will then be available only to clients querying the >> secondary. If the secondary is in a local network, this allows easy >> access control to local services. Any secondary synchronizing to that >> repository will of course inherit all those additional services, >> allowing simple creation of local production Moby centrals and local >> test Moby centrals. >> >> 4. Registration >> =============== >> >> We propose to NOT cascade registration requests, i.e. pass them on >> from secondary to master. That means that the client has control over >> where a registration is done but also means the client has to make >> that choice. 
Registration clients must thus add an implementation >> that allows a user to choose the Moby central where a service/object >> should be registered. Registration always happens at the topmost Moby >> central node where the service should be visible, all secondaries of >> this Moby central will pick that service up by synchronization. >> >> Why? Cascading registration is cumbersome, as only once a >> registration request has reached the topmost node can name >> duplications etc. be resolved, which must then be passed to the >> client. >> >> Name conflicts can still occur with locally registered services. >> E.g., Adam registers a private service AnalyseThis on a private >> secondary. Later, Beth registers AnalyseThis with same authority on >> the Moby central master. The private secondary picks this up from the >> RSS and runs into a name duplication. Proposed solution: Local >> registrations MUST ALWAYS use a local authority. E.g., Adam registers >> AnalyseThis with authority InternalIP, and Beth registers AnalyseThis >> with authority paul_vitti.com. Then, we assume whoever registers a >> service at a more global Moby central knows what we're doing and give >> synchronization precedence over local registrations. E.g., a test >> registry is a secondary of Moby central. Chris registers AnalyseThat >> with authority paul_vitti.com in the test registry. Once he's happy >> with testing, he registers AnalyseThat with authority paul_vitti.com >> in Moby central. The test registry retrieves this from the RSS, >> discards the local registration and overwrites it with the >> registration picked up through the RSS. >> >> 5. Moby central failure >> ======================= >> >> If a master Moby central fails, the secondaries continue normal >> operation with no effect on service discovery for all clients keyed >> to a secondary. However, registration is no longer possible at the >> master node. Once the master node comes back up, all secondaries must >> resync. >> >> 6. 
Adaptations to the RSS >> ========================= >> >> For this procedure the current RSS feed has to be changed >> marginally, to >> enable on the one hand the correct notification of the secondaries, >> on the other hand to ensure that the normal RSS reader still work the >> usual way. The current RSS feed mainly uses the Dublin Core Metadata >> to provide the information, so to add additional information to the >> feed it is only needed to add more Dublin Core Metadata. >> >> Primarily the feed has to contain the information whether the service >> is new, modified or deleted. Additionally the service rdf has to be >> linked in the feed to enable the local RDF agent to apply the changes >> with the information of the service rdf to the local secondary. >> If other additional information shall be added to the feed to provide >> more possibilities to filter the services can be discussed. >> >> 7. Resync >> ========= >> >> Another main aspect is the problem if a repository is out of sync >> (e.g. due to a temporary failure of master or secondary). The RSS >> feed has a limited length, which means a limited number of >> transactions are contained. Possibly, this will mean it does not >> contain all transactions since the last sync of a secondary. >> >> >> i) Solution >> We propose that each repository will store a time stamp of >> the last synchronization. In case that >> in the next synchronization process the oldest changes in the feed >> are newer than the current sync time stamp of the repository, >> we run the risk of not receiving all information >> about service changes. In this case the secondary should be able to >> ask the primary to create an RSS feed with all changes which have >> happened since the current time stamp of the secondary. >> >> 8. Initial load >> =============== >> >> When populating a new secondary from scratch, all registered >> services/ >> objects need to be received from the master Moby central.
We propose >> a new method in Moby central to request all registered services/ >> objects as RSS. Then, the initialization proceeds exactly like a >> synchronization. >> >> >> >> So to kick off the discussion here are some of our questions: >> >> 1.Is it reasonable to use the existing RSS feed for this procedure ? >> It sounds very handy and avoids creating a similar complete new >> structure >> >> 2.Does any structure keep track of deleted services ? >> >> 3.Resync: Is it reasonable to timestamp all transactions in Moby >> central? Or should we solve the resync issue by enforcing a full >> drop/ >> emptying of the secondary and reload all data as in initial load? >> >> >> Thanks >> Heiko & Andreas >> >> -- >> Andreas Groscurth >> Diplom Bioinformatik - PhD Student >> Max Planck Institute for Plant Breeding Research >> Carl-von-Linn?-Weg 10 >> 50829 Cologne >> Germany >> E-mail: groscurt at mpiz-koeln.mpg.de >> Phone: +49(0)221-5062-447 >> >> _______________________________________________ >> MOBY-dev mailing list >> MOBY-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/moby-dev > -- > Mark Wilkinson > Asst. Professor, Dept. of Medical Genetics > University of British Columbia > PI in Bioinformatics, iCAPTURE Centre > St. Paul's Hospital, Rm. 166, 1081 Burrard St. > Vancouver, BC, V6Z 1Y6 > tel: 604 682 2344 x62129 > fax: 604 806 9274 > > "Scientists would rather share their toothbrush than their data" > - Carole Goble > > ================== > > > ***CONFIDENTIALITY NOTICE*** > This electronic message is intended only for the use of the addressee > and may contain information that is privileged and confidential. Any > dissemination, distribution or copying of this communication by > unauthorized individuals is strictly prohibited. If you have received > this communication in error, please notify the sender immediately by > reply e-mail and delete the original and all copies from your system. 
> > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev From dgonzalez at cnio.es Thu Nov 30 09:43:20 2006 From: dgonzalez at cnio.es (David G. Pisano) Date: Thu, 30 Nov 2006 15:43:20 +0100 Subject: [MOBY-dev] Holy cow! Lotsa hits! In-Reply-To: References: Message-ID: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es> Mark, Do you have historic records? It would be nice for presentations ;) (and grats to everybody, by the way) David On 30 Nov, 2006, at 12:52 AM, Mark Wilkinson wrote: > 852,000 hits on MOBY Central in November. That's a new record :-) > > M > > > -- > -- > Mark Wilkinson > Assistant Professor, Dept. Medical Genetics > University of British Columbia > PI Bioinformatics > iCAPTURE Centre, St. Paul's Hospital > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev **CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.
From markw at illuminae.com Thu Nov 30 11:40:28 2006 From: markw at illuminae.com (Mark Wilkinson) Date: Thu, 30 Nov 2006 08:40:28 -0800 Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories In-Reply-To: <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de> References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca> <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de> Message-ID: > we already use the RDF agent, from the RSS we intend to pull mainly the > signature URLs, then we propose to use the RDF agent to get all data. Ah! I see. > The advantage of the RSS versus the API call to retrieve ALL signature > URLs is: > -scalability: If there are 1000s of signature URLs...with the RSS, we > only retrieve changes I would need to modify the RSS feed such that it reports additions *and* removals (right now it is just additions), and we would have to come up with a formal way of representing these... and then the RSS functionality and features would need to become a formal part of the MOBY API (I can just hear Dr. Senger yelling at us right now that we're considering building core functionality on parts of MOBY that are entirely undocumented ;-) ;-) ). I guess that's why I am so hesitant to use RSS. However, I guess so long as this is not a "recommended" practice, only a short-cut; and as long as it is *always* possible to use a true API call to mirror the registry, and we formally say what the recommended best-practice is, then it's reasonable to have a non-guaranteed alternative that is more lightweight. > However, for the initialization/from scratch, this method indeed makes > most sense, we'll modify the RFC accordingly. Is there a Biomoby WIKI > where we can post that? There currently is no Wiki running on BioMoby. The last Wiki we had was hacked, and we just never brought it back up again. 
We tried using Bugzilla as a way of tracking RFC's, but that didn't make many people very happy, so we're simply using the mailing list, with some formal write-up in an attachment. > Do you intend to come "work" at the MPIZ next week? Do I have to answer that publicly? ;-) The answer is "Yes". We should get a variety of things - MOBY-wise and otherwise - sorted out between us while I am there. I'm free all day Thursday and Friday, so that should give us plenty of time. Cheers! M From schoof at mpiz-koeln.mpg.de Thu Nov 30 13:01:48 2006 From: schoof at mpiz-koeln.mpg.de (Heiko Schoof) Date: Thu, 30 Nov 2006 19:01:48 +0100 Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories In-Reply-To: References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca> <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de> Message-ID: <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de> On 30. Nov 2006, at 17:40 Uhr, Mark Wilkinson wrote: > and then the RSS functionality and features would need to become a > formal part of the MOBY API (I can just hear Dr. Senger yelling at > us right now that we're considering building core functionality on > parts of MOBY that are entirely undocumented ;-) ;-) ). I guess > that's why I am so hesitant to use RSS. > > However, I guess so long as this is not a "recommended" practice, > only a short-cut; and as long as it is *always* possible to use a > true API call to mirror the registry, and we formally say what the > recommended best-practice is, then it's reasonable to have a non- > guaranteed alternative that is more lightweight. > What we are proposing is to make new RSS functionality that will be part of the core API. It just so happens that RSS and the surrounding toolkit is well suited to the purpose, and more fitting to Moby than other solutions we've looked at. 
Why should we make a new API call that spews out some custom XML if we can perfectly use RSS within its specs and get a core RSS feed for "human"/aggregator consumption at the same time for free? We stated that we will need to modify the RSS, though not breaking anything as far as we can see. We were not proposing to use RSS just because there's existing functionality ;-) we're not quite THAT lazy...though almost. And... isn't it *cool* to use RSS for some real work? What is a true API call? Why is a call to the RSS feed not a true API call, if we make it part of the API? RSS is a tested, scaleable technology, which is why we propose to use it, as we envision hundreds of Moby clients maintaining their local cache (like Dashboard) through that functionality. Which is one thing we haven't mentioned yet, caching of Moby central for clients could easily build on the Moby secondary functionality, we think. But maybe Martin or others with experience on Moby central caching should comment on that. Heiko From markw at illuminae.com Thu Nov 30 14:10:08 2006 From: markw at illuminae.com (Mark Wilkinson) Date: Thu, 30 Nov 2006 11:10:08 -0800 Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories In-Reply-To: <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de> References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca> <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de> <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de> Message-ID: Hi Heiko, > Why should we make a new API call that spews out some custom XML if we > can perfectly use RSS within its specs and get a core RSS feed for > "human"/aggregator consumption at the same time for free? We stated that > we will need to modify the RSS, though not breaking anything as far as > we can see. 
I think we are effectively saying the same thing; I am not suggesting that we make a new API call, I'm suggesting that there are API calls that exist already that could be used for this purpose, albeit not so conveniently as your RSS suggestion.

> We were not proposing to use RSS just because there's existing
> functionality ;-) we're not quite THAT lazy...though almost. And...
> isn't it *cool* to use RSS for some real work?

Well... I guess this is the issue. You're proposing to use RSS for a purpose for which it was not (IMO) designed. As such, we would have to create new conventions around the RSS feed (hereafter called MOBY-RSS) that may or may not be more widely accepted in the world. I agree 100% that it would be VERY cool to use RSS in this way, but as a robust solution to the problem, I'm not entirely convinced.

The amount of RSS-RDF we would have to maintain on MOBY Central in order to have a complete history that would allow a mirror to reliably re-construct the current state of the database is... well... large! At the moment, I keep only the last... 100?... changes. If you don't pick up the feed for a day, or if someone registers 1000 new services, you won't see them in the feed. To be safe, we would have to keep *all* changes in the RDF document at MOBY Central, in which case the overhead of calling the feed versus using the MOBY Central API would be about the same.

I'm not *opposed* to the idea of using RSS, and I agree that it is a novel and "cool" use for it, but I am concerned that we will perpetuate the MOBY reputation of making ad hoc decisions around other standards... (which isn't necessarily BAD, it just gives us a reputation for being mavericks, which angers the reviewers :-) )

Let's talk about it over a Koelsch (or two) next week!

M

--
Mark Wilkinson
Assistant Professor, Dept. Medical Genetics
University of British Columbia
PI Bioinformatics
iCAPTURE Centre, St.
Paul's Hospital

***CONFIDENTIALITY NOTICE*** This electronic message is intended only for the use of the addressee and may contain information that is privileged and confidential. Any dissemination, distribution or copying of this communication by unauthorized individuals is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and delete the original and all copies from your system.

From martin.senger at gmail.com Thu Nov 30 10:08:37 2006
From: martin.senger at gmail.com (Martin Senger)
Date: Thu, 30 Nov 2006 15:08:37 +0000
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
References: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
Message-ID: <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>

Well, let me be more realistic. I do not like so-called "Potemkin's" villages (see google if it does not ring any bell). Of course, I am glad that BioMoby is growing, and I will support it in any possible way. And I also understand that some facts are good for PR and for funding agencies.

But... For us, we should be more precise about what these hits actually mean. For example, I assume (please correct me if I am wrong) that every time somebody updates her local cache from Dashboard, it increases the hit numbers. Also, all these automated tools may influence how many times the registry is accessed. Which gives us a distorted picture.

Better, possibly, would be to agree to an HTTP agent name (or names) that we can use in these automatic tools - and to separate in the statistics *all* hits (good for funding agencies) from the other hits where we do not include requests from this (or these) HTTP agent(s).
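[Editor's sketch] The separation of "all" hits from tool-generated hits described above could look roughly like the following. This is only an illustration under stated assumptions: the registry's access log is assumed to be in Apache "combined" format (User-Agent as the last quoted field), and the agent names `MobyDashboard` and `MobyServiceAgent` are hypothetical placeholders, since no names had been agreed on the list.

```python
import re
from collections import Counter

# Hypothetical User-Agent prefixes that automated tools would agree to send.
# These names are assumptions for illustration; none were agreed in the thread.
AUTOMATED_AGENT_PREFIXES = ("MobyDashboard", "MobyServiceAgent")

# In Apache "combined" log format the User-Agent is the last quoted field.
USER_AGENT_AT_END = re.compile(r'"([^"]*)"\s*$')


def classify_hit(log_line):
    """Return "automated" for an agreed tool agent, otherwise "other"."""
    match = USER_AGENT_AT_END.search(log_line)
    agent = match.group(1) if match else ""
    if agent.startswith(AUTOMATED_AGENT_PREFIXES):
        return "automated"
    return "other"


def count_hits(log_lines):
    """Count all hits, and split them into automated and other hits."""
    counts = Counter(classify_hit(line) for line in log_lines)
    return {
        "all": sum(counts.values()),
        "automated": counts["automated"],
        "other": counts["other"],
    }
```

Both totals stay available: "all" for reporting, and "other" as the closer estimate of interactive use.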
Just my "c's,
Martin

--
Martin Senger
email: martin.senger at gmail.com
skype: martinsenger

From jmfernandez at cnio.es Thu Nov 30 16:06:53 2006
From: jmfernandez at cnio.es (José María Fernández González)
Date: Thu, 30 Nov 2006 22:06:53 +0100
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
References: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es> <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
Message-ID: <456F47ED.1060800@cnio.es>

You should have some way to distinguish automated tools' entries from the others (with some help from the tool developers, of course). As every request is done through HTTP, you could use the User Agent signature recorded in Apache logs. For instance, each tool could use a different 'User Agent' variant, so they could be distinguished, or if a program/tool can issue requests related to maintenance, it would be advisable to alter its 'User Agent' signature in some way based on its mode.

Just my 2 euro-cents.

José María

Martin Senger wrote:
> Well, let me be more realistic. I do not like so-called "Potemkin's"
> villages (see google if it does not ring any bell). Of course, I am glad
> that BioMoby is growing, and I will support it in any possible way. And I
> also understand that some facts are good for PR and for funding agencies.
>
> But... For us, we should be more precise about what these hits actually mean. For
> example, I assume (please correct me if I am wrong) that every time somebody
> updates her local cache from Dashboard, it increases the hit numbers. Also,
> all these automated tools may influence how many times the registry is
> accessed. Which gives us a distorted picture.
>
> Better, possibly, would be to agree to an HTTP agent name (or names) that we
> can use in these automatic tools - and to separate in the statistics *all*
> hits (good for funding agencies) from the other hits where we do not include
> requests from this (or these) HTTP agent(s).
>
> Just my "c's,
> Martin
>

--
José María Fernández González
e-mail: jmfernandez at cnio.es
Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 2256)
Fax: (+34) 91 224 69 76
Biología Estructural y Bioinformática / Structural Biology and Bioinformatics
Centro Nacional de Investigaciones Oncológicas
C/. Melchor Fernández Almagro, 3
C.P.: 28029 / Zip Code: 28029
Madrid (Spain)

**NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.

**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.

From martin.senger at gmail.com Thu Nov 30 16:48:08 2006
From: martin.senger at gmail.com (Martin Senger)
Date: Thu, 30 Nov 2006 21:48:08 +0000
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <456F47ED.1060800@cnio.es>
References: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es> <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com> <456F47ED.1060800@cnio.es>
Message-ID: <4d93f07c0611301348v1289a6a3u13139a49a6d55678@mail.gmail.com>

> you could use User Agent signature

Well, that's what I said :-)

Martin

--
Martin Senger
email: martin.senger at gmail.com
skype: martinsenger

From Pieter.Neerincx at wur.nl Fri Nov 10 10:59:57 2006
From: Pieter.Neerincx at wur.nl (Pieter Neerincx)
Date: Fri, 10 Nov 2006 11:59:57 +0100
Subject: [MOBY-dev] BioMOBY ping
In-Reply-To: <4551F002.2040909@ucalgary.ca>
References: <003601c70341$5b74fe60$6d00a8c0@notebook> <4551F002.2040909@ucalgary.ca>
Message-ID:

Hi Eddie and Paul,

On 8-Nov-2006, at 3:56 PM, Paul Gordon wrote:

> It;'s not an immediate solution, but I would suggest to that people
> providing public SSL services get a real certificate if they can
> rustle
> up the money (only about $100/year). I know most MOBY clients won't
> connect to unauthenticated services either, so making it really signed
> by an authority will make it so much more useful...
Ok, I know that a certificate signed by one of the "big" certificate authorities would make life a little easier, but our self-signed certificates are just as real and valid :). The problem is the distribution of the certificates. I would have to drop by at your office in person with my passport to prove I am who I claim to be and the certificate on for example a USB-stick. If I would send you the certificate in a plain e-mail, you can not verify whether it's really my certificate or a fake one. Anyway, that distribution problem can also be solved without $100. I'll add some documentation to the site for people who want to use HTTPS for their services and/or BioMOBY Central... Cheers, Pi >> Hi Pieter, >> >> >>> I'm having a problem with the BioMOBY ping thing. As far as I >>> know the services I had registered in the central BioMOBY >>> Central respond correctly to a BioMOBY ping request. They are >>> listed as dead though on the BioMOBY website and I'm >>> wondering why. The current suspects are: >>> >>> * Base64 encoded output. Does the agent decode base64 content >>> correctly? >>> * HTTPS. My services require an https connection. If the >>> agent is using Perl code it will probably complain about not >>> being able to validate the certificate, but execute anyway. >>> If the agent was written in Java it will refuse to execute >>> the service if the SSL certificates can not be validated. Our >>> certificates are self-signed, so you'd have to add them to >>> your keystore to be able to execute our services with a Java client. >>> >>> >> I bet that its listed as dead because of authentication. What can >> I do to get >> around this? 
>>
>> Thanks,
>>
>> Eddie
>>
>>
>>
>> _______________________________________________
>> MOBY-dev mailing list
>> MOBY-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/moby-dev
>>
>>
>>
>
> _______________________________________________
> MOBY-dev mailing list
> MOBY-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/moby-dev

Wageningen University and Research centre (WUR)
Laboratory of Bioinformatics
Transitorium (building 312) room 1034
Dreijenlaan 3
6703 HA Wageningen
The Netherlands
phone: 0317-483 060
fax: 0317-483 584
mobile: 06-143 66 783
pieter.neerincx at wur.nl

From groscurt at mpiz-koeln.mpg.de Wed Nov 29 12:02:05 2006
From: groscurt at mpiz-koeln.mpg.de (Andreas Groscurth)
Date: Wed, 29 Nov 2006 13:02:05 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
Message-ID: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>

The following text describes the procedure of the synchronization of Biomoby secondary repositories.

Aim: Replicate BioMoby central
-to create mirrors
-to have redundancy in case of failure
-to create private sets of services, either filtered from the global set (fewer services) or added to the global set (more services)

Problems:
-synchronizing repositories
-cascading service/object registration requests
-populating a Moby central from scratch

Solutions:
-The existing RSS feed is used to notify secondaries of changes (register service/delete service/update service) to the master
-A complete RSS document is created by a new dump method for initialization of Moby centrals from scratch
-Registrations are handled by the client and NOT cascaded

1. Synchronizing repositories
=============================

We propose that secondaries check the Biomoby RSS feed to be notified whether changes in the registration have been made. Currently the RSS feed is updated once a day; for more rapid synchronization this would have to be changed.
The changes include registration, modification or deletion of a service/object. If changes were applied to the Biomoby Central registry, they are applied to the secondary as well.
The RSS contains the signature URL where the secondary picks up the service RDF to retrieve all details required for the registration using the existing RDF agent.

i) Problems/changes required:

The main question here is whether unregistered services are deleted completely from the central database or are marked as inactive. The problem is that the feed would also need to contain the information about a deleted service, so that the secondaries will retrieve that information. So Moby central will have to keep a full transaction log that includes deletions.

2. Filtering
============

We propose that any secondary can apply filters to the RSS feed and thus only include a subset of all services/objects. This can be useful to make finding services from lists easier, to tune workflows to performant services, to use only local services or to exclude test services. Information relevant to filtering is in the RSS, like authority and description, but maybe more will be relevant; then filtering may need to happen at the level of the service RDF.

3. Private services
===================

We propose that any client can register services with a Moby central secondary; these will then be available only to clients querying the secondary. If the secondary is in a local network, this allows easy access control to local services. Any secondary synchronizing to that repository will of course inherit all those additional services, allowing simple creation of local production Moby centrals and local test Moby centrals.

4. Registration
===============

We propose to NOT cascade registration requests, i.e. pass them on from secondary to master. That means that the client has control over where a registration is done but also means the client has to make that choice.
Registration clients must thus add an implementation that allows a user to choose the Moby central where a service/object should be registered. Registration always happens at the topmost Moby central node where the service should be visible; all secondaries of this Moby central will pick that service up by synchronization.

Why? Cascading registration is cumbersome, as only once a registration request has reached the topmost node can name duplications etc. be resolved, which must then be passed to the client.

Name conflicts can still occur with locally registered services. E.g., Adam registers a private service AnalyseThis on a private secondary. Later, Beth registers AnalyseThis with same authority on the Moby central master. The private secondary picks this up from the RSS and runs into a name duplication.

Proposed solution: Local registrations MUST ALWAYS use a local authority. E.g., Adam registers AnalyseThis with authority InternalIP, and Beth registers AnalyseThis with authority paul_vitti.com.

Then, we assume whoever registers a service at a more global Moby central knows what they're doing and give synchronization precedence over local registrations. E.g., a test registry is a secondary of Moby central. Chris registers AnalyseThat with authority paul_vitti.com in the test registry. Once he's happy with testing, he registers AnalyseThat with authority paul_vitti.com in Moby central. The test registry retrieves this from the RSS, discards the local registration and overwrites it with the registration picked up through the RSS.

5. Moby central failure
=======================

If a master Moby central fails, the secondaries continue normal operation with no effect on service discovery for all clients keyed to a secondary. However, registration is no longer possible at the master node. Once the master node comes back up, all secondaries must resync.

6. Adaptations to the RSS
=========================

For this procedure the current RSS feed has to be changed marginally, on the one hand to enable the correct notification of the secondaries, on the other hand to ensure that normal RSS readers still work the usual way.
The current RSS feed mainly uses the Dublin Core Metadata to provide the information, so to add additional information to the feed it is only necessary to add more Dublin Core Metadata. Primarily the feed has to contain the information whether the service is new, modified or deleted. Additionally the service RDF has to be linked in the feed to enable the local RDF agent to apply the changes to the local secondary using the information in the service RDF.
Whether other additional information should be added to the feed to provide more possibilities to filter the services can be discussed.

7. Resync
=========

Another main aspect is the problem of a repository being out of sync (e.g. due to a temporary failure of master or secondary). The RSS feed has a limited length, which means a limited number of transactions are contained. Possibly, this will mean it does not contain all transactions since the last sync of a secondary.

i) Solution

We propose that each repository will store a time stamp of the last synchronization. If, in the next synchronization process, the oldest changes in the feed are newer than the current sync time stamp of the repository, we run the risk of not receiving all information about service changes, because earlier changes may already have dropped off the feed. In this case the secondary should be able to ask the primary to create an RSS feed with all changes which have happened since the current time stamp of the secondary.

8. Initial load
===============

When populating a new secondary from scratch, all registered services/objects need to be received from the master Moby central. We propose a new method in Moby central to request all registered services/objects as RSS. Then, the initialization proceeds exactly like a synchronization.
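[Editor's sketch] The resync check from section 7 can be illustrated as below. This is a minimal sketch, not part of the RFC: the `FeedEntry` type, its field names and the "new"/"modified"/"deleted" vocabulary are hypothetical stand-ins for whatever the extended feed would carry. It assumes the risky case is the one where the oldest entry still in the feed is newer than the secondary's last-sync time stamp, so that intermediate changes may have dropped off the feed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class FeedEntry:
    """One change reported by the feed (hypothetical structure)."""
    signature_url: str   # where the service RDF can be fetched
    change: str          # assumed vocabulary: "new", "modified" or "deleted"
    timestamp: datetime  # when the change happened at the master


def needs_full_resync(last_sync: datetime, entries: List[FeedEntry]) -> bool:
    """True if the feed may no longer cover everything since last_sync."""
    if not entries:
        # Assumption: an empty feed means there is nothing to apply.
        return False
    oldest = min(entry.timestamp for entry in entries)
    # If even the oldest entry is newer than our last sync, older changes
    # may have been pushed out of the length-limited feed.
    return oldest > last_sync


def pending_changes(last_sync: datetime, entries: List[FeedEntry]) -> List[FeedEntry]:
    """Entries newer than last_sync, oldest first, ready to be applied."""
    newer = [entry for entry in entries if entry.timestamp > last_sync]
    return sorted(newer, key=lambda entry: entry.timestamp)
```

When `needs_full_resync` returns True, the secondary would fall back to asking the master for all changes since its time stamp, or to a full initial load.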
So to kick off the discussion here are some of our questions:

1. Is it reasonable to use the existing RSS feed for this procedure? It sounds very handy and avoids creating a similar, completely new structure.
2. Does any structure keep track of deleted services?
3. Resync: Is it reasonable to timestamp all transactions in Moby central? Or should we solve the resync issue by enforcing a full drop/emptying of the secondary and reloading all data as in initial load?

Thanks
Heiko & Andreas

--
Andreas Groscurth
Diplom Bioinformatik - PhD Student
Max Planck Institute for Plant Breeding Research
Carl-von-Linné-Weg 10
50829 Cologne
Germany
E-mail: groscurt at mpiz-koeln.mpg.de
Phone: +49(0)221-5062-447

From dag at sonsorol.org Wed Nov 29 14:20:46 2006
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed, 29 Nov 2006 09:20:46 -0500
Subject: [MOBY-dev] question for moby devs/architects regarding use of DNS
Message-ID:

Hi folks,

I just installed a new firewall (or in fancy terms 'unified threat management appliance') upstream of the main open-bio.org servers.

One of the more interesting reports so far is that a number of IP addresses have been opening up very large numbers of TCP connections to the main open-bio.org web/DNS/mailserver. We are talking about 256+ simultaneous TCP sessions heading our way from the same remote IP address.

Some of this is just web spidering and FTP mirroring but quite a bit of the traffic (oddly enough) is DNS related. We have an open DNS server and it is quite likely that people have found this out and are using us for recursive DNS queries. It is actually pretty easy to constrain/lock this down but that DNS server is also the primary nameserver for biomoby.org and the very special LSID SRV identifier used for LSID discovery operations.
I guess I have the following questions/requests for the moby expert community: (1) In the way that moby is architected is it expected that either clients or servers would generate lots of DNS traffic for biomoby.org? If what I am seeing is 'normal' then I just want to leave things alone. (2) How popular is LSID? Could services making use of the 'lsid' SVR record be responsible for lots of DNS traffic? LIke 256+ sessions from the same IP? (3) I am going to reconfigure the DNS server so that we don't recursively answer DNS requests for other domains (like 'cnn.com' etc.) while still allowing anyone in the world to query the biomoby.org DNS zone. Can the moby developers/leaders elect a point person that I can remain in contact with while we do this work? I want to make sure that we don't affect/break moby services while this work is done. Thanks! -Chris OBF From markw at illuminae.com Wed Nov 29 17:41:40 2006 From: markw at illuminae.com (Mark Wilkinson) Date: Wed, 29 Nov 2006 09:41:40 -0800 Subject: [MOBY-dev] [moby] question for moby devs/architects regarding use of DNS In-Reply-To: References: Message-ID: <1164822100.20382.13.camel@bioinfo.icapture.ubc.ca> On Wed, 2006-11-29 at 09:20 -0500, Chris Dagdigian wrote: > to the main open-bio.org web/DNS/mailserver. We are talking about 256 > + simultaneous TCP sessions heading our way from the same remote IP > address. I guess the first question is "which IP address?" :-) > (1) In the way that moby is architected is it expected that either > clients or servers would generate lots of DNS traffic for > biomoby.org? If what I am seeing is 'normal' then I just want to > leave things alone. We run a cron'd script from our server here that tests all services in the registry every hour. I don't know for certain if this is using LSID resolution as part of that task (Eddie, can you confirm?), but it wouldn't surprise me if that were the case. > (2) How popular is LSID? 
Could services making use of the 'lsid' SVR > record be responsible for lots of DNS traffic? LIke 256+ sessions > from the same IP? We are increasingly using the LSID to represent *all* entities in MOBY - datatypes, service types, web service instances, etc. A tool like Taverna may well be resolving all LSIDs in the MOBY registry each time it starts-up (?), which could account for the traffic. Other client applications will likely use LSID resolution in the same way in the near future, if they don't already. Again, the IP address would fairly quickly tell us whether these are "scientists" or "scriptkiddies". Regardless, the use of LSIDs in MOBY is only going to increase over time, so if it is becoming an issue now we should think about how to manage it before it becomes a real problem... > (3) I am going to reconfigure the DNS server so that we don't > recursively answer DNS requests for other domains (like 'cnn.com' > etc.) while still allowing anyone in the world to query the > biomoby.org DNS zone. Can the moby developers/leaders elect a point > person that I can remain in contact with while we do this work? Eddie Kawas: ed.kawas at gmail.com I'm in the lab until tomorrow, and then away for about 10 days in Germany, so he's the one who will answer your questions most rapidly. > I > want to make sure that we don't affect/break moby services while this > work is done. :-) thanks Chris! Best wishes, M -- Mark Wilkinson Asst. Professor, Dept. of Medical Genetics University of British Columbia PI in Bioinformatics, iCAPTURE Centre St. Paul's Hospital, Rm. 166, 1081 Burrard St. Vancouver, BC, V6Z 1Y6 tel: 604 682 2344 x62129 fax: 604 806 9274 "Scientists would rather share their toothbrush than their data" - Carole Goble ================== ***CONFIDENTIALITY NOTICE*** This electronic message is intended only for the use of the addressee and may contain information that is privileged and confidential. 
Any dissemination, distribution or copying of this communication by unauthorized individuals is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and delete the original and all copies from your system.
From markw at illuminae.com Wed Nov 29 17:31:11 2006 From: markw at illuminae.com (Mark Wilkinson) Date: Wed, 29 Nov 2006 09:31:11 -0800 Subject: [MOBY-dev] [moby] RFC - Synchronization of Biomoby secondary repositories In-Reply-To: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> Message-ID: <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca> Hi Andreas! Thanks for taking the time to put this document together. Using the RSS feed is an interesting idea. My first instinct is that it might not be "robust" enough, but I suppose if we spent more time thinking about what information is passed on that RSS feed it might work quite well! Have you considered taking advantage of the recent move towards distributed service signatures? The RDF Agent is capable of consuming a list of URLs, recovering the RDF signatures from those URLs, and rebuilding the entire registry from those RDF documents. It is also a simple API call to MOBY Central that generates the list of URLs representing all of the service signatures. As such, a full mirroring operation should require nothing more than a single call to the primary MOBY Central, and passing the result of that call to the RDF agent of the mirror site and letting it run... Eddie, correct me if that isn't true... I'm going to be at your institute this time next week, so let's talk about it more in person :-) Best wishes! Mark On Wed, 2006-11-29 at 13:02 +0100, Andreas Groscurth wrote: > The following text describes the procedure of the synchronization of > Biomoby secondary repositories. 
> > Aim: Replicate BioMoby central > -to create mirrors > -to have redundancy in case of failure > -to create private sets of services, either filtered from the global > set (less services) or added to the global set (more services) > > Problems: > -synchronizing repositories > -cascading service/object registration requests > -populating a Moby central from scratch > > Solutions: > -The existing RSS feed is used to notify secondaries of changes > (register service/delete service/update service) to the master > -A complete RSS document is created by a new dump method for > initialization of Moby centrals from scratch > -Registrations are handled by the client and NOT cascaded > > 1. Synchronizing repositories > ============================= > > We propose that secondaries check the Biomoby RSS feed to be > notified whether changes in the registration have been done. > Currently the RSS feed is updated once a day, for more rapid > synchronization this would have to be changed. > The changes include registration, modification or deletion of a > service/object. If changes were applied to the Biomoby Central > registry the changes are adopted to the secondary. > The RSS contains the signature URL where the secondary picks up > the service RDF to retrieve all details required for the > registration using the existing RDF agent. > > i) Problems/changes required: > > The main question here is if unregistered services are deleted > completely from the central database or are marked as inactive. The > problem about that is, that the feed would need to contain also the > information of a deleted service, so that the secondaries will > retrieve that information. So Moby central will have to keep a full > transaction log also of deletions. > > 2. Filtering > ============ > > We propose that any secondary can apply filters to the RSS feed and > thus only include a subset of all services/objects. 
This can be > useful to make finding services from lists easier, to tune workflows > to performant services, only use local services or to exclude test > services. Information relevant to filtering is in the RSS, like > authority, description, but maybe more will be relevant, then > filtering may need to happen at the level of service RDF. > > 3. Private services > =================== > > We propose that any client can register services with a Moby central > secondary, these will then be available only to clients querying the > secondary. If the secondary is in a local network, this allows easy > access control to local services. Any secondary synchronizing to that > repository will of course inherit all those additional services, > allowing simple creation of local production Moby centrals and local > test Moby centrals. > > 4. Registration > =============== > > We propose to NOT cascade registration requests, i.e. pass them on > from secondary to master. That means that the client has control over > where a registration is done but also means the client has to make > that choice. Registration clients must thus add an implementation > that allows a user to choose the Moby central where a service/object > should be registered. Registration always happens at the topmost Moby > central node where the service should be visible, all secondaries of > this Moby central will pick that service up by synchronization. > > Why? Cascading registration is cumbersome, as only once a > registration request has reached the topmost node can name > duplications etc. be resolved, which must then be passed to the client. > > Name conflicts can still occur with locally registered services. > E.g., Adam registers a private service AnalyseThis on a private > secondary. Later, Beth registers AnalyseThis with same authority on > the Moby central master. The private secondary picks this up from the > RSS and runs into a name duplication. 
Proposed solution: Local > registrations MUST ALWAYS use a local authority. E.g., Adam registers > AnalyseThis with authority InternalIP, and Beth registers AnalyseThis > with authority paul_vitti.com. Then, we assume whoever registers a > service at a more global Moby central knows what we're doing and give > synchronization precedence over local registrations. E.g., a test > registry is a secondary of Moby central. Chris registers AnalyseThat > with authority paul_vitti.com in the test registry. Once he's happy > with testing, he registers AnalyseThat with authority paul_vitti.com > in Moby central. The test registry retrieves this from the RSS, > discards the local registration and overwrites it with the > registration picked up through the RSS. > > 5. Moby central failure > ======================= > > If a master Moby central fails, the secondaries continue normal > operation with no effect on service discovery for all clients keyed > to a secondary. However, registration is no longer possible at the > master node. Once the master node comes back up, all secondaries must > resync. > > 6. Adaptations to the RSS > ========================= > > For this procedure the current RSS feed has to be changed marginally, to > enable on the one hand the correct notification of the secondaries, > on the other hand to ensure that the normal RSS reader still work the > usual way. The current RSS feed mainly uses the Dublin Core Metadata > to provide the information, so to add additional information to the > feed it is only needed to add more Dublin Core Metadata. > > Primarily the feed has to contain the information whether the service > is new, modified or deleted. Additionally the service rdf has to be > linked in the feed to enable the local RDF agent to apply the changes > with the information of the service rdf to the local secondary. > If other additional information shall be added to the feed to provide > more possibilities to filter the services can be discussed. 
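As an illustration of sections 4 and 6 of the quoted RFC, here is a minimal sketch of how a secondary might apply feed entries to its local registry. Everything in it is an assumption made for the example: the FeedEntry fields, the "new"/"modified"/"deleted" vocabulary, and the in-memory dictionary standing in for the registry. Fetching the service RDF through the RDF agent is left out. The one rule taken from the RFC is that a synchronized registration takes precedence over a local one with the same authority and name.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, str]  # (authority, service name)


@dataclass(frozen=True)
class FeedEntry:
    authority: str
    name: str
    action: str         # "new", "modified" or "deleted" (assumed vocabulary)
    signature_url: str  # where the service RDF would be fetched from


@dataclass
class Registration:
    signature_url: str
    origin: str         # "local" or "synced"


def apply_entry(registry: Dict[Key, Registration], entry: FeedEntry) -> None:
    """Apply one feed entry to the secondary's registry."""
    key = (entry.authority, entry.name)
    if entry.action in ("new", "modified"):
        # Synchronization wins: any local registration under the same
        # authority and name is discarded and overwritten.
        registry[key] = Registration(entry.signature_url, "synced")
    elif entry.action == "deleted":
        existing = registry.get(key)
        # Assumption: a deletion at the master only removes entries that
        # came from the master, never a purely local registration.
        if existing is not None and existing.origin == "synced":
            del registry[key]
    else:
        raise ValueError(f"unknown action: {entry.action!r}")
```

In the RFC's own example, Chris's test registration of AnalyseThat under paul_vitti.com would start with origin "local" and flip to "synced" once the same service appears in the feed.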
> > 7. Resync > ========= > > Another main aspect is the problem if a repository is out of sync > (e.g. due to a temporary failure of master or secondary). The RSS > feed has a limited length, which means a limited number of > transactions are contained. Possibly, this will mean it does not > contain all transactions since the last sync of a secondary. > > > i) Solution > We propose that each repository will store a time stamp of > the last synchronization. If, > in the next synchronization process, the oldest change in the feed > is newer than the stored sync time stamp of the repository, > we run the risk of not receiving all information > about service changes. In this case the secondary should be able to > ask the primary to create an RSS feed with all changes which have > happened since the stored time stamp of the secondary. > > 8. Initial load > =============== > > When populating a new secondary from scratch, all registered services/ > objects need to be received from the master Moby central. We propose > a new method in Moby central to request all registered services/ > objects as RSS. Then, the initialization proceeds exactly like a > synchronization. > > > > So to kick off the discussion here are some of our questions: > > 1.Is it reasonable to use the existing RSS feed for this procedure ? > It sounds very handy and avoids creating a similar complete new structure > > 2.Does any structure keep track of deleted services ? > > 3.Resync: Is it reasonable to timestamp all transactions in Moby > central? Or should we solve the resync issue by enforcing a full drop/ > emptying of the secondary and reload all data as in initial load?
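The resync check in section 7 of the quoted RFC can be sketched as a single comparison. This is a minimal illustration, not part of any MOBY code: the function name is hypothetical, and it assumes the secondary stores the time of its last successful sync and can read a timestamp from every feed entry. The feed has a gap exactly when its oldest entry is newer than the stored sync time, because changes made between the two moments have already dropped off the length-limited feed.

```python
from datetime import datetime, timezone
from typing import Sequence


def needs_catch_up(last_sync: datetime, entry_times: Sequence[datetime]) -> bool:
    """Return True if the regular feed may be missing changes.

    last_sync   -- time of the secondary's last successful synchronization
    entry_times -- timestamps of all entries currently in the feed
    """
    if not entry_times:
        # Assumption: an empty feed proves nothing, so be conservative.
        return True
    # If even the oldest entry was made after the last sync, older
    # changes may have been pushed out of the feed in the meantime.
    return min(entry_times) > last_sync


last_sync = datetime(2006, 11, 20, tzinfo=timezone.utc)
feed = [
    datetime(2006, 11, 25, tzinfo=timezone.utc),
    datetime(2006, 11, 28, tzinfo=timezone.utc),
]
if needs_catch_up(last_sync, feed):
    print("ask the primary for all changes since", last_sync.isoformat())
```

When the check returns True, the secondary would either request a feed of all changes since its stored time stamp or fall back to a full reload, which is the choice raised in question 3.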
> > > Thanks > Heiko & Andreas > > -- > Andreas Groscurth > Diplom Bioinformatik - PhD Student > Max Planck Institute for Plant Breeding Research > Carl-von-Linné-Weg 10 > 50829 Cologne > Germany > E-mail: groscurt at mpiz-koeln.mpg.de > Phone: +49(0)221-5062-447 > > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev -- Mark Wilkinson Asst. Professor, Dept. of Medical Genetics University of British Columbia PI in Bioinformatics, iCAPTURE Centre St. Paul's Hospital, Rm. 166, 1081 Burrard St. Vancouver, BC, V6Z 1Y6 tel: 604 682 2344 x62129 fax: 604 806 9274 "Scientists would rather share their toothbrush than their data" - Carole Goble From ed.kawas at gmail.com Wed Nov 29 18:03:48 2006 From: ed.kawas at gmail.com (Ed Kawas) Date: Wed, 29 Nov 2006 10:03:48 -0800 Subject: [MOBY-dev] [moby] question for moby devs/architects regardinguse of DNS In-Reply-To: <1164822100.20382.13.camel@bioinfo.icapture.ubc.ca> Message-ID: <002f01c713e0$b9e393d0$6900a8c0@notebook> > We run a cron'd script from our server here that tests all > services in the registry every hour. I don't know for > certain if this is using LSID resolution as part of that task > (Eddie, can you confirm?), but it wouldn't surprise me if > that were the case. It does not. Pure api (findservice, etc). > > > > (2) How popular is LSID? Could services making use of the > 'lsid' SVR > > record be responsible for lots of DNS traffic?
LIke 256+ sessions > > from the same IP? > > > We are increasingly using the LSID to represent *all* > entities in MOBY - datatypes, service types, web service > instances, etc. A tool like Taverna may well be resolving > all LSIDs in the MOBY registry each time it starts-up (?), > which could account for the traffic. Other client > applications will likely use LSID resolution in the same way > in the near future, if they don't already. Again, the IP > address would fairly quickly tell us whether these are > "scientists" or "scriptkiddies". > > Regardless, the use of LSIDs in MOBY is only going to > increase over time, so if it is becoming an issue now we > should think about how to manage it before it becomes a real > problem... > Mark, your gbrowse_moby application uses lsids a lot. However, those requests would be from a single ip address. Eddie From markw at illuminae.com Wed Nov 29 18:34:47 2006 From: markw at illuminae.com (Mark Wilkinson) Date: Wed, 29 Nov 2006 10:34:47 -0800 Subject: [MOBY-dev] [moby] question for moby devs/architects regardinguse of DNS In-Reply-To: <002f01c713e0$b9e393d0$6900a8c0@notebook> References: <002f01c713e0$b9e393d0$6900a8c0@notebook> Message-ID: On Wed, 29 Nov 2006 10:03:48 -0800, Ed Kawas wrote: > Mark, your gbrowse_moby application uses lsids a lot. However, those > requests > would be from a single ip address. Right... but I don't think it creates 256+ requests at a time, since it is a low-throughput interface... I'd be surprised if gbrowse moby was the culprit here. M
From edward.kawas at gmail.com Wed Nov 29 15:22:14 2006 From: edward.kawas at gmail.com (Edward Kawas) Date: Wed, 29 Nov 2006 07:22:14 -0800 Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories In-Reply-To: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> Message-ID: <001f01c713ca$27188b70$6900a8c0@notebook> Hi, From reading *just* the 'aim' and 'problems' portion of this message, I was wondering whether you thought about using the agent for mirroring. Just throwing it out there, Eddie > -----Original Message----- > From: moby-dev-bounces at lists.open-bio.org > [mailto:moby-dev-bounces at lists.open-bio.org] On Behalf Of > Andreas Groscurth > Sent: Wednesday, November 29, 2006 4:02 AM > To: moby-dev at lists.open-bio.org > Subject: [MOBY-dev] RFC - Synchronization of Biomoby > secondary repositories > > The following text describes the procedure of the > synchronization of Biomoby secondary repositories. > > Aim: Replicate BioMoby central > -to create mirrors > -to have redundancy in case of failure > -to create private sets of services, either filtered from the > global set (less services) or added to the global set (more services) > > Problems: > -synchronizing repositories > -cascading service/object registration requests -populating a > Moby central from scratch > > Solutions: > -The existing RSS feed is used to notify secondaries of > changes (register service/delete service/update service) to > the master -A complete RSS document is created by a new dump > method for initialization of Moby centrals from scratch > -Registrations are handled by the client and NOT cascaded > > 1. Synchronizing repositories > ============================= > > We propose that secondaries check the Biomoby RSS feed to be > notified whether changes in the registration have been done. > Currently the RSS feed is updated once a day, for more rapid > synchronization this would have to be changed.
> The changes include registration, modification or deletion of > a service/object. If changes were applied to the Biomoby > Central registry the changes are adopted to the secondary. > The RSS contains the signature URL where the secondary picks > up the service RDF to retrieve all details required for the > registration using the existing RDF agent. > > i) Problems/changes required: > > The main question here is if unregistered services are > deleted completely from the central database or are marked as > inactive. The problem about that is, that the feed would need > to contain also the information of a deleted service, so that > the secondaries will retrieve that information. So Moby > central will have to keep a full transaction log also of deletions. > > 2. Filtering > ============ > > We propose that any secondary can apply filters to the RSS > feed and thus only include a subset of all services/objects. > This can be useful to make finding services from lists > easier, to tune workflows to performant services, only use > local services or to exclude test services. Information > relevant to filtering is in the RSS, like authority, > description, but maybe more will be relevant, then filtering > may need to happen at the level of service RDF. > > 3. Private services > =================== > > We propose that any client can register services with a Moby > central secondary, these will then be available only to > clients querying the secondary. If the secondary is in a > local network, this allows easy access control to local > services. Any secondary synchronizing to that repository will > of course inherit all those additional services, allowing > simple creation of local production Moby centrals and local > test Moby centrals. > > 4. Registration > =============== > > We propose to NOT cascade registration requests, i.e. pass > them on from secondary to master. 
That means that the client > has control over where a registration is done but also means > the client has to make that choice. Registration clients must > thus add an implementation that allows a user to choose the > Moby central where a service/object should be registered. > Registration always happens at the topmost Moby central node > where the service should be visible, all secondaries of this > Moby central will pick that service up by synchronization. > > Why? Cascading registration is cumbersome, as only once a > registration request has reached the topmost node can name > duplications etc. be resolved, which must then be passed to > the client. > > Name conflicts can still occur with locally registered services. > E.g., Adam registers a private service AnalyseThis on a > private secondary. Later, Beth registers AnalyseThis with > same authority on the Moby central master. The private > secondary picks this up from the RSS and runs into a name > duplication. Proposed solution: Local registrations MUST > ALWAYS use a local authority. E.g., Adam registers > AnalyseThis with authority InternalIP, and Beth registers > AnalyseThis with authority paul_vitti.com. Then, we assume > whoever registers a service at a more global Moby central > knows what we're doing and give synchronization precedence > over local registrations. E.g., a test registry is a > secondary of Moby central. Chris registers AnalyseThat with > authority paul_vitti.com in the test registry. Once he's > happy with testing, he registers AnalyseThat with authority > paul_vitti.com in Moby central. The test registry retrieves > this from the RSS, discards the local registration and > overwrites it with the registration picked up through the RSS. > > 5. Moby central failure > ======================= > > If a master Moby central fails, the secondaries continue > normal operation with no effect on service discovery for all > clients keyed to a secondary. 
However, registration is no > longer possible at the master node. Once the master node > comes back up, all secondaries must resync. > > 6. Adaptations to the RSS > ========================= > > For this procedure the current RSS feed has to be changed > marginally, to enable on the one hand the correct > notification of the secondaries, on the other hand to ensure > that the normal RSS reader still work the usual way. The > current RSS feed mainly uses the Dublin Core Metadata to > provide the information, so to add additional information to > the feed it is only needed to add more Dublin Core Metadata. > > Primarily the feed has to contain the information whether the > service is new, modified or deleted. Additionally the service > rdf has to be linked in the feed to enable the local RDF > agent to apply the changes with the information of the > service rdf to the local secondary. > If other additional information shall be added to the feed to > provide more possibilities to filter the services can be discussed. > > 7. Resync > ========= > > Another main aspect is the problem if a repository is out of > sync (e.g. due to a temporary failure of master or > secondary). The RSS feed has a limited length, which means a > limited number of transactions are contained. Possibly, this > will mean it does not contain all transactions since the last > sync of a secondary. > > > i) Solution > We propose that each repository will store a time stamp of > the last synchronization. If, in the next > synchronization process, the oldest change in the feed is > newer than the stored sync time stamp of the repository, we > run the risk of not receiving all information about service > changes. In this case the secondary should be able to ask the > primary to create an RSS feed with all changes which have > happened since the stored time stamp of the secondary. > > 8.
Initial load > =============== > > When populating a new secondary from scratch, all registered > services/ objects need to be received from the master Moby > central. We propose a new method in Moby central to request > all registered services/ objects as RSS. Then, the > initialization proceeds exactly like a synchronization. > > > > So to kick off the discussion here are some of our questions: > > 1.Is it reasonable to use the existing RSS feed for this procedure ? > It sounds very handy and avoids creating a similar complete > new structure > > 2.Does any structure keep track of deleted services ? > > 3.Resync: Is it reasonable to timestamp all transactions in > Moby central? Or should we solve the resync issue by > enforcing a full drop/ emptying of the secondary and reload > all data as in initial load? > > > Thanks > Heiko & Andreas > > -- > Andreas Groscurth > Diplom Bioinformatik - PhD Student > Max Planck Institute for Plant Breeding Research > Carl-von-Linné-Weg 10 > 50829 Cologne > Germany > E-mail: groscurt at mpiz-koeln.mpg.de > Phone: +49(0)221-5062-447 > > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev From markw at illuminae.com Wed Nov 29 23:52:23 2006 From: markw at illuminae.com (Mark Wilkinson) Date: Wed, 29 Nov 2006 15:52:23 -0800 Subject: [MOBY-dev] Holy cow! Lotsa hits! Message-ID: 852,000 hits on MOBY Central in November. That's a new record :-) M -- Mark Wilkinson Assistant Professor, Dept. Medical Genetics University of British Columbia PI Bioinformatics iCAPTURE Centre, St.
Paul's Hospital From schoof at mpiz-koeln.mpg.de Thu Nov 30 09:39:46 2006 From: schoof at mpiz-koeln.mpg.de (Heiko Schoof) Date: Thu, 30 Nov 2006 10:39:46 +0100 Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories In-Reply-To: <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca> References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de> <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca> Message-ID: <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de> Hi Mark, Eddie, we already use the RDF agent, from the RSS we intend to pull mainly the signature URLs, then we propose to use the RDF agent to get all data. ---quote--- The RSS contains the signature URL where the secondary picks up the service RDF to retrieve all details required for the registration using the existing RDF agent. ---/quote--- The advantage of the RSS versus the API call to retrieve ALL signature URLs is: -scalability: If there are 1000s of signature URLs...with the RSS, we only retrieve changes -filtering: ability to filter already based on data in the RSS with no need to actually retrieve the service RDF; should improve filtering performance as it's one request instead of potentially hundreds plus the need to parse all those RDF. However, for the initialization/from scratch, this method indeed makes most sense, we'll modify the RFC accordingly. Is there a Biomoby WIKI where we can post that? Do you intend to come "work" at the MPIZ next week? If yes, when? I'm free Thursday afternoon and most of Friday. Best, Heiko On 29. Nov 2006, at 18:31 Uhr, Mark Wilkinson wrote: > Hi Andreas! > > Thanks for taking the time to put this document together. Using > the RSS > feed is an interesting idea. My first instinct is that it might > not be > "robust" enough, but I suppose if we spent more time thinking about > what > information is passed on that RSS feed it might work quite well! > > Have you considered taking advantage of the recent move towards > distributed service signatures? 
The RDF Agent is capable of > consuming a > list of URLs, recovering the RDF signatures from those URLs, and > rebuilding the entire registry from those RDF documents. It is also a > simple API call to MOBY Central that generates the list of URLs > representing all of the service signatures. As such, a full mirroring > operation should require nothing more than a single call to the > primary > MOBY Central, and passing the result of that call to the RDF agent of > the mirror site and letting it run... Eddie, correct me if that isn't > true... > > I'm going to be at your institute this time next week, so let's talk > about it more in person :-) > > Best wishes! > > Mark > > > > On Wed, 2006-11-29 at 13:02 +0100, Andreas Groscurth wrote: >> The following text describes the procedure of the synchronization of >> Biomoby secondary repositories. >> >> Aim: Replicate BioMoby central >> -to create mirrors >> -to have redundancy in case of failure >> -to create private sets of services, either filtered from the global >> set (less services) or added to the global set (more services) >> >> Problems: >> -synchronizing repositories >> -cascading service/object registration requests >> -populating a Moby central from scratch >> >> Solutions: >> -The existing RSS feed is used to notify secondaries of changes >> (register service/delete service/update service) to the master >> -A complete RSS document is created by a new dump method for >> initialization of Moby centrals from scratch >> -Registrations are handled by the client and NOT cascaded >> >> 1. Synchronizing repositories >> ============================= >> >> We propose that secondaries check the Biomoby RSS feed to be >> notified whether changes in the registration have been done. >> Currently the RSS feed is updated once a day, for more rapid >> synchronization this would have to be changed. >> The changes include registration, modification or deletion of a >> service/object. 
If changes were applied to the Biomoby Central >> registry the changes are adopted to the secondary. >> The RSS contains the signature URL where the secondary picks up >> the service RDF to retrieve all details required for the >> registration using the existing RDF agent. >> >> i) Problems/changes required: >> >> The main question here is if unregistered services are deleted >> completely from the central database or are marked as inactive. The >> problem about that is, that the feed would need to contain also the >> information of a deleted service, so that the secondaries will >> retrieve that information. So Moby central will have to keep a full >> transaction log also of deletions. >> >> 2. Filtering >> ============ >> >> We propose that any secondary can apply filters to the RSS feed and >> thus only include a subset of all services/objects. This can be >> useful to make finding services from lists easier, to tune workflows >> to performant services, only use local services or to exclude test >> services. Information relevant to filtering is in the RSS, like >> authority, description, but maybe more will be relevant, then >> filtering may need to happen at the level of service RDF. >> >> 3. Private services >> =================== >> >> We propose that any client can register services with a Moby central >> secondary, these will then be available only to clients querying the >> secondary. If the secondary is in a local network, this allows easy >> access control to local services. Any secondary synchronizing to that >> repository will of course inherit all those additional services, >> allowing simple creation of local production Moby centrals and local >> test Moby centrals. >> >> 4. Registration >> =============== >> >> We propose to NOT cascade registration requests, i.e. pass them on >> from secondary to master. That means that the client has control over >> where a registration is done but also means the client has to make >> that choice. 
>> Registration clients must thus add an implementation
>> that allows a user to choose the Moby central where a service/object
>> should be registered. Registration always happens at the topmost Moby
>> central node where the service should be visible; all secondaries of
>> this Moby central will pick that service up by synchronization.
>>
>> Why? Cascading registration is cumbersome, as only once a
>> registration request has reached the topmost node can name
>> duplications etc. be resolved, which must then be passed to the
>> client.
>>
>> Name conflicts can still occur with locally registered services.
>> E.g., Adam registers a private service AnalyseThis on a private
>> secondary. Later, Beth registers AnalyseThis with the same authority
>> on the Moby central master. The private secondary picks this up from
>> the RSS and runs into a name duplication. Proposed solution: Local
>> registrations MUST ALWAYS use a local authority. E.g., Adam registers
>> AnalyseThis with authority InternalIP, and Beth registers AnalyseThis
>> with authority paul_vitti.com. Then, we assume whoever registers a
>> service at a more global Moby central knows what they're doing and
>> give synchronization precedence over local registrations. E.g., a test
>> registry is a secondary of Moby central. Chris registers AnalyseThat
>> with authority paul_vitti.com in the test registry. Once he's happy
>> with testing, he registers AnalyseThat with authority paul_vitti.com
>> in Moby central. The test registry retrieves this from the RSS,
>> discards the local registration and overwrites it with the
>> registration picked up through the RSS.
>>
>> 5. Moby central failure
>> =======================
>>
>> If a master Moby central fails, the secondaries continue normal
>> operation with no effect on service discovery for all clients keyed
>> to a secondary. However, registration is no longer possible at the
>> master node. Once the master node comes back up, all secondaries must
>> resync.
>>
>> 6. Adaptations to the RSS
>> =========================
>>
>> For this procedure the current RSS feed has to be changed
>> marginally: on the one hand to enable correct notification of the
>> secondaries, on the other hand to ensure that normal RSS readers
>> still work the usual way. The current RSS feed mainly uses Dublin
>> Core metadata to provide the information, so adding further
>> information to the feed only requires adding more Dublin Core
>> metadata.
>>
>> Primarily the feed has to contain the information whether the service
>> is new, modified or deleted. Additionally the service RDF has to be
>> linked in the feed to enable the local RDF agent to apply the changes
>> with the information of the service RDF to the local secondary.
>> Whether other additional information should be added to the feed to
>> provide more possibilities to filter the services can be discussed.
>>
>> 7. Resync
>> =========
>>
>> Another main aspect is the problem of a repository being out of sync
>> (e.g. due to a temporary failure of master or secondary). The RSS
>> feed has a limited length, which means a limited number of
>> transactions are contained. Possibly, this will mean it does not
>> contain all transactions since the last sync of a secondary.
>>
>> i) Solution
>>
>> We propose that each repository will store a time stamp of
>> the last synchronization. If, in the next synchronization process,
>> the oldest changes in the feed are newer than the current sync time
>> stamp of the repository, we run the risk of not receiving all
>> information about service changes. In this case the secondary should
>> be able to ask the primary to create an RSS feed with all changes
>> which have happened since the current time stamp of the secondary.
>>
>> 8. Initial load
>> ===============
>>
>> When populating a new secondary from scratch, all registered
>> services/objects need to be received from the master Moby central.
>> We propose a new method in Moby central to request all registered
>> services/objects as RSS. Then, the initialization proceeds exactly
>> like a synchronization.
>>
>>
>> So to kick off the discussion, here are some of our questions:
>>
>> 1. Is it reasonable to use the existing RSS feed for this procedure?
>> It sounds very handy and avoids creating a similar, completely new
>> structure.
>>
>> 2. Does any structure keep track of deleted services?
>>
>> 3. Resync: Is it reasonable to timestamp all transactions in Moby
>> central? Or should we solve the resync issue by enforcing a full
>> drop/emptying of the secondary and reloading all data as in the
>> initial load?
>>
>>
>> Thanks
>> Heiko & Andreas
>>
>> --
>> Andreas Groscurth
>> Diplom Bioinformatik - PhD Student
>> Max Planck Institute for Plant Breeding Research
>> Carl-von-Linné-Weg 10
>> 50829 Cologne
>> Germany
>> E-mail: groscurt at mpiz-koeln.mpg.de
>> Phone: +49(0)221-5062-447
>>
>> _______________________________________________
>> MOBY-dev mailing list
>> MOBY-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/moby-dev
> --
> Mark Wilkinson
> Asst. Professor, Dept. of Medical Genetics
> University of British Columbia
> PI in Bioinformatics, iCAPTURE Centre
> St. Paul's Hospital, Rm. 166, 1081 Burrard St.
> Vancouver, BC, V6Z 1Y6
> tel: 604 682 2344 x62129
> fax: 604 806 9274
>
> "Scientists would rather share their toothbrush than their data"
> - Carole Goble
>
> ==================
>
>
> ***CONFIDENTIALITY NOTICE***
> This electronic message is intended only for the use of the addressee
> and may contain information that is privileged and confidential. Any
> dissemination, distribution or copying of this communication by
> unauthorized individuals is strictly prohibited. If you have received
> this communication in error, please notify the sender immediately by
> reply e-mail and delete the original and all copies from your system.
From dgonzalez at cnio.es Thu Nov 30 14:43:20 2006
From: dgonzalez at cnio.es (David G. Pisano)
Date: Thu, 30 Nov 2006 15:43:20 +0100
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To:
References:
Message-ID: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>

Mark,

Do you have historic records? It would be nice for presentations ;)

(and grats to everybody, by the way)

David

On 30 Nov, 2006, at 12:52 AM, Mark Wilkinson wrote:

> 852,000 hits on MOBY Central in November. That's a new record :-)
>
> M
>
>
> --
> --
> Mark Wilkinson
> Assistant Professor, Dept. Medical Genetics
> University of British Columbia
> PI Bioinformatics
> iCAPTURE Centre, St. Paul's Hospital

**CONFIDENTIALITY NOTICE** This email communication and any attachments
may contain confidential and privileged information for the sole use of
the designated recipient named above. Distribution, reproduction or any
other use of this transmission by any party other than the intended
recipient is prohibited. If you are not the intended recipient please
contact the sender and delete all copies.
From markw at illuminae.com Thu Nov 30 16:40:28 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 30 Nov 2006 08:40:28 -0800
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
In-Reply-To: <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
 <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
 <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
Message-ID:

> we already use the RDF agent, from the RSS we intend to pull mainly the
> signature URLs, then we propose to use the RDF agent to get all data.

Ah! I see.

> The advantage of the RSS versus the API call to retrieve ALL signature
> URLs is:
> -scalability: If there are 1000s of signature URLs...with the RSS, we
> only retrieve changes

I would need to modify the RSS feed such that it reports additions *and*
removals (right now it is just additions), and we would have to come up
with a formal way of representing these... and then the RSS
functionality and features would need to become a formal part of the
MOBY API (I can just hear Dr. Senger yelling at us right now that we're
considering building core functionality on parts of MOBY that are
entirely undocumented ;-) ;-) ). I guess that's why I am so hesitant to
use RSS.

However, I guess so long as this is not a "recommended" practice, only a
short-cut; and as long as it is *always* possible to use a true API call
to mirror the registry, and we formally say what the recommended
best-practice is, then it's reasonable to have a non-guaranteed
alternative that is more lightweight.

> However, for the initialization/from scratch, this method indeed makes
> most sense, we'll modify the RFC accordingly. Is there a Biomoby WIKI
> where we can post that?

There currently is no Wiki running on BioMoby. The last Wiki we had was
hacked, and we just never brought it back up again.
We tried using Bugzilla as a way of tracking RFCs, but that didn't make
many people very happy, so we're simply using the mailing list, with
some formal write-up in an attachment.

> Do you intend to come "work" at the MPIZ next week?

Do I have to answer that publicly? ;-)

The answer is "Yes". We should get a variety of things - MOBY-wise and
otherwise - sorted out between us while I am there. I'm free all day
Thursday and Friday, so that should give us plenty of time.

Cheers!

M

From schoof at mpiz-koeln.mpg.de Thu Nov 30 18:01:48 2006
From: schoof at mpiz-koeln.mpg.de (Heiko Schoof)
Date: Thu, 30 Nov 2006 19:01:48 +0100
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
In-Reply-To:
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
 <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
 <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
Message-ID: <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>

On 30 Nov 2006, at 17:40, Mark Wilkinson wrote:

> and then the RSS functionality and features would need to become a
> formal part of the MOBY API (I can just hear Dr. Senger yelling at
> us right now that we're considering building core functionality on
> parts of MOBY that are entirely undocumented ;-) ;-) ). I guess
> that's why I am so hesitant to use RSS.
>
> However, I guess so long as this is not a "recommended" practice,
> only a short-cut; and as long as it is *always* possible to use a
> true API call to mirror the registry, and we formally say what the
> recommended best-practice is, then it's reasonable to have a
> non-guaranteed alternative that is more lightweight.
>

What we are proposing is to make new RSS functionality that will be part
of the core API. It just so happens that RSS and the surrounding toolkit
are well suited to the purpose, and more fitting to Moby than other
solutions we've looked at.
Why should we make a new API call that spews out some custom XML if we
can perfectly use RSS within its specs and get a core RSS feed for
"human"/aggregator consumption at the same time for free? We stated that
we will need to modify the RSS, though not breaking anything as far as
we can see. We were not proposing to use RSS just because there's
existing functionality ;-) we're not quite THAT lazy...though almost.
And... isn't it *cool* to use RSS for some real work?

What is a true API call? Why is a call to the RSS feed not a true API
call, if we make it part of the API? RSS is a tested, scalable
technology, which is why we propose to use it, as we envision hundreds
of Moby clients maintaining their local cache (like Dashboard) through
that functionality. One thing we haven't mentioned yet: caching of Moby
central for clients could easily build on the Moby secondary
functionality, we think. But maybe Martin or others with experience of
Moby central caching should comment on that.

Heiko

From markw at illuminae.com Thu Nov 30 19:10:08 2006
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 30 Nov 2006 11:10:08 -0800
Subject: [MOBY-dev] RFC - Synchronization of Biomoby secondary repositories
In-Reply-To: <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>
References: <200611291302.05822.groscurt@mpiz-koeln.mpg.de>
 <1164821471.20382.2.camel@bioinfo.icapture.ubc.ca>
 <6A3CBE48-61D9-4C11-AA18-09DD62C1527B@mpiz-koeln.mpg.de>
 <93B9B57A-9F8D-486C-92EF-F3EBC5B6A3AC@mpiz-koeln.mpg.de>
Message-ID:

Hi Heiko,

> Why should we make a new API call that spews out some custom XML if we
> can perfectly use RSS within its specs and get a core RSS feed for
> "human"/aggregator consumption at the same time for free? We stated that
> we will need to modify the RSS, though not breaking anything as far as
> we can see.
I think we are effectively saying the same thing; I am not suggesting
that we make a new API call, I'm suggesting that there are API calls
that exist already that could be used for this purpose, albeit not so
conveniently as your RSS suggestion.

> We were not proposing to use RSS just because there's existing
> functionality ;-) we're not quite THAT lazy...though almost. And...
> isn't it *cool* to use RSS for some real work?

Well... I guess this is the issue. You're proposing to use RSS for a
purpose for which it was not (IMO) designed. As such, we would have to
create new conventions around the RSS feed (hereafter called MOBY-RSS)
that may or may not be more widely accepted in the world. I agree 100%
that it would be VERY cool to use RSS in this way, but with respect to
a robust solution to the problem, I'm not entirely convinced.

The amount of RSS-RDF we would have to maintain on MOBY Central in order
to have a complete history that would allow a mirror to reliably
re-construct the current state of the database is... well... large! At
the moment, I keep only the last... 100?... changes. If you don't pick
up the feed for a day, or if someone registers 1000 new services, you
won't see them in the feed. To be safe, we would have to keep *all*
changes in the RDF document at MOBY Central, in which case the overhead
of calling the feed versus using the MOBY Central API would be about the
same.

I'm not *opposed* to the idea of using RSS, and I agree that it is a
novel and "cool" use for it, but I am concerned that we will perpetuate
the MOBY reputation of making ad hoc decisions around other standards...
(which isn't necessarily BAD, it just gives us a reputation for being
mavericks, which angers the reviewers :-) )

Let's talk about it over a Koelsch (or two) next week!

M

--
--
Mark Wilkinson
Assistant Professor, Dept. Medical Genetics
University of British Columbia
PI Bioinformatics
iCAPTURE Centre, St. Paul's Hospital

From martin.senger at gmail.com Thu Nov 30 15:08:37 2006
From: martin.senger at gmail.com (Martin Senger)
Date: Thu, 30 Nov 2006 15:08:37 +0000
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
References: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
Message-ID: <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>

Well, let me be more realistic. I do not like so-called "Potemkin's"
villages (see Google if it does not ring any bell). Of course, I am glad
that BioMoby is growing, and I will support it in any possible way. And
I also understand that some facts are good for PR and for funding
agencies.

But... For us, we should be more precise about what these hits actually
mean. For example, I assume (please correct me if I am wrong) that every
time somebody updates her local cache from Dashboard, it increases the
hit numbers. Also, all these automated tools may influence how many
times the registry is accessed. Which gives us a distorted picture.

Better, possibly, would be to agree on an HTTP agent name (or names)
that we can use in these automatic tools - and to separate in the
statistics *all* hits (good for funding agencies) from the other hits
where we do not include requests from this (or these) HTTP agent(s).
Just my ¢'s,
Martin

--
Martin Senger
email: martin.senger at gmail.com
skype: martinsenger

From jmfernandez at cnio.es Thu Nov 30 21:06:53 2006
From: jmfernandez at cnio.es (José María Fernández González)
Date: Thu, 30 Nov 2006 22:06:53 +0100
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
References: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
 <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
Message-ID: <456F47ED.1060800@cnio.es>

You should have some way to distinguish hits from automated tools from
the others (with some help from the tool developers, of course). As
every request is made through HTTP, you could use the User Agent
signature recorded in the Apache logs. For instance, each tool could use
a different 'User Agent' variant so that they can be distinguished; or,
if a program/tool can issue requests related to maintenance, it would be
advisable to alter its 'User Agent' signature in some way based on its
mode.

Just my 2 euro-cents.

José María

Martin Senger wrote:
> Well, let me be more realistic. I do not like so called "Potemkin's"
> villages (see google if it does not ring any bell). Of course, I am glad
> that BioMoby is growing, and I will support it in any possible way. And I
> also understand that some facts are good for PR and for funding agencies.
>
> But... For us, we should be more precise what these hits actually mean. For
> example, I assume (please correct me if I am wrong) that every time somebody
> updates her local cache from Dashboard, it increases the hit numbers. Also,
> all these automated tools may influence how many times the registry is
> accessed. Which gives us a distorted picture.
>
> Better, possibly, would be to agree to an HTTP agent name (or names) that we
> can use in these automatic tools - and to separate in the statistics *all*
> hits (good for funding agencies) from the other hits where we do not include
> requests from this (or these) HTTP agent(s).
>
> Just my ¢'s,
> Martin
>

--
José María Fernández González
Tlfn: (+34) 91 732 80 00 / 91 224 69 00 (ext 2256)
e-mail: jmfernandez at cnio.es
Fax: (+34) 91 224 69 76
Biología Estructural y Bioinformática
Structural Biology and Bioinformatics
Centro Nacional de Investigaciones Oncológicas
C.P.: 28029 Zip Code: 28029
C/. Melchor Fernández Almagro, 3
Madrid (Spain)

From martin.senger at gmail.com Thu Nov 30 21:48:08 2006
From: martin.senger at gmail.com (Martin Senger)
Date: Thu, 30 Nov 2006 21:48:08 +0000
Subject: [MOBY-dev] Holy cow! Lotsa hits!
In-Reply-To: <456F47ED.1060800@cnio.es>
References: <9D79EB83-1E3D-4981-913D-C79BF7A2A205@cnio.es>
 <4d93f07c0611300708g1334a3b2g8acb611120c2245d@mail.gmail.com>
 <456F47ED.1060800@cnio.es>
Message-ID: <4d93f07c0611301348v1289a6a3u13139a49a6d55678@mail.gmail.com>

> you could use User Agent signature

Well, that's what I said :-)

Martin

--
Martin Senger
email: martin.senger at gmail.com
skype: martinsenger
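
The User Agent idea discussed in this thread could be sketched roughly
as below. This is only an illustration, not how MOBY Central computes
its statistics. It assumes Apache's "combined" log format, where the
User-Agent is the last quoted field of each line, and the agent names in
AUTOMATED_AGENT_PREFIXES are made up, standing in for whatever names the
tool developers would actually agree on.

```python
import re
from collections import Counter

# Hypothetical User-Agent prefixes that automated MOBY tools might agree
# to send (these names are assumptions, not taken from any real tool).
AUTOMATED_AGENT_PREFIXES = ("MOBY-Dashboard-cache", "MOBY-RDF-agent")

# Assumption: Apache "combined" log format, in which the User-Agent is
# the last double-quoted field on the line.
_USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def classify_hit(log_line):
    """Return 'automated' or 'other' for a single access-log line."""
    match = _USER_AGENT_RE.search(log_line)
    agent = match.group(1) if match else ""
    if agent.startswith(AUTOMATED_AGENT_PREFIXES):
        return "automated"
    return "other"


def count_hits(log_lines):
    """Count all hits and, separately, hits not from the agreed agents."""
    counts = Counter(classify_hit(line) for line in log_lines)
    return {
        "all": counts["automated"] + counts["other"],
        "automated": counts["automated"],
        "other": counts["other"],
    }
```

Reporting both "all" and "other" would keep the headline number for
funding agencies while giving a less distorted figure for internal use.
Lines whose User-Agent cannot be parsed are counted as "other" here,
which is a design choice of this sketch rather than anything proposed in
the thread.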