[DAS] LDAS vs Dazzle

Lincoln Stein lstein@cshl.org
Fri, 4 Jan 2002 11:48:54 -0500


Hi All,

John is pointing out the inevitable tradeoff between simplicity and power.  
LDAS may be easier to install (that's why it's called "lightweight") but it 
won't do as much or be as flexible after it's installed as will Dazzle.

The issue of supporting multiple assemblies is a red herring.  The DAS/1 
protocol does not deal well with multiple assemblies.  The best you can do is 
to have multiple reference sources (running out of the same server if you 
want), that each use a different assembly.  You can do this with either LDAS 
or Dazzle, and I doubt that it's much harder with one than the other.
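
As a rough sketch of that arrangement: each assembly gets its own data 
source name under the same server, and a client selects the assembly by 
selecting the data source.  The host name, the data source names and the 
features_url helper below are hypothetical, made up for illustration; only 
the general DAS/1 URL shape (/das/<dsn>/features?segment=ref:start,stop) 
is taken from the protocol.

```python
from urllib.parse import urlencode

# Hypothetical server and data source names (assumptions, not real
# installations): one DAS data source per assembly, same server.
BASE = "http://das.example.org/das"
SOURCES = {
    "assembly_a": "human_a",
    "assembly_b": "human_b",
}


def features_url(assembly, ref, start, stop):
    """Build a DAS/1-style features request against one assembly's source."""
    dsn = SOURCES[assembly]
    # Keep ':' and ',' unescaped so the segment reads ref:start,stop.
    query = urlencode({"segment": f"{ref}:{start},{stop}"}, safe=":,")
    return f"{BASE}/{dsn}/features?{query}"


# The same region requested on two assemblies differs only in the source.
for name in SOURCES:
    print(features_url(name, "1", 1000, 2000))
```

Nothing in this sketch depends on whether the server behind each data 
source is LDAS or Dazzle.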

I'm working on an easy-to-install web-based DAS client now, based on Allen's 
work with DasView and Foo's work at TIGR.  Hopefully I'll have an LDASView to 
report by the ORA meeting, but with so many other things on my plate, who 
knows?  Anybody who'd like to work with me on this, please step up to the 
plate.  I know pretty well what needs to be done.

Lincoln

On Friday 04 January 2002 11:26, jfreeman wrote:
> I agree with David, and would like to add the following commentary:
>
> First, the true cost of open-source GPL'ed software is the time it takes
> to find and acquire the proper documentation to work with it.  A badly
> documented piece of open source code is worthless unless your time is
> free, or is considered free by the person paying you to figure it out.
> Perl is free and the Perl man pages are free, but the Camel book, 3rd
> edition, is $49.95:
> http://www.oreilly.com/catalog/pperl3/
> Java has its equivalent as well.  Good documentation is the profitable
> end of the open source world.
>
> Second, "He who controls the assembly controls the universe."  The cost
> of assembling your own human genome from the parts is prohibitive and a
> waste of time for most companies, and is a source of natural lock-in by
> a given vendor of the human genome.  You have any number of assemblies
> to choose from in the world, at the moment.  Software closely tied to a
> given assembly forces you to invest in that assembly, and you will be
> forced to change software as you update assemblies.  Software that
> understands multiple assemblies is preferable, to the customer, and the
> DAS protocol, as designed, understands multiple assemblies.  How well
> does the DAS server decouple the das protocol output from the assembly
> input?  Is it closely tied to a particular assembly, or does it stand
> alone, doing one job?  How easy is it to switch assemblies?
>
> Third, can you show the human genome with your internal data?
> Representing your data on the various assemblies is a problem most
> companies have.  A DAS server should provide clear, well-documented,
> nonvolatile interfaces, so that you can easily write a conversion layer
> between your system and the assembly(ies) you have chosen to map to.
>
> Fourth, does it understand your firewall?
> The code is useless unless it works well with your internal security
> systems.  In particular, it needs proxy support.
>
> Given the above, the ideal DAS server should:
>
> 1) Have install and test documentation written so that a first-year
> graduate student in biology with no prior programming experience could
> install the system and run the tests.  They should need their system
> administrator and root access only minimally (read: make install) to get
> the program running, and should be told when they need that system
> administrator.  Clear examples should be provided.  This documentation
> should be tested with the above use case (without the programmer being
> there!) before it is considered good enough for publication.
>
> 2) Be written to work easily with multiple assemblies and not be
> closely tied to a particular assembly.
>
> 3) Have clear, well-documented interfaces that match up an example
> reference server with an example annotation server, the best example
> being one where the same annotations are served on two different
> assemblies.  This documentation should also be tested by having a
> first-year biology graduate student with no prior programming
> experience set it up.
>
> 4) Have the ability to work with proxies.
>
>
> How do LDAS and Dazzle rate on these points?
>
> 1) I am not a first year biology graduate student, but being trained as
> a programmer, I found the LDAS documentation clearer and more worked
> out.  See: http://www.ensembl.org/Dev/Lists/das/msg00770.html for more
> details.  Take an afternoon to install and run the test cases on both
> and your decision will be easy.  Most importantly, see 4.
>
> 2) Dazzle, as shown in one version of the install documentation
> (http://www.ensembl.org/Docs/das_server_v1.0.pdf), is closely coupled
> with Ensembl, and its version of the assembly.  LDAS works from an
> intermediate tab delimited flat file which is not tied to any particular
> software or assembly.
>
> 3) Dazzle is still in the design phase of development and in the past
> was going through name changes to its interfaces and classes, e.g.
> jclass="org.biojava.servlets.dazzle.datasource.EmblDataSource",
> jclass="org.biojava.servlets.dazzle.datasource.GFFAnnotationSource",
> jclass="ensembl.EnsemblGenericSeqFeatureSource",
> jclass="ensembl.RemoteGenericSeqFeatureSource", etc.  You have to be
> really aware of what your data source is to get the proper handler, and
> where you find information about the available handlers I am not sure.
> This is not a problem with LDAS, as you write to one format before you
> import it into the LDAS system.  Dazzle has only one worked-out example
> and no test case to show that you have it working (depending on which
> install process you want to try, see
> http://www.ensembl.org/Docs/das_server_v1.0.pdf or
> http://www.biojava.org/dazzle/deploy.html).  LDAS has one install page,
> see: http://www.biodas.org/servers/LDAS.html.  How the LDAS server
> implements the data in MySQL is hidden from you, and that is OK: you
> have to learn the API of the LDAS loading script, get your data into the
> flat file format, configure your connection to your database, copy it to
> your web directory and view the test case.
>
> 4) The problem has been understood by both and has been addressed in the
> code; it may already have been added to the documentation.
>
> Given the above, try both, but I lean heavily toward LDAS.
>
> Warmest Regards,
>
> Jim Freeman
> Senior Scientist
> Variagenics, Inc.
>
> David Huen wrote:
> > Perhaps if you could say something about what you are actually trying to
> > do, we might be able to help you more effectively.
> >
> > E.g., what sort of data are you trying to serve? Does it come from an SQL
> > database or flat files?  If an SQL database, is this Ensembl-based or
> > proprietary? If flat files, what format? Embl? GAME? AGAVE? Proprietary?
> >
> > Also, you seem to be within a company. Does it already have a
> > bioinformatics arm and have they standardised on a language already?  If,
> > frinstance, they are Perl-centric, then the choice is academic.  What
> > sort of time-frame must the product be delivered in? If it's got to be in
> > yesterday and they don't have a bioinformatics arm, then I'd begin by
> > asking for a raise... ;-)
> >
> > Regards,
> > David Huen
> >
> > _______________________________________________
> > DAS mailing list
> > DAS@biodas.org
> > http://biodas.org/mailman/listinfo/das
>

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY
========================================================================