[DAS] LDAS vs Dazzle

jfreeman jfreeman@variagenics.com
Fri, 04 Jan 2002 11:26:49 -0500

I agree with David, and would like to add the following commentary:

First, the true cost of open-source GPL'ed software is the time it takes
to find and acquire the proper documentation to work with it.  A badly
documented piece of open source code is worthless unless your time is
free, or is considered free by the person paying you to figure it out.
Perl is free and the Perl man pages are free, but the Camel book, 3rd
edition, costs $49.95; Java has its equivalent as well.  Good
documentation is the profitable end of the open source world.

Second, "He who controls the assembly controls the universe."  The cost
of assembling your own human genome from the parts is prohibitive and a
waste of time for most companies, and it is a natural source of lock-in
to a given vendor of the human genome.  At the moment there are many
assemblies to choose from.  Software closely tied to a given assembly
forces you to invest in that assembly, and you will be forced to change
software as you update assemblies.  Software that understands multiple
assemblies is preferable for the customer, and the DAS protocol, as
designed, understands multiple assemblies.  How well does the DAS server
decouple the DAS protocol output from the assembly input?  Is it closely
tied to a particular assembly, or does it stand alone, doing one job?
How easy is it to switch assemblies?

Third, can you display the human genome alongside your internal data?
Representing your data on the various assemblies is a problem most
companies have.  A DAS server should provide clear, well-documented,
stable interfaces, so that you can easily write a conversion layer
between your system and the assembly (or assemblies) you have chosen to
map to.
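The conversion layer described above can be sketched as follows.  This
is a minimal illustration, assuming a hypothetical table of ungapped
alignment blocks between an old and a new assembly; the Block and
AssemblyMap names and the example coordinates are made up and are not
part of LDAS, Dazzle or the DAS protocol.  Real assembly-to-assembly
mappings also have to deal with strand flips and split features.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """Hypothetical ungapped block: old[start..end] maps to new[new_start..].

    Coordinates are 1-based and inclusive; both sides are on the same strand."""
    chrom: str
    start: int
    end: int
    new_start: int


class AssemblyMap:
    """Translate positions from an old assembly to a new one (illustrative only)."""

    def __init__(self, blocks: Iterable[Block]) -> None:
        # Group blocks per chromosome, sorted by start, so lookups can bisect.
        self._blocks: Dict[str, List[Block]] = {}
        for b in sorted(blocks, key=lambda blk: (blk.chrom, blk.start)):
            self._blocks.setdefault(b.chrom, []).append(b)
        self._starts = {c: [b.start for b in bs] for c, bs in self._blocks.items()}

    def lift(self, chrom: str, pos: int) -> Optional[int]:
        """Return the new-assembly position, or None if pos is in no block."""
        blocks = self._blocks.get(chrom)
        if not blocks:
            return None
        # Rightmost block whose start is <= pos (assumes blocks do not overlap).
        i = bisect_right(self._starts[chrom], pos) - 1
        if i < 0 or pos > blocks[i].end:
            return None
        b = blocks[i]
        return b.new_start + (pos - b.start)

    def lift_feature(self, chrom: str, start: int, end: int) -> Optional[Tuple[int, int]]:
        """Lift both ends of a feature; None if either end is unmapped."""
        s, e = self.lift(chrom, start), self.lift(chrom, end)
        if s is None or e is None:
            return None
        return (s, e)
```

For example, with blocks Block("chr1", 1, 1000, 1) and
Block("chr1", 1001, 5000, 1501), i.e. a 500 bp insertion after position
1000 in the new assembly, lift("chr1", 1200) returns 1700 and
lift_feature("chr1", 900, 1100) returns (900, 1600).  A feature that
spans the insertion grows accordingly, which is the kind of case a real
conversion layer would probably want to flag rather than pass through.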

Fourth, does it work through your firewall?  The code is useless unless
it works well with your internal security systems, which in practice
means proxy support.
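Proxy support is easy to check from the client side.  The sketch below
uses Python's standard urllib.request to build a DAS features request
for one segment and an opener that sends it through an HTTP proxy.  The
server base URL, data source name and proxy address are hypothetical
placeholders; the segment=ref:start,stop query follows the DAS/1
convention for feature requests.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def das_features_url(base: str, dsn: str, ref: str, start: int, stop: int) -> str:
    """Build a DAS/1-style 'features' request for one segment (ref:start,stop).

    `base` is the server's DAS root, e.g. a hypothetical .../cgi-bin/das."""
    # Keep ':' and ',' literal so the segment reads ref:start,stop.
    query = urlencode({"segment": f"{ref}:{start},{stop}"}, safe=":,")
    return f"{base.rstrip('/')}/{dsn}/features?{query}"


def make_opener(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Return an opener that routes http and https requests through a proxy.

    With proxy_url=None, ProxyHandler falls back to the system proxy
    settings (e.g. the http_proxy / https_proxy environment variables)."""
    if proxy_url is None:
        handler = urllib.request.ProxyHandler()
    else:
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

Calling opener.open(url) on the result would then perform the request;
if that works from inside the firewall, the server side is only left to
answer plain HTTP.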

Given the above, the ideal DAS server should:

1) Have install and test documentation written so that a first-year
biology graduate student with no prior programming experience could
install the system and run the tests.  They should need their system
administrator and root access only minimally (read: make install) to
get the program running, and should be told when they need that system
administrator.  Clear examples should be provided.  This documentation
should be tested against the above use case (without the programmer
being there!) before it is considered good enough for publication.

2) Be written to work easily with multiple assemblies and not be closely
tied to any particular assembly.

3) Have clear, well-documented interfaces that match up an example
reference server with an example annotation server; the best example
would be one where the same annotations are served on two different
assemblies.  This documentation should also pass the test of being set
up by a first-year biology graduate student with no prior programming
experience.

4) Have the ability to work with proxies.

How do LDAS and Dazzle rate on each point?

1) I am not a first-year biology graduate student, but as a trained
programmer I found the LDAS documentation clearer and more fully worked
out.  See: http://www.ensembl.org/Dev/Lists/das/msg00770.html for more
details.  Take an afternoon to install both and run their test cases;
your decision will be easy.  Most importantly, see 4.

2) Dazzle, as shown in one version of the install documentation
(http://www.ensembl.org/Docs/das_server_v1.0.pdf), is closely coupled
with Ensembl and its version of the assembly.  LDAS works from an
intermediate tab-delimited flat file that is not tied to any particular
software or assembly.

3) Dazzle is still in the design phase of development, and the names of
its interfaces and classes have changed in the past, e.g.
jclass="ensembl.RemoteGenericSeqFeatureSource".  You have to know
exactly what your data source is to pick the proper handler, and I am
not sure where you would find information about the handlers available.
This is not a problem with LDAS, because you write your data to one
format before importing it into the LDAS system.  Dazzle has only one
worked-out example and no test case to confirm that your installation
is working (depending on which install process you want to try, see
http://www.ensembl.org/Docs/das_server_v1.0.pdf or
http://www.biojava.org/dazzle/deploy.html).  LDAS has one install page,
see: http://www.biodas.org/servers/LDAS.html.  How the LDAS server
stores the data in MySQL is hidden from you, and that is fine: you only
have to learn the API of the LDAS loading script, get your data into the
flat-file format, configure the connection to your database, copy it to
your web directory and view the test case.

4) Both projects understand the problem and have addressed it in the
code; it may already have been added to the documentation.

Given the above, try both, but I lean heavily toward LDAS.

Warmest Regards,

Jim Freeman
Senior Scientist
Variagenics, Inc.

David Huen wrote:
> Perhaps if you could say something about what you are actually trying to
> do, we might be able to help you more effectively.
> E.g., what sort of data are you trying to serve? Does it come from an SQL
> database or flat files?  If an SQL database, is this Ensembl-based or
> proprietary? If flat files, what format? Embl? GAME? AGAVE? Proprietary?
> Also, you seem to be within a company. Does it already have a
> bioinformatics arm and have they standardised on a language already?  If,
> frinstance, they are Perl-centric, then the choice is academic.  What sort
> of time-frame must the product be delivered in? If it's got to be in
> yesterday and they don't have a bioinformatics arm, then I'd begin by
> asking for a raise... ;-)
> Regards,
> David Huen
> _______________________________________________
> DAS mailing list
> DAS@biodas.org
> http://biodas.org/mailman/listinfo/das