[Bioperl-l] AlkahEST (was: want to bring tools together to help small labs)
T.D. Houfek
tdhoufek@unity.ncsu.edu
Mon, 5 Nov 2001 04:30:46 -0500 (EST)
Hi everyone!
Thanks for the great response!
Sorry to have taken a few days to respond; my sysadmin duties came calling
when one of our most important machines went down... this just fuels my
personal desire to be officially re-categorized as something else
(anything else!) besides System Administrator. ;-)
Anyway, here are my responses to several people who helpfully answered my
original post -- hope you all don't mind being crammed into one long post!
ELIA STUPKA:
Sagely notes:
>As far as the pipeline database is concerned, I think to go all the way
>to the functionality described in the mail, you would actually end up
>with something similar to the ensembl pipeline.
As someone who has witnessed the ghastly spectacle of the variety of
stomachache that results from biting off more than one can chew, I
appreciate Elia's remark. The package I envision will undeniably be a
complex and difficult monster to engineer. Because its highest design
priorities are ease-of-installation and use for small and medium
laboratories, its features will inevitably lag behind the most expensive
commercial packages, and also behind Ensembl, which has megawatts of
brilliance and tons of momentum behind it. I would consider the package a
stunning success if it did such a good job at facilitating small
laboratory research that the successes of the labs that used it
allowed them to outgrow the package... in other words, so that they could
"afford" to install and maintain Ensembl and/or in-house software tailored
to their needs. Because Ensembl is freely-available and open-source (and
because I'll need the help of those who have built it, who know better than
anyone the problems that come with the territory) I think it is extremely
important that migrating to Ensembl from the package I am imagining is
made as easy as possible.
The release of bioperl-db will certainly be a great help in this effort,
and thanks to those who make it possible! And Elia, could you elaborate
on the pros and cons of making feature pairs symmetrical/asymmetrical? Is
it the difference between a 1:1 correlation between a feature and a
certain sequence span, and a 1:N correlation between features and spans?
i.e. features with multiple span locations in the genome? Or is there
something additional that I am missing?
BRIAN OSBORNE:
In an off-list email, Brian advised that it would be good if the package
didn't focus solely on ESTs: in particular, if labs that do a lot of EST
work could also assemble full-length cDNAs from a subset of ESTs, or
"cluster" ESTs. He noted that bioperl already has modules to handle phred
and phrap, and that the package could facilitate assembly/clustering of
reads specified by a user.
I thank Brian for tipping me off to the existence of these BioPerl
modules, and heartily agree that it is important to include these features
in the package. Over the last two years of its existence, our lab has
mostly worked with ESTs (though we're about to be hit by the entire
Magnaporthe grisea genome... gulp...). We've nevertheless relied on Phrap
assemblies to give us some idea of relative levels of expression, and to
define loosely-so-called "unigene sets" as queries for BLAST searches.
Assemblies have even been useful for the cost-effectiveness-determination
aspect of quality control: we've used them to help us determine the
progression of redundancy in the ongoing sequencing of a library. So I
consider negotiation of assembly data of definite importance to a package
for smaller labs: who else must keep such a close eye on
cost-effectiveness?
Here might be a good time for me to introduce my suggestion for the name
of the package (for one thing, I'm tired of having no other handle for it
than 'the package'). I propose the name 'AlkahEST', partly because 'EST'
helps to form the name, but chiefly because the 'alkahest' was the
Universal Solvent sought by the alchemists... a substance thought to be
capable of dissolving ANY MATERIAL. (What do you keep something like that
in?) Anyway, despite the possible bad luck implicit in naming a package
after The Original Vaporware, I think it's catchy and will use it until I
hear a better suggestion. :-)
In any case, it looks like we will be getting www.alkahest.org to start
setting up house in!
FERNAN AGUERO:
Fernan basically asked whether this thing, this 'AlkahEST', had received
any positive responses/suggestions, and whether if we agreed on goals
(and, ha ha, possibly names...) we could join forces.
As he's probably noticed since he posted, I've received several positive
responses from this list, as well as a couple of valuable cautions and
referrals to wheels already invented. I have a great deal of support both
from my own laboratory, which would like to be known as a beneficent
sponsor of open software as well as a fine research facility, and from a
couple of neighboring laboratories, who also come from this direction.
What this amounts to in terms of actual man-hours of work will probably
depend on the outcome of some grant proposals. I am hoping that the
granting agencies will recognize the importance of improving the free
software situation for smaller labs. Money-getting is really not in my
"skill-set", so if anyone has any good grant tips or other funding ideas,
feel free to pass them along!
Whatever happens with grants, I can count on the good advice of my boss,
Doug Brown, and some help from the other junior but resourceful folks I
work with here at the lab. And I have two very good friends (whom I
encourage to join this list) who are intensely interested. One has
programming knowledge, and the other has database knowledge, both of which
far exceed mine.
And certainly, certainly, we are willing to join forces with you, Fernan,
and with anyone who is seriously interested. Also I will listen closely
to the advice of those too busy to be seriously interested. I'm not doing
this because I'm the best man for the job -- I'm doing it because I have
nothing better to do! ;-)
BRIAN OSBORNE:
I am ashamed to say I had no knowledge of MyGenBank before Brian brought
it to my attention. Is anyone familiar with both MyGenBank and
bioperl-db, and able to compare/contrast them?
JASON STAJICH:
Thanks Jason for the tip about GenQuire. It too is added to the list of
Things To Check Out... especially since I just noticed the confirmation
that it has burst forth into the clean clear air of the world of open
source! May Mark and Dave each receive a dozen saintly lives' worth of
good karma for making this come about!
FERNAN AGUERO (again):
On the database-storage of BLAST data... Doug Brown (my aforementioned
boss) developed for our lab a database schema for BLASTs mirroring the
structure of the XML document that can be produced by NCBI's blastall. It
seems to effectively capture all of the information produced by a BLAST,
although I've not yet tested it with a PSI-BLAST run. Currently we are
loading these XML reports into a MySQL database using a mapping file
and Ronald Bourret's XMLDBMS (http://www.rpbourret.com/xmldbms/) but a
script applying the BioPerl XML-BLAST parser could do the same job. Let
me know if you'd like the SQL to construct the database and we can send it
to you.
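
To illustrate the idea of a schema that mirrors the BLAST XML document,
here is a minimal sketch in Python. It is not the lab's actual schema or
loader: the table and column names are hypothetical, SQLite stands in for
MySQL so the example is self-contained, and the element names
(BlastOutput, Iteration, Hit, Hsp) are assumed to follow NCBI's BLAST XML
output. The sample report is made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical tables, one per nesting level of the XML report.
SCHEMA = """
CREATE TABLE blast_output (id INTEGER PRIMARY KEY, program TEXT,
                           db TEXT, query_def TEXT);
CREATE TABLE iteration (id INTEGER PRIMARY KEY, output_id INTEGER,
                        iter_num INTEGER);
CREATE TABLE hit (id INTEGER PRIMARY KEY, iteration_id INTEGER,
                  hit_num INTEGER, hit_id TEXT, hit_def TEXT);
CREATE TABLE hsp (id INTEGER PRIMARY KEY, hit_id INTEGER, hsp_num INTEGER,
                  bit_score REAL, evalue REAL,
                  query_from INTEGER, query_to INTEGER);
"""

# Made-up report, trimmed to the elements this sketch reads.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_db>nr</BlastOutput_db>
  <BlastOutput_query-def>clone_0001</BlastOutput_query-def>
  <BlastOutput_iterations><Iteration>
    <Iteration_iter-num>1</Iteration_iter-num>
    <Iteration_hits><Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>example_subject</Hit_id>
      <Hit_def>hypothetical protein</Hit_def>
      <Hit_hsps><Hsp>
        <Hsp_num>1</Hsp_num>
        <Hsp_bit-score>100.5</Hsp_bit-score>
        <Hsp_evalue>1e-20</Hsp_evalue>
        <Hsp_query-from>1</Hsp_query-from>
        <Hsp_query-to>120</Hsp_query-to>
      </Hsp></Hit_hsps>
    </Hit></Iteration_hits>
  </Iteration></BlastOutput_iterations>
</BlastOutput>"""


def load_blast_xml(conn, xml_text):
    """Walk the report top-down, inserting one row per element."""
    root = ET.fromstring(xml_text)
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO blast_output (program, db, query_def) VALUES (?, ?, ?)",
        (root.findtext("BlastOutput_program"),
         root.findtext("BlastOutput_db"),
         root.findtext("BlastOutput_query-def")))
    output_id = cur.lastrowid
    for it in root.iterfind("BlastOutput_iterations/Iteration"):
        cur.execute(
            "INSERT INTO iteration (output_id, iter_num) VALUES (?, ?)",
            (output_id, int(it.findtext("Iteration_iter-num"))))
        iteration_id = cur.lastrowid
        for hit in it.iterfind("Iteration_hits/Hit"):
            cur.execute(
                "INSERT INTO hit (iteration_id, hit_num, hit_id, hit_def) "
                "VALUES (?, ?, ?, ?)",
                (iteration_id, int(hit.findtext("Hit_num")),
                 hit.findtext("Hit_id"), hit.findtext("Hit_def")))
            hit_row = cur.lastrowid
            for hsp in hit.iterfind("Hit_hsps/Hsp"):
                cur.execute(
                    "INSERT INTO hsp (hit_id, hsp_num, bit_score, evalue, "
                    "query_from, query_to) VALUES (?, ?, ?, ?, ?, ?)",
                    (hit_row, int(hsp.findtext("Hsp_num")),
                     float(hsp.findtext("Hsp_bit-score")),
                     float(hsp.findtext("Hsp_evalue")),
                     int(hsp.findtext("Hsp_query-from")),
                     int(hsp.findtext("Hsp_query-to"))))
    conn.commit()
    return output_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_blast_xml(conn, SAMPLE_XML)
```

Keeping one table per XML nesting level means the loader needs no
knowledge of biology, only of the document structure; a multi-iteration
run such as PSI-BLAST would simply add more rows to the iteration table,
though that case is untested here as well.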
And yes, management, relation, display, and report extraction of BLAST
data are all extremely important to the usefulness of AlkahEST.
Local researchers and other authenticated users should be able to specify
and browse batch BLAST searches through a web interface; others should be
able to BLAST their own queries against those portions of the database to
which they are allowed access; and the database should relate BLAST
results to their queries and subjects so that when you browse to a clone
or a contig you are confronted with a report of the homologies produced by
all the searches (to which you have access) that have been performed thus
far.
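
The "report of homologies, filtered by access" described above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the tables (search, search_access, hit), their columns, the
user names, and the sequence names are all hypothetical, and SQLite is
used so the example runs on its own.

```python
import sqlite3

# Hypothetical tables: each batch search is public or restricted, and
# each hit records which sequence was the query and which the subject.
SCHEMA = """
CREATE TABLE search (id INTEGER PRIMARY KEY, label TEXT, is_public INTEGER);
CREATE TABLE search_access (search_id INTEGER, username TEXT);
CREATE TABLE hit (id INTEGER PRIMARY KEY, search_id INTEGER,
                  query_name TEXT, subject_name TEXT, evalue REAL);
"""


def homology_report(conn, seq_name, username):
    """Return hits involving seq_name (as query or subject), limited to
    searches that are public or explicitly granted to username."""
    return conn.execute(
        """
        SELECT s.label, h.query_name, h.subject_name, h.evalue
        FROM hit h JOIN search s ON s.id = h.search_id
        WHERE (h.query_name = ? OR h.subject_name = ?)
          AND (s.is_public = 1
               OR EXISTS (SELECT 1 FROM search_access a
                          WHERE a.search_id = s.id AND a.username = ?))
        ORDER BY h.evalue
        """,
        (seq_name, seq_name, username)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO search VALUES (?, ?, ?)",
                 [(1, "public batch", 1), (2, "in-house batch", 0)])
conn.execute("INSERT INTO search_access VALUES (2, 'alice')")
conn.executemany(
    "INSERT INTO hit (search_id, query_name, subject_name, evalue) "
    "VALUES (?, ?, ?, ?)",
    [(1, "contig_7", "subject_a", 1e-30),
     (2, "contig_7", "clone_42", 1e-10),
     (2, "clone_9", "contig_7", 1e-5)])

for row in homology_report(conn, "contig_7", "alice"):
    print(row)
```

Matching on both the query and the subject column is what lets a browse
to a clone or contig show every search it took part in, whichever side of
the comparison it was on.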
I'm sure all will agree this is quite enough for one post...
T.D. Houfek
system administrator
NCSU Fungal Genomics Laboratory
(919)513-0025
tdhoufek@unity.ncsu.edu