[Bioperl-pipeline] Next set of questions.

Peter Kos <kos@rite.or.jp>
Fri, 30 Aug 2002 15:57:00 +0900


Hi all,

Thank you for the advice. MySQL is now up and running. There were problems 
with permissions, groups, the my.sock file and a few other things.
Next comes the setup of Ensembl and the required Perl modules.

Today's questions about the pipeline:

1., According to the "Ensembl 7.3 Website Installation Instruction", 
the Ensembl API is given an integer version number that corresponds 
to the database schema it was built for.
So the software to be installed seems to be tied to a specific 
release of the human genome, Drosophila and so on.
I do not want any of these large datasets. I'll try to install the 
Ensembl software without them; please tell me if I am wrong and this 
is not possible.

I just want a nicely annotated bacterial genome in a clean, usable 
GUI. Of course it should contain gene predictions, BLAST homologies, 
Pfam hits, clickable links to EMBL and SwissProt entries and so on, 
but it comes nowhere near the complexity of the human genome project.

2., On the other hand, it would be nice to have a small database to 
play with while I am setting up our own system. Is there a small 
(demo) database for this purpose?

3., Of course I cannot predict what information we will need in the 
database in the future. Is it possible to change the database schema 
once it is already populated and in use?

4., You are working on implementing Genscan and Genewise. I do not 
know how they would perform on a bacterial genome, but I can give them 
a try as a lab-rat. Could you point me to where the binaries can be 
downloaded? Or do they run over the Internet? I have ONLY and 
EXCLUSIVELY port 80 open to the world, so if it uses anything other 
than HTTP, I will not be able to use it.
(For example, CVS, FTP, HTTPS, ICQ and SSH do not work, among others.)

5., I need the results of bacterial gene finders, such as Glimmer, 
which will hopefully work here once I finally solve a strange problem 
with it, and GeneMarkS, which is provided over the Internet and sends 
its output file by e-mail. 
I can parse/convert any of these files to a Bioperl-readable format if 
I have to; that is not a problem. Can I then somehow inject these 
results into the pipeline or directly into the database?
Sorry for the basic questions, but I am a bit confused, as I am working 
on every level of the problem, from the special chemistry of the 
sequencing reactions to editing, finishing, annotating, and setting up 
the pipeline, web server and database, as well as designing and coding 
a search for a new kind of feature.
By the way, the results of this search should also go into the database.
I am also confused because an OO database would have been closer to 
my current knowledge, and I cannot imagine how this whole thing would 
work. I hope I will one day. I have read the "Ensembl Tutorial", 
which only shows how to get things out, not how to put them in.

6., My "compute farm" is a single small Sun machine. Do I need these 
fancy load-sharing systems like LSF and whatever the other one is 
called?

That's perhaps enough for today. :-)
Sorry for so many questions.

Regards
Peter

---------------------------
Peter B. Kos,
 (RITE)
E-mail: kos@rite.or.jp