[Bioperl-pipeline] Candidate bioperl-pipeline-bundle 0.1

Wed Apr 16 13:26:36 EDT 2003

Hi Folks,
	I have a tarball of a bioperl-pipeline bundle that contains all the
necessary bio* and ensembl packages for running the pipelines currently
available. It is available here:

http://www.biopipe.org/download/biopipe-bundle-rc-0.1.tar.gz

Included packages are:

bioperl-run : current MAIN
bioperl-live : current MAIN (live because I needed the cigar string
function in GenericHSP, which is not in release 1.2.1)
bioperl-pipeline: current MAIN
ensembl: current MAIN
bioperl-db : 1-0-0
biosql-schema: 1-0-0

The major changes in the biopipe code:

1) Thanks to Kiran's work at the hackathon, installing EXPAT and
     XML::Parser is no longer required. If you have difficulty
     installing them, a pure Perl solution works instead: it is enough
     to have XML::SAX::PurePerl installed.

2) A primitive job_viewer script that lets you query job status, look
     at the error logs of failed jobs, etc.

3) The XML now has a <global> section in which variables can be defined
     for use within the XML document. Most modifications to the XML can
     therefore be made in one central place, which makes it a lot easier
     to write and configure the XML (to me anyway).

4) XML system variables such as INPUT and OUTPUT (found in the <value>
     tags of arguments) must now be delimited by '!', like so: !INPUT!
     This makes it clearer that they are system variables. I have added
     the following system variables so that it is now easier to pass in
     different kinds of input:

   !INPUT! - This is the input id name specified for the particular
                input.
   !ANALYSISX! - Here X is a digit and corresponds to the analysis id
                specified in the analysis definition portion of the XML
                file, e.g. <analysis id="1"> would be ANALYSIS1
   !ANALYSIS! - Without a number appended, this corresponds to the
                current analysis.
   !ANALYSIS_NAME! - This refers to the logic name of the current
                analysis.
   !IOHANDLERX! - Here X is a digit and corresponds to the iohandler id
                specified in the iohandler_setup portion of the XML
                file, e.g. <iohandler id="2"> would be IOHANDLER2
     - Additional variables for IOHandlers of type OUTPUT

     !INPUTOBJ! - This corresponds to the actual input object fetched by
                 the iohandler. For the example above, this would
                 correspond to the $seq object.

     !INPUTOBJX! - If a job has more than one input, you can specify a
                 particular input object, where X is a digit giving the
                 rank of the input. Inputs are ranked according to their
                 input id in the input table, so you will need to know
                 the order in which the InputCreate modules create the
                 inputs.
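
The ranking rule for !INPUTOBJX! can be sketched in Python as follows.
This is only an illustration under assumptions: inputs are modelled as
(input_id, object) pairs, ranks are taken to start at 1, and the helper
name input_obj_by_rank is hypothetical. Check the XML README for the
actual numbering.

```python
def input_obj_by_rank(inputs, rank):
    """Return the object of the input with the given rank.

    inputs: list of (input_id, obj) pairs for one job (assumed shape).
    rank:   the X in !INPUTOBJX!, assumed here to be 1-based.
    Inputs are ordered by ascending input id before the rank is applied.
    """
    ordered = sorted(inputs, key=lambda pair: pair[0])
    if not 1 <= rank <= len(ordered):
        raise IndexError(f"job has no input with rank {rank}")
    return ordered[rank - 1][1]


# Illustrative job with two inputs, listed out of order on purpose.
job_inputs = [(17, "genomic_seq"), (12, "cdna_seq")]
print(input_obj_by_rank(job_inputs, 1))  # cdna_seq
print(input_obj_by_rank(job_inputs, 2))  # genomic_seq
```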

There is now a README for the XML in bioperl-pipeline/xml/README; see it
for more information.

5) Transformers are working more stably now.

6) Documentation
Modified the Biopipe INSTALL doc and added an XML README.
I will need help to update the website documentation.
I will try and write a HOWTO for writing XML files now that I'm happier 
with the XML design.

Elia, a tarball of the website is available if you want to migrate to
the O|B|F servers:
http://www.fugu-sg.org/~shawnh/biopipe/biopipe-web.tar.gz

Sample Pipelines:
The following example pipelines are provided in the example directory:

blast_biosql_pipeline: Fetch sequences from a biosql database and blast
them against a dbfile, storing the hits back as seqfeatures (works off
bioperl-db-1-0-0). Naturally, the schema is limited when it comes to
storing the alignment string.

blast_db_flat: Fetch sequences from a FASTA-formatted database and
blast them against a dbfile, dumping hits as gff features

blast_file_pipeline: Take an input file of sequences, chop it up and
send the pieces for blasting. Raw Blast output is stored.

cdna2genome_pipeline : Take an input file of cdna sequences, blast 
against a genomic database, get top hits and run sim4/est2genome, dump 
genes as gff

phylip_tree_pipeline: Take an input file of protein sequences, run 
clustalw->seqboot->protdist->neighbor->consense->drawtree

protein_annotation_pipeline: Fetch input sequences from BioSQL and run
TMHMM, SEG, FingerPrintScan, PFScan, SignalP, PFAM

genome_annotation_pipeline: Fetch sequences from an ensembl database
(version > 12), run RepeatMasker->Blast->Genewise, storing results back
into Ensembl

The last one needs more testing. TLL guys, I'm gonna try and set this 
up for ciona.

Naturally we have quite a number of wrappers in bioperl-run, so it will
be quite easy to write more example pipelines, which I would like to see.

Pls try and test, and we can try for a release.

best wishes,

shawn