[Bioperl-pipeline] xml dir housekeeping

Shawn Hoon shawnh at fugu-sg.org
Thu Jan 30 11:39:14 EST 2003


I would imagine starting with tests that exercise the conversion logic first, i.e. feature_in->feature_out,
and then adding specific pipeline tests that work with converters like you mentioned below. If your converters
make API calls to Ensembl, then you need to check that the Ensembl API is available before the test is run and
skip it if it is not. I don't think you need to run an analysis for converter tests. You should just
create dummy BioPerl objects and Ensembl objects (which is doable from flat files), pass them into the pipeline
(it could be a dummy runnable) and test that the objects are converted correctly. If the checks are done properly
(as defined by their interfaces), they should store correctly anyway.
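
Something along these lines is what I have in mind. The converter call is hypothetical (plug in whatever
converter module you are actually testing) and the Bio::EnsEMBL::SeqFeature constructor arguments are from
memory, so treat it as a sketch rather than a finished test:

use strict;
use Test::More;

# skip the whole test if the Ensembl API is not installed
eval { require Bio::EnsEMBL::SeqFeature; };
if ($@) {
    plan skip_all => 'Ensembl API not available';
}
else {
    plan tests => 2;
}

use Bio::SeqFeature::Generic;

# a dummy BioPerl feature built from scratch -- no database or flat file needed
my $feature_in = Bio::SeqFeature::Generic->new(
    -start       => 100,
    -end         => 200,
    -strand      => 1,
    -primary_tag => 'exon',
);

# a real test would call the converter here, e.g.
#   my ($feature_out) = @{ $converter->convert([$feature_in]) };
# for illustration, do the conversion by hand
my $feature_out = Bio::EnsEMBL::SeqFeature->new(
    -start  => $feature_in->start,
    -end    => $feature_in->end,
    -strand => $feature_in->strand,
);

isa_ok($feature_out, 'Bio::EnsEMBL::SeqFeature');
is($feature_out->start, $feature_in->start, 'start coordinate preserved');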



shawn


On Thu, 30 Jan 2003, Juguang Xiao wrote:

> 
> > > On writing pipeline tests: pipeline tests should be file based,
> > > meaning they don't assume the availability of biosql, ensembl or
> > > any other schema. Also no hardcoding. Within the xml, one can add
> > > the various adaptors and such, but commented out for testing purposes.
> > 
> > Juguang, if any of this is not clear please let us know, so that we 
> > can make sure your XMLs are up to speed.
> 
> Hi Shawn, Elia and Kiran,
> 
> Sure, I agree with the idea of a dev directory for xml. However, I am thinking about pipeline tests that use ensembl. As you know, the converter code is designed to convert objects between bioperl and ensembl so far, so that the pipeline can make full use of a huge data set rather than flat files. The question this raises is whether we need to test a converter instance by running it through a pipeline OR without a pipeline. In the case of the ensembl converters, the test does not make much sense unless the results are stored in the db. Right?
> 
> I had an idea about running the pipeline against a db. It may be a lazy man's idea, :) but it comes from my experience developing converters. Each time, BEFORE I run a test that stores converter results into ensembl, I usually do the following steps:
> 
> 1) Make sure that we have ensembl in the environment. Of course, we in the fugu team have it on the pulse machine. Others can make use of kaka.sanger.ac.uk.
> 2) Prepare a set of data for testing. I wrote a shell script to create a new test database and copy a subset of data from the ensembl db:
> 
> ############
> 
> #!/usr/bin/sh
> # takes the name of the test database as the first argument ($1)
> 
> # dump the schema, plus small subsets of the contig, dna and analysis tables
> mysqldump -u root -d homo_sapiens_core_9_30 > ens_core_9_30.sql
> mysqldump -u root -t -w 'dna_id<1000' homo_sapiens_core_9_30 contig > ens_homo_core_9_30.contig.sql
> mysqldump -u root -t -w 'dna_id<1000' homo_sapiens_core_9_30 dna > ens_homo_core_9_30.dna.sql
> mysqldump -u root -t homo_sapiens_core_9_30 analysis > ens_homo_core_9_30.analysis.sql
> 
> 
> # recreate the test database named by the first argument
> mysqladmin -u root drop $1
> mysqladmin -u root create $1
> 
> # load the schema, the data subsets and our own analysis entries
> mysql -u root $1 < ens_core_9_30.sql
> mysql -u root $1 < ens_homo_core_9_30.contig.sql
> mysql -u root $1 < ens_homo_core_9_30.dna.sql
> mysql -u root $1 < ens_homo_core_9_30.analysis.sql
> mysql -u root $1 < my_analysis.sql
> 
> rm ens_core_9_30.sql
> rm ens_homo_core_9_30.contig.sql
> rm ens_homo_core_9_30.dna.sql
> rm ens_homo_core_9_30.analysis.sql
> 
> #############
> 
> 3) Add analysis records into the test db. The ensembl analysis table is set up for the ensembl compute environment, so its entries may not be directly usable on our server; we need to add the analyses that we actually use (hence my_analysis.sql above).
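> 
> For example (untested, and the adaptor and constructor names are from my memory of the Ensembl API, so please
> double-check), instead of hand-editing my_analysis.sql one could store an analysis through the API:
> 
> use Bio::EnsEMBL::DBSQL::DBAdaptor;
> use Bio::EnsEMBL::Analysis;
> 
> # connect to the freshly created test database (connection parameters are placeholders)
> my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
>     -host   => 'localhost',
>     -user   => 'root',
>     -dbname => 'my_test_db',
> );
> 
> # an analysis entry matching what actually runs on our own server
> my $analysis = Bio::EnsEMBL::Analysis->new(
>     -logic_name => 'RepeatMask',
>     -program    => 'RepeatMasker',
>     -module     => 'RepeatMasker',
> );
> 
> $db->get_AnalysisAdaptor->store($analysis);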
> 
> With these prerequisites in place, I can test my converter instance in the pipeline. I am wondering whether we can write an analysis to do the above setup work within the pipeline itself.
> 
> I find it quite hard to define the internal and external parts of the pipeline now. I think the internal framework of the pipeline is done. However, to meet the need to run some special analyses or handle different data sources, we developed the converter subsystem, the dumper for flatfiles, and the input_creates for handling multiple or special inputs such as genewise with 2 inputs. Even the runnable instances are, as I see it, part of the external side of the pipeline. We have finished the pipeline framework and are now trying to build more and more pipeline instances for our own use or for demos. Please correct me if I am wrong.
> 
> Hence, developing a module for preparing the dataset is a reasonable requirement. I think more of these individual requirements will come up as more people use the pipeline.
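> 
> As a very rough sketch of the kind of data-preparation module I mean (the package and subroutine names are
> made up; it just wraps the same mysqladmin/mysql commands as the script above):
> 
> package Bio::Pipeline::Utils::TestDBSetup;   # hypothetical name
> use strict;
> 
> # recreate a test database and load a list of dump files into it
> sub setup_test_db {
>     my ($dbname, @dump_files) = @_;
>     system("mysqladmin -u root -f drop $dbname");
>     system("mysqladmin -u root create $dbname") == 0
>         or die "could not create $dbname";
>     foreach my $file (@dump_files) {
>         system("mysql -u root $dbname < $file") == 0
>             or die "could not load $file";
>     }
>     return 1;
> }
> 
> 1;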
> 
> 
> 

-- 
********************************
* Shawn Hoon
* http://www.fugu-sg.org/~shawnh
********************************


