[Bioperl-pipeline] some newbie questions

Shawn Hoon shawnh at stanford.edu
Wed Sep 17 04:12:55 EDT 2003



On Monday, September 15, 2003, at 12:19 AM, Marc Logghe wrote:

> Hi all,
> I am brand new to biopipe, so please forgive me if I ask some silly 
> questions.
> I am currently playing with the idea of implementing the bioperl 
> pipeline and for that I have done some homework by reading a number of 
> biopipe documents. I might have missed a few relevant documents, 
> though ;-)
>
Ah, I'm probably still writing some of those. Documentation may come sooner or
later, depending on how soon I settle into school.

> However there is at least one thing that is not yet clear to me. Up to 
> now, we are mirroring a number of databases, like wormbase, and 
> handling it manually. This means, unpacking it, making the chromosomes 
> and wormpep sequences blastable; genomewide blast to map some features 
> in which we are interested; reformatting the database and custom 
> mapping data to gff; import into gbrowse; ...

For data preparation, there is some support but it may be limited. You should be
able to roll your own and plug it in. These come under
InputCreates: the Bio::Pipeline::InputCreate::* modules are responsible
for the various ways of setting up the inputs and jobs for the pipeline. For
example, a module for file-based blasting of sequences, setup_file_blast,
will

a) given a file of input sequences in any format, split the file into a
specified number of chunks (see the sketch below)
b) create a blast job in the pipeline for each chunk
c) create the specified working directory for storing the output files
d) format the db file for blasting, if you are blasting the file against
itself and the option is specified

see bioperl-pipeline/xml/examples/xml/blast_file_pipeline.xml
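
To give a feel for what step (a) amounts to, here is a rough, untested sketch
of chunking a FASTA file with Bio::SeqIO. This is not the setup_file_blast code
itself; the input file, chunk count and working directory are made up.

use strict;
use warnings;
use Bio::SeqIO;

# hypothetical input file, chunk count and working directory
my ($infile, $nchunks, $workdir) = ('wormpep.fa', 10, 'blast_work');
mkdir $workdir unless -d $workdir;

my $in = Bio::SeqIO->new(-file => $infile, -format => 'fasta');

# open one output stream per chunk
my @out = map {
    Bio::SeqIO->new(-file => ">$workdir/chunk_$_.fa", -format => 'fasta')
} (1 .. $nchunks);

# deal sequences out round-robin so the chunks end up roughly equal in size
my $i = 0;
while (my $seq = $in->next_seq) {
    $out[$i++ % $nchunks]->write_seq($seq);
}

Each chunk file would then become the input of one blast job, as in step (b).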

If, say, you want the blast output stored as GFF files, then you
can specify a data dumper as an output iohandler; see
bioperl-pipeline/xml/examples/xml/blast_db_flat.xml, which uses
Bio::Pipeline::Utils::Dumper
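
Just to illustrate the idea (this is not the Dumper code itself, and the file
names are invented): you can walk a raw blast report with Bio::SearchIO, wrap
each HSP in a Bio::SeqFeature::Generic and print its gff_string, roughly like:

use strict;
use warnings;
use Bio::SearchIO;
use Bio::SeqFeature::Generic;

my $searchio = Bio::SearchIO->new(-file => 'chunk_1.blast', -format => 'blast');

open my $gff, '>', 'chunk_1.gff' or die $!;
while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # build a plain SeqFeature from the HSP coordinates on the query
            my $feat = Bio::SeqFeature::Generic->new(
                -seq_id      => $result->query_name,
                -start       => $hsp->start('query'),
                -end         => $hsp->end('query'),
                -strand      => $hsp->strand('query'),
                -primary_tag => 'similarity',
                -source_tag  => 'blast',
                -score       => $hsp->score,
                -tag         => { Target => $hit->name },
            );
            print $gff $feat->gff_string, "\n";
        }
    }
}
close $gff;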

Alternatively, you can probably use Bio::DB::GFF as an
output iohandler to take the blast features and store them directly into
the database using the SeqFeature gff_string method.
For any customization you want to do, you should probably roll your own
module, which you can plug in as an output iohandler.
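
Roughly, assuming the GFF is already on disk (the database name and user below
are made up, and you need a user with insert rights), loading it into a
Bio::DB::GFF database could look like:

use strict;
use warnings;
use Bio::DB::GFF;

my $db = Bio::DB::GFF->new(
    -adaptor => 'dbi::mysql',
    -dsn     => 'dbi:mysql:wormbase_features',   # hypothetical database
    -user    => 'gbrowse',                       # hypothetical user
);

# load_gff accepts a file name (or a directory of GFF files)
$db->load_gff('chunk_1.gff');

That is the same kind of database gbrowse reads, so the features would show up
there directly.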



> From the documentation it is pretty clear that the genomewide blast
> is especially suited for biopipe.
> But what about all the rest, especially the preparation of the input
> data? Also, how can you trigger the pipeline? I mean, every week
> wget is fetching new wormbase data, and of course the pipeline should
> only be triggered when new data have arrived. How can you do that?

Right now, the best bet would be to write a pipeline that reads new
sequences from some directory or file, either loading the sequences into a db
or treating them as a file, and then carries out the blast. See
blast_file_pipeline.xml or blast_db_flat.xml for similar examples.
This would be triggered by some kind of cron job that checks the last
modification time of the data file. Nothing for this is currently
written, so you are welcome to give it a shot. A rough sketch of such a check
is below.
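
A minimal sketch of that check, assuming a stamp file to remember the last run;
the paths are made up and the actual pipeline command is left as a placeholder
for however you launch the biopipe run.

use strict;
use warnings;

my $datafile  = '/data/mirror/wormbase/wormpep.fa.gz';   # hypothetical mirror path
my $stampfile = '/data/mirror/wormbase/.last_run';       # records when we last ran

my @st = stat $datafile or die "cannot stat $datafile: $!";
my $data_mtime = $st[9];
my $last_run   = -e $stampfile ? (stat $stampfile)[9] : 0;

if ($data_mtime > $last_run) {
    # new data has arrived since the last run: kick off the pipeline
    system('sh', 'run_pipeline.sh') == 0    # placeholder for your real launch command
        or die "pipeline failed: $?";
    # touch the stamp file so we do not run again until the mirror updates
    open my $fh, '>', $stampfile or die $!;
    close $fh;
}
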
> Can you use biopipe for tasks like installing the new version of acedb 
> ?
>

I have no knowledge of installing acedb, and biopipe cannot do this, so
I can't say much. Biopipe is more suited to tasks where you want to
parallelize multiple jobs or have some kind of workflow that you want
to execute in a certain order. So is setting up acedb really complex enough
that you need a pipeline to do it?

cheers,

shawn




