[Bioperl-pipeline] multiple pipelines

Elia Stupka elia at tll.org.sg
Tue Jul 1 16:11:29 EDT 2003


Hi Jeremy,

we are currently having an internal discussion about this; we are actually 
trying to work towards a new multi-pipeline system, where one database 
could contain multiple pipelines, files relating to jobs would carry 
pipeline ids, and so on, and the web manager would track multiple 
pipelines. This is at the discussion stage at the moment, though Juguang 
and Aaron over here seem set to work on this soon.
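To make that a bit more concrete, here is a purely illustrative sketch of 
the kind of schema change under discussion (the table and column names are 
made up for this example and are not the actual bioperl-pipeline schema):

use strict;
use warnings;
use DBI;

# Illustrative only: one database holding several pipelines, with jobs
# (and therefore their output files) tagged by a pipeline id.
my $dbh = DBI->connect('dbi:mysql:database=biopipe;host=localhost',
                       'user', 'pass', { RaiseError => 1 });

# A table describing each pipeline kept in the database.
$dbh->do(q{
    CREATE TABLE pipeline (
        pipeline_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name        VARCHAR(40) NOT NULL,
        description TEXT
    )
});

# Jobs become pipeline-aware, so the web manager can filter on pipeline_id.
$dbh->do(q{ALTER TABLE job ADD COLUMN pipeline_id INT NOT NULL});

$dbh->disconnect;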

> One other note: with our setup, reading/writing from/to an nfs directory
> during a blast analysis is very io bound.

Absolutely. To achieve the best performance you need:

1- The BLAST database local to the node, with the best possible read speed 
(in our case two mirrored local hard disks)

2- To write STDOUT and STDERR to the local node, read the results from 
there, and finally store the results in the database (no need to copy 
anything anywhere; see the sketch below)
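For point 2, something along these lines works (just a sketch; the paths, 
the database name and the blastx program choice are assumptions about a 
typical setup, not our exact configuration):

#!/usr/bin/perl
# Sketch: run BLAST against a database on the node's local disk and write
# STDOUT/STDERR to local scratch space instead of an NFS mount.
use strict;
use warnings;

my $local_db = '/data/local/blastdb/nr';      # on the mirrored local disks
my $query    = '/tmp/job_1234.query.fa';
my $out      = '/tmp/job_1234.blast.out';
my $err      = '/tmp/job_1234.blast.err';

system("blastall -p blastx -d $local_db -i $query -o $out 2> $err") == 0
    or die "blastall failed: $?";

# The report is then parsed on the node and the results written straight
# to the database, so nothing needs to be copied back over NFS.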

The only current caveat with point 2 is that if a job fails, the error 
file stays on the node, and there is currently no simple way to track 
which node a job was running on. We are about to change the database 
schema and the code so that we keep track of the id of the node a job is 
running on after it is submitted.
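One way this could look (again only a sketch of the idea, not the schema 
change we will actually make; the job table and host column are invented 
for the example):

use strict;
use warnings;
use DBI;
use Sys::Hostname;

# Sketch: when the runner wakes up on a node, record the hostname against
# the job so a failed job's local error file can be located afterwards.
my $job_id = shift @ARGV or die "usage: $0 job_id\n";

my $dbh = DBI->connect('dbi:mysql:database=biopipe;host=dbserver',
                       'user', 'pass', { RaiseError => 1 });
$dbh->do('UPDATE job SET host = ? WHERE job_id = ?',
         undef, hostname(), $job_id);
$dbh->disconnect;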

> then copied back to the nfs mounted directory the analysis was started in

If you are using a database (e.g. BioSQL or Ensembl) to store your blast 
results, you don't even need this last step: you just parse the file 
locally and then write the results back to the db.
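For instance (a sketch only; the INSERT at the end is a placeholder table 
rather than a real BioSQL or Ensembl schema, and the file path is 
hypothetical):

use strict;
use warnings;
use Bio::SearchIO;
use DBI;

# Sketch: parse the BLAST report from the node's local disk with
# Bio::SearchIO and write the hits straight into a results database.
my $dbh = DBI->connect('dbi:mysql:database=results;host=dbserver',
                       'user', 'pass', { RaiseError => 1 });
my $sth = $dbh->prepare(
    'INSERT INTO blast_hit (query, hit, evalue, score) VALUES (?,?,?,?)');

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '/tmp/job_1234.blast.out');
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        $sth->execute($result->query_name, $hit->name,
                      $hit->significance, $hit->raw_score);
    }
}
$dbh->disconnect;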

Elia

---
Bioinformatics Program Manager
Temasek Life Sciences Laboratory
1, Research Link
Singapore 117604
Tel. +65 6874 4945
Fax. +65 6872 7007


