From Marc.Logghe at devgen.com Mon Sep 15 03:19:47 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Sep 15 03:18:09 2003
Subject: [Bioperl-pipeline] some newbie questions

Hi all,

I am brand new to biopipe, so please forgive me if I ask some silly questions. I am currently playing with the idea of implementing the bioperl pipeline, and for that I have done some homework by reading a number of biopipe documents. I might have missed a few relevant documents, though ;-)

However, there is at least one thing that is not yet clear to me. Up to now we are mirroring a number of databases, like wormbase, and handling them manually. This means: unpacking; making the chromosomes and wormpep sequences blastable; a genome-wide blast to map some features in which we are interested; reformatting the database and custom mapping data to gff; import into gbrowse; ...

From the documentation it is pretty clear that the genome-wide blast is especially suited for biopipe. But what about all the rest, especially the preparation of the input data? Also, how can you trigger the pipeline? I mean, every week wget is fetching new wormbase data, and of course the pipeline should only be triggered when new data have arrived. How can you do that? Can you use biopipe for tasks like installing a new version of acedb?

Many thanks in advance,
Marc

From strach.joachim at web.de Tue Sep 16 07:50:19 2003
From: strach.joachim at web.de (Joachim H. Strach)
Date: Tue Sep 16 07:48:39 2003
Subject: [Bioperl-pipeline] new pipeline
Message-ID: <200309161150.h8GBoJQ20948@mailgate5.cinetic.de>

Hello,

I am a computer science student and an intern in the working group of Stefan Rensing (www.plant-biotech.net; University of Freiburg, Faculty of Biology, Germany).

My task is to build a new pipeline with Biopipe that:
- runs in local mode;
- blasts FASTA files against an internal database;
- blasts the results against a database again;
- stores the results in a MySQL database, so we can easily export the output to a spreadsheet, or alternatively stores the results directly in a self-defined spreadsheet.

I did not find an example pipeline which parses the blast output into a database. Will Biopipe do this for me? And in general: which format conversions are possible with the current Biopipe? Do I also have to write new methods in Perl?

Many questions; I am glad to hear from you.

Regards,
Joachim Strach

--
Joachim H. Strach
Phone in office from 9.00 am to 5 pm: Germany 0761-203-6988

From kumark at cshl.org Tue Sep 16 09:59:42 2003
From: kumark at cshl.org (Kiran Kumar)
Date: Tue Sep 16 09:58:11 2003
Subject: [Bioperl-pipeline] new pipeline
In-Reply-To: <200309161150.h8GBoJQ20948@mailgate5.cinetic.de>

Hi Joachim,

Look at the examples in /xml/examples/xml/:

  blast_biosql_pipeline.xml
  blast_db_flat.xml

The first one reads the blast inputs from a BioSQL database and writes the output back to BioSQL. The second one can dump the output to a GFF-format file.

Kiran
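For the second example's output path, here is a standalone sketch of what dumping blast results as GFF amounts to, using only bioperl classes. The file names are made up, and this illustrates the general idea rather than Biopipe's actual Dumper code:

#!/usr/bin/perl
# Sketch only: turn BLAST HSPs into GFF2 features, roughly what the
# blast_db_flat.xml dumper path produces. File names are hypothetical.
use strict;
use warnings;
use Bio::SearchIO;
use Bio::Tools::GFF;
use Bio::SeqFeature::Generic;

my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -file   => 'worm_vs_mydb.bls');
my $gff_out  = Bio::Tools::GFF->new(-file        => '>hits.gff',
                                    -gff_version => 2);

while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # One GFF feature per HSP, in query coordinates.
            my $feat = Bio::SeqFeature::Generic->new(
                -seq_id      => $result->query_name,
                -source_tag  => 'blast',
                -primary_tag => 'similarity',
                -start       => $hsp->start('query'),
                -end         => $hsp->end('query'),
                -score       => $hsp->score,
                -strand      => $hsp->strand('query'),
                -tag         => { target => $hit->name },
            );
            $gff_out->write_feature($feat);
        }
    }
}

The resulting hits.gff is in the shape gbrowse-style loaders expect, one feature line per HSP.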
From shawnh at stanford.edu Wed Sep 17 04:11:36 2003
From: shawnh at stanford.edu (Shawn Hoon)
Date: Wed Sep 17 04:06:34 2003
Subject: [Bioperl-pipeline] new pipeline
Message-ID: <8E9255D8-E8E6-11D7-81DC-000A95783436@stanford.edu>

> I did not find an example pipeline which parses the blast output into
> a database. Will Biopipe do this for me? And in general: which format
> conversions are possible with the current Biopipe? Do I also have to
> write new methods in Perl?

Format conversion is not done by Biopipe itself. Instead, it generalizes the problem by allowing different input and output modules (called iohandlers) to interface the inputs and outputs of an analysis with databases and files. The most comprehensive set of Perl modules for this comes from bioperl — for example Bio::SeqIO, Bio::AlignIO, Bio::TreeIO — and all of these may be used by Biopipe via the XML definition of the pipeline. Most of the time you will use these, since the analysis modules (called runnables) mostly return bioperl objects. You can also write to MySQL if you have appropriate adaptor modules that take the output objects and call the SQL needed to store the data. If you have your own schema, you must write your own adaptor. See the examples mentioned by Kiran.

cheers,

shawn
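As a rough illustration of what such a hand-rolled adaptor boils down to, the following standalone sketch parses a blast report with Bio::SearchIO and stores one row per HSP via DBI. The blast_hits table and its columns are invented for the example, not part of any Biopipe schema:

#!/usr/bin/perl
# Sketch only: the core of a custom output adaptor for a self-defined
# MySQL schema. Table name, columns, and credentials are hypothetical.
use strict;
use warnings;
use DBI;
use Bio::SearchIO;

my $dbh = DBI->connect('dbi:mysql:database=pipeline;host=localhost',
                       'user', 'pass', { RaiseError => 1 });
my $sth = $dbh->prepare(
    'INSERT INTO blast_hits
       (query_name, hit_name, evalue, percent_id, q_start, q_end)
     VALUES (?, ?, ?, ?, ?, ?)'
);

my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -file   => 'report.bls');
while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # One row per HSP; a flat table like this exports to a
            # spreadsheet with a single SELECT.
            $sth->execute($result->query_name, $hit->name,
                          $hsp->evalue, $hsp->percent_identity,
                          $hsp->start('query'), $hsp->end('query'));
        }
    }
}
$dbh->disconnect;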
From shawnh at stanford.edu Wed Sep 17 04:12:55 2003
From: shawnh at stanford.edu (Shawn Hoon)
Date: Wed Sep 17 04:07:59 2003
Subject: [Bioperl-pipeline] some newbie questions

On Monday, September 15, 2003, at 12:19 AM, Marc Logghe wrote:

> Hi all,
> I am brand new to biopipe, so please forgive me if I ask some silly
> questions. I am currently playing with the idea of implementing the
> bioperl pipeline, and for that I have done some homework by reading a
> number of biopipe documents. I might have missed a few relevant
> documents, though ;-)

Ah, I'm probably writing some of them. Documentation may come sooner or later, depending on how soon I settle into school.

> However, there is at least one thing that is not yet clear to me. Up
> to now we are mirroring a number of databases, like wormbase, and
> handling them manually. This means: unpacking; making the chromosomes
> and wormpep sequences blastable; a genome-wide blast to map some
> features in which we are interested; reformatting the database and
> custom mapping data to gff; import into gbrowse; ...

For data preparation there is some support, but it may be limited. You should be able to roll your own and plug it in. These come under InputCreates: the Bio::Pipeline::InputCreate::* modules are responsible for the various ways of setting up the inputs and jobs for the pipeline. For example, a module that does file-based blasting of sequences, called setup_file_blast, will:

a) given a file of input sequences in any format, split the file into a
   specified number of chunks;
b) create a blast job in the pipeline for each chunk;
c) create the specified working directory for storing the output files;
d) format the db file for blasting, if you are blasting the file
   against itself and the option is specified.

See bioperl-pipeline/xml/examples/xml/blast_file_pipeline.xml.

If, say, you want the blast output stored as gff files, you can specify a data dumper as an output iohandler; see bioperl-pipeline/xml/examples/xml/blast_db_flat.xml, which uses Bio::Pipeline::Utils::Dumper. Alternatively, you can probably use Bio::DB::GFF as an output iohandler to take the blast features and store them directly in the database using the SeqFeature gff_string method. For any customization beyond that, you should probably roll your own module and plug it in as an output iohandler.

> From the documentation it is pretty clear that the genome-wide blast
> is especially suited for biopipe. But what about all the rest,
> especially the preparation of the input data? Also, how can you
> trigger the pipeline? I mean, every week wget is fetching new wormbase
> data, and of course the pipeline should only be triggered when new
> data have arrived. How can you do that?

Right now, the best bet would be to write a pipeline that reads new sequences from some directory or file, either loading them into a db or treating them as a file, and carries out the blast. See blast_file_pipeline.xml or blast_db_flat.xml for similar examples. This would be triggered by some kind of cron job that checks the last modification time of the data file. Nothing for this is currently written, so you are welcome to give it a shot.

> Can you use biopipe for tasks like installing the new version of
> acedb?

I have no knowledge of installing acedb, and biopipe cannot do this, so I can't say much. Biopipe is better suited to tasks where you want to parallelize multiple jobs, or where you have some kind of workflow that you want to execute in a certain order. So it must be quite complex to set up acedb, if you need a pipeline to do so?

cheers,

shawn
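Shawn's cron-job suggestion could start as small as the sketch below. Since, as he says, nothing like this is in the codebase, the paths, the stamp-file convention, and the PipelineManager.pl invocation are all assumptions to adapt to a real setup:

#!/usr/bin/perl
# Sketch only: run from cron (e.g. "0 4 * * 1 check_wormbase.pl") and
# launch the pipeline only when the mirrored file has changed. The
# file paths and the runner command are hypothetical.
use strict;
use warnings;

my $data_file  = '/mirror/wormbase/elegans.fa.gz';
my $stamp_file = '/var/run/biopipe/wormbase.last_run';

# Modification time of the freshly mirrored file.
my $data_mtime = (stat $data_file)[9]
    or die "cannot stat $data_file: $!";

# Time of the last pipeline run; 0 if we have never run.
my $last_run = -e $stamp_file ? (stat $stamp_file)[9] : 0;

if ($data_mtime > $last_run) {
    # New data arrived since the last run: kick off the pipeline,
    # however it is normally launched at your site.
    system('perl', 'PipelineManager.pl') == 0
        or die "pipeline failed: $?";
    # Touch the stamp file so next week's check starts from here.
    open my $fh, '>', $stamp_file or die "cannot touch $stamp_file: $!";
    close $fh;
}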
From strach.joachim at web.de Wed Sep 17 10:28:58 2003
From: strach.joachim at web.de (Joachim H. Strach)
Date: Wed Sep 17 10:27:14 2003
Subject: [Bioperl-pipeline] Plant-Biotech pipeline
Message-ID: <200309171428.h8HESwQ28519@mailgate5.cinetic.de>

Hello,

first of all, thanks for your previous answers; they helped a lot with my understanding of the Biopipe workflow.

Some more questions arose. I would be glad if you could either point me to the suitable documentation or give me some more answers.

I took a closer look at genome_annotation_pipeline.xml:
- What are the transformer tags good for?
- What is the function of the DataMonger?
- In the rule section: what does e.g. "COPY_ID" relate to?
- Shawn, why did you say "... return mostly bioperl objects"? Which runnables do not, and what do they return?
- My pipeline should perform two blast queries, where the second one gets as input the filtered output of the first one. How can I filter on the bioperl objects directly, without using IO handling? Or, more generally: how can I pass the bioperl objects returned from a runnable to the runnable of the next analysis?

Thanks for your advice.

Joachim

From shawnh at stanford.edu Wed Sep 17 11:44:21 2003
From: shawnh at stanford.edu (Shawn Hoon)
Date: Wed Sep 17 11:39:27 2003
Subject: [Bioperl-pipeline] Plant-Biotech pipeline
In-Reply-To: <200309171428.h8HESwQ28519@mailgate5.cinetic.de>

On Wednesday, September 17, 2003, at 7:28 AM, Joachim H. Strach wrote:

> I took a closer look at genome_annotation_pipeline.xml:
> - What are the transformer tags good for?

A transformer defines modules that may be used to operate on data inputs and outputs before they are passed to iohandlers. This includes filtering and other data transformation operations. Input transformers are applied after fetching from iohandlers, while output transformers are applied before storing to the db.

For example, in a two-stage blast which writes the result to the database both times, applying filters when fetching the input and when writing the output, the flow is:

  input -> |filter transformer| -> blast 1 (analysis 1)
        -> |filter transformer| -> store to db
        -> |filter transformer| -> blast 2 (analysis 2)
        -> |filter transformer| -> store to db

Previously, biopipe required that all results be written to a database before being fetched again for the next analysis. This is so that if, say, analysis 2 fails, one would not need to rerun the first. But sometimes the first analysis is some simple operation that runs fast, and we don't want to bother with storing its results. So I have recently committed some relatively new code for doing iohandler chaining.
The flow for the same analysis is slightly different:

  input -> |filter transformer| -> blast 1
        -> |filter transformer| -> blast 2
        -> |filter transformer| -> store to db

So for this case, if blast 2 fails, we have to go back and rerun blast 1. I don't think I have committed the xml for this; I will do so when I get back from a dept retreat this week. See this mail:
http://bioperl.org/pipermail//bioperl-pipeline/2003-August/000387.html

> - What is the function of the DataMonger?

This is a 'special' runnable that is used to set up analyses. Say you want to align cDNAs to genomes. What you may want to do is run est2genome of the cDNA on the section of the genome where its hits are found via blast. You wouldn't want to pass the entire chromosome, for example, to est2genome. Instead you need to figure out the region where the blast hits, do some padding, and pass the slice of the genome together with the cDNA to the next analysis. So you would figure out the hit region and pass the start, end, and strand coordinates to the est2genome input iohandler. Into the DataMonger we plug a Bio::Pipeline::InputCreate, which covers various 'hacky' modules that set up jobs very specifically, according to how your analysis requires its inputs. This is to reconcile the fact that, a lot of the time, the database adaptor modules do not return what you want to feed directly into an analysis.

> - In the rule section: what does e.g. "COPY_ID" relate to?

Once a job is finished, the PipelineManager will look up what it should do next with regard to this job. For COPY_ID it will reuse the same input id for the next analysis, but it may map the input iohandler to a new one. For example, in RepeatMasker->Blast both analyses use the same input, say sequence_1, but the fetching of the sequence for blast (via ensembl) would use fetch_repeatmasked_seq, while RepeatMasker would fetch the unmasked seq as its input. So there is a reuse of the input id and a change of the input iohandler. See bioperl-pipeline/xml/README.

> - Shawn, why did you say "... return mostly bioperl objects"? Which
> runnables do not, and what do they return?

Uhm, okay, you got me. All committed runnables return bioperl objects. However, sometimes we do write specific runnables that return ensembl objects (in genome annotation) or other objects that we use for our own data schemas... not things which we are proud of, and we do not commit them yet... :)

> - My pipeline should perform two blast queries, where the second one
> gets as input the filtered output of the first one. How can I filter
> on the bioperl objects directly, without using IO handling? Or, more
> generally: how can I pass the bioperl objects returned from a runnable
> to the runnable of the next analysis?

Ah, the iohandler chaining described previously would be the way. I will commit some examples this weekend.

cheers,

shawn
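The slicing step Shawn describes for the DataMonger/InputCreate case can be sketched with plain bioperl calls. File names and the padding value are illustrative, and a real InputCreate module would create pipeline jobs from the coordinates rather than write a file:

#!/usr/bin/perl
# Sketch only: pad the best blast hit region on the genome and cut out
# the sub-sequence that an est2genome job would receive together with
# the cdna. All inputs are hypothetical.
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SearchIO;

my $pad = 1000;   # bases of genomic context on each side of the hit

my $genome = Bio::SeqIO->new(-file   => 'chromosome_I.fa',
                             -format => 'fasta')->next_seq;

my $result = Bio::SearchIO->new(-file   => 'cdna_vs_genome.bls',
                                -format => 'blast')->next_result;
my $hit    = $result->next_hit
    or die 'no hit for ', $result->query_name;

# Pad the hit region, clamped to the chromosome ends.
my $start = $hit->start('hit') - $pad;
$start = 1 if $start < 1;
my $end = $hit->end('hit') + $pad;
$end = $genome->length if $end > $genome->length;

# trunc() returns a new Bio::Seq covering just the padded slice.
my $slice = $genome->trunc($start, $end);
Bio::SeqIO->new(-file   => '>slice_for_est2genome.fa',
                -format => 'fasta')->write_seq($slice);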
From strach.joachim at web.de Fri Sep 19 08:00:36 2003
From: strach.joachim at web.de (Joachim H. Strach)
Date: Fri Sep 19 07:58:49 2003
Subject: [Bioperl-pipeline] Plant-Biotech pipeline 3
Message-ID: <200309191200.h8JC0ZQ25487@mailgate5.cinetic.de>

Hi,

after reflecting on your answers, I have further questions. I always refer to the genome_annotation_pipeline because it has an appropriate complexity.

First of all, have a look at http://www.cosmoss.org/bm/pics/genome_annotation_pipeline.gif. I visualized the pipeline as I understood it. It would be helpful to dynamically visualize the pipeline while creating it with the biopipe_designer, wouldn't it?

1. What is the meaning of !XXX!? Is e.g. !INPUT! in iohandler id="2" the output from Monger id="1"? Does !OUTPUT! refer to the output of the current analysis? What about !INPUTOBJ!, !UNTRANSFORMED_INPUTOBJ!, !ANALYSIS!, ...

2. What's the idea behind the iohandler mapping?

3. The data monger ...
- does it always set up data for the following analysis? E.g. analysis id="6" has a different iohandler and, in my opinion, no apparent link to the monger.
- does it always return jobs, or does it return what the setup_xxx functions return?

Back to my pipeline:

4. How is the following meant to be realized:
- storing blast results (SearchIO) in a MySQL BioSQL database
- filtering these results
- using the filtered results as input to a new blast (SeqI)
Especially: which functions and packages could I use? I have trouble identifying suitable functions for my tasks. What is already there, and what do I have to write myself? I did not find a package or function which provides filtering and converting of bioperl objects.

Thanks again,

Joachim
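One possible shape for the middle leg of item 4 — filtering the first-round report by e-value and writing the surviving hit segments as FASTA input for the second blast. The cutoff, file names, and gap-stripping are illustrative assumptions, and in Biopipe this logic would sit in a transformer or iohandler rather than a standalone script:

#!/usr/bin/perl
# Sketch only: keep hits whose best HSP passes an e-value cutoff and
# emit their (ungapped) hit segments as FASTA for a second blast round.
use strict;
use warnings;
use Bio::SearchIO;
use Bio::SeqIO;
use Bio::Seq;

my $cutoff = 1e-10;   # illustrative filter threshold

my $in  = Bio::SearchIO->new(-format => 'blast', -file => 'round1.bls');
my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>round2_in.fa');

while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        my $hsp = $hit->next_hsp or next;    # best HSP only
        next if $hsp->evalue > $cutoff;      # the filter step
        # Strip alignment gaps so the segment is blastable again.
        (my $seq = $hsp->hit_string) =~ tr/-//d;
        $out->write_seq(Bio::Seq->new(-display_id => $hit->name,
                                      -seq        => $seq));
    }
}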