From Marc.Logghe at devgen.com Mon Sep 15 03:19:47 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Sep 15 03:18:09 2003
Subject: [Bioperl-pipeline] some newbie questions

Hi all,

I am brand new to biopipe, so please forgive me if I ask some silly questions. I am currently playing with the idea of implementing the bioperl pipeline, and for that I have done some homework by reading a number of biopipe documents. I might have missed a few relevant documents, though ;-)

However, there is at least one thing that is not yet clear to me. Up to now we are mirroring a number of databases, like wormbase, and handling them manually. This means: unpacking; making the chromosomes and wormpep sequences blastable; a genome-wide blast to map some features in which we are interested; reformatting the database and custom mapping data to gff; import into gbrowse; ...

From the documentation it is pretty clear that the genome-wide blast is especially suited for biopipe. But what about all the rest, especially the preparation of the input data? Also, how can you trigger the pipeline? I mean, every week wget is fetching new wormbase data, and of course the pipeline should only be triggered when new data have arrived. How can you do that? Can you use biopipe for tasks like installing a new version of acedb?

Many thanks in advance,
Marc

From strach.joachim at web.de Tue Sep 16 07:50:19 2003
From: strach.joachim at web.de (Joachim H. Strach)
Date: Tue Sep 16 07:48:39 2003
Subject: [Bioperl-pipeline] new pipeline
Message-ID: <200309161150.h8GBoJQ20948@mailgate5.cinetic.de>

Hello,

I am a computer science student and an intern in the working group of Stefan Rensing (www.plant-biotech.net; University of Freiburg, Faculty of Biology, Germany).

My task is to build a new pipeline with Biopipe that:
- runs in local mode;
- blasts FASTA files against an internal database;
- blasts the results against a database again;
- stores the results in a MySQL database, so we can easily export the output to a spreadsheet, or alternatively stores the results directly in a self-defined spreadsheet.

I did not find an example pipeline which parses the blast output into a database. Will Biopipe do this for me? And in general: which format conversions are possible with the current Biopipe? Do I also have to write new methods in Perl?

Many questions; I am glad to hear from you.

Regards,
Joachim Strach

--
Joachim H. Strach
Phone in office from 9.00 am to 5 pm: Germany 0761-203-6988

From kumark at cshl.org Tue Sep 16 09:59:42 2003
From: kumark at cshl.org (Kiran Kumar)
Date: Tue Sep 16 09:58:11 2003
Subject: [Bioperl-pipeline] new pipeline
In-Reply-To: <200309161150.h8GBoJQ20948@mailgate5.cinetic.de>

Hi Joachim,

Look at the examples in /xml/examples/xml/:

  blast_biosql_pipeline.xml
  blast_db_flat.xml

The first one reads the blast inputs from a BioSQL database and writes the output back to BioSQL. The second one can dump the output to a GFF-format file.

Kiran
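For the second example's output path, here is a standalone sketch of what dumping blast results as GFF amounts to, using only bioperl classes. The file names are made up, and this illustrates the general idea rather than Biopipe's actual Dumper code:

#!/usr/bin/perl
# Sketch only: turn BLAST HSPs into GFF2 features, roughly what the
# blast_db_flat.xml dumper path produces. File names are hypothetical.
use strict;
use warnings;
use Bio::SearchIO;
use Bio::Tools::GFF;
use Bio::SeqFeature::Generic;

my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -file   => 'worm_vs_mydb.bls');
my $gff_out  = Bio::Tools::GFF->new(-file        => '>hits.gff',
                                    -gff_version => 2);

while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # One GFF feature per HSP, in query coordinates.
            my $feat = Bio::SeqFeature::Generic->new(
                -seq_id      => $result->query_name,
                -source_tag  => 'blast',
                -primary_tag => 'similarity',
                -start       => $hsp->start('query'),
                -end         => $hsp->end('query'),
                -score       => $hsp->score,
                -strand      => $hsp->strand('query'),
                -tag         => { target => $hit->name },
            );
            $gff_out->write_feature($feat);
        }
    }
}

The resulting hits.gff is in the shape gbrowse-style loaders expect, one feature line per HSP.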
From shawnh at stanford.edu Wed Sep 17 04:11:36 2003
From: shawnh at stanford.edu (Shawn Hoon)
Date: Wed Sep 17 04:06:34 2003
Subject: [Bioperl-pipeline] new pipeline
Message-ID: <8E9255D8-E8E6-11D7-81DC-000A95783436@stanford.edu>

> I did not find an example pipeline which parses the blast output into
> a database. Will Biopipe do this for me? And in general: which format
> conversions are possible with the current Biopipe? Do I also have to
> write new methods in Perl?

Format conversion is not done by Biopipe itself. Instead, it generalizes the problem by allowing different input and output modules (called iohandlers) to interface the inputs and outputs of an analysis with databases and files. The most comprehensive set of Perl modules for this comes from bioperl — for example Bio::SeqIO, Bio::AlignIO, Bio::TreeIO — and all of these may be used by Biopipe via the XML definition of the pipeline. Most of the time you will use these, since the analysis modules (called runnables) mostly return bioperl objects. You can also write to MySQL if you have appropriate adaptor modules that take the output objects and call the SQL needed to store the data. If you have your own schema, you must write your own adaptor. See the examples mentioned by Kiran.

cheers,

shawn
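As a rough illustration of what such a hand-rolled adaptor boils down to, the following standalone sketch parses a blast report with Bio::SearchIO and stores one row per HSP via DBI. The blast_hits table and its columns are invented for the example, not part of any Biopipe schema:

#!/usr/bin/perl
# Sketch only: the core of a custom output adaptor for a self-defined
# MySQL schema. Table name, columns, and credentials are hypothetical.
use strict;
use warnings;
use DBI;
use Bio::SearchIO;

my $dbh = DBI->connect('dbi:mysql:database=pipeline;host=localhost',
                       'user', 'pass', { RaiseError => 1 });
my $sth = $dbh->prepare(
    'INSERT INTO blast_hits
       (query_name, hit_name, evalue, percent_id, q_start, q_end)
     VALUES (?, ?, ?, ?, ?, ?)'
);

my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -file   => 'report.bls');
while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # One row per HSP; a flat table like this exports to a
            # spreadsheet with a single SELECT.
            $sth->execute($result->query_name, $hit->name,
                          $hsp->evalue, $hsp->percent_identity,
                          $hsp->start('query'), $hsp->end('query'));
        }
    }
}
$dbh->disconnect;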
From shawnh at stanford.edu Wed Sep 17 04:12:55 2003
From: shawnh at stanford.edu (Shawn Hoon)
Date: Wed Sep 17 04:07:59 2003
Subject: [Bioperl-pipeline] some newbie questions

On Monday, September 15, 2003, at 12:19 AM, Marc Logghe wrote:

> Hi all,
> I am brand new to biopipe, so please forgive me if I ask some silly
> questions. I am currently playing with the idea of implementing the
> bioperl pipeline, and for that I have done some homework by reading a
> number of biopipe documents. I might have missed a few relevant
> documents, though ;-)

Ah, I'm probably writing some of them. Documentation may come sooner or later, depending on how soon I settle into school.

> However, there is at least one thing that is not yet clear to me. Up
> to now we are mirroring a number of databases, like wormbase, and
> handling them manually. This means: unpacking; making the chromosomes
> and wormpep sequences blastable; a genome-wide blast to map some
> features in which we are interested; reformatting the database and
> custom mapping data to gff; import into gbrowse; ...

For data preparation there is some support, but it may be limited. You should be able to roll your own and plug it in. These come under InputCreates: the Bio::Pipeline::InputCreate::* modules are responsible for the various ways of setting up the inputs and jobs for the pipeline. For example, a module that does file-based blasting of sequences, called setup_file_blast, will:

a) given a file of input sequences in any format, split the file into a
   specified number of chunks;
b) create a blast job in the pipeline for each chunk;
c) create the specified working directory for storing the output files;
d) format the db file for blasting, if you are blasting the file
   against itself and the option is specified.

See bioperl-pipeline/xml/examples/xml/blast_file_pipeline.xml.

If, say, you want the blast output stored as gff files, you can specify a data dumper as an output iohandler; see bioperl-pipeline/xml/examples/xml/blast_db_flat.xml, which uses Bio::Pipeline::Utils::Dumper. Alternatively, you can probably use Bio::DB::GFF as an output iohandler to take the blast features and store them directly in the database using the SeqFeature gff_string method. For any customization beyond that, you should probably roll your own module and plug it in as an output iohandler.

> From the documentation it is pretty clear that the genome-wide blast
> is especially suited for biopipe. But what about all the rest,
> especially the preparation of the input data? Also, how can you
> trigger the pipeline? I mean, every week wget is fetching new wormbase
> data, and of course the pipeline should only be triggered when new
> data have arrived. How can you do that?

Right now, the best bet would be to write a pipeline that reads new sequences from some directory or file, either loading them into a db or treating them as a file, and carries out the blast. See blast_file_pipeline.xml or blast_db_flat.xml for similar examples. This would be triggered by some kind of cron job that checks the last modification time of the data file. Nothing for this is currently written, so you are welcome to give it a shot.

> Can you use biopipe for tasks like installing the new version of
> acedb?

I have no knowledge of installing acedb, and biopipe cannot do this, so I can't say much. Biopipe is better suited to tasks where you want to parallelize multiple jobs, or where you have some kind of workflow that you want to execute in a certain order. So it must be quite complex to set up acedb, if you need a pipeline to do so?

cheers,

shawn
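Shawn's cron-job suggestion could start as small as the sketch below. Since, as he says, nothing like this is in the codebase, the paths, the stamp-file convention, and the PipelineManager.pl invocation are all assumptions to adapt to a real setup:

#!/usr/bin/perl
# Sketch only: run from cron (e.g. "0 4 * * 1 check_wormbase.pl") and
# launch the pipeline only when the mirrored file has changed. The
# file paths and the runner command are hypothetical.
use strict;
use warnings;

my $data_file  = '/mirror/wormbase/elegans.fa.gz';
my $stamp_file = '/var/run/biopipe/wormbase.last_run';

# Modification time of the freshly mirrored file.
my $data_mtime = (stat $data_file)[9]
    or die "cannot stat $data_file: $!";

# Time of the last pipeline run; 0 if we have never run.
my $last_run = -e $stamp_file ? (stat $stamp_file)[9] : 0;

if ($data_mtime > $last_run) {
    # New data arrived since the last run: kick off the pipeline,
    # however it is normally launched at your site.
    system('perl', 'PipelineManager.pl') == 0
        or die "pipeline failed: $?";
    # Touch the stamp file so next week's check starts from here.
    open my $fh, '>', $stamp_file or die "cannot touch $stamp_file: $!";
    close $fh;
}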
From strach.joachim at web.de Wed Sep 17 10:28:58 2003
From: strach.joachim at web.de (Joachim H. Strach)
Date: Wed Sep 17 10:27:14 2003
Subject: [Bioperl-pipeline] Plant-Biotech pipeline
Message-ID: <200309171428.h8HESwQ28519@mailgate5.cinetic.de>

Hello,

first of all, thanks for your previous answers; they helped a lot with my understanding of the Biopipe workflow.

Some more questions arose. I would be glad if you could either point me to the suitable documentation or give me some more answers.

I took a closer look at genome_annotation_pipeline.xml:
- What are the transformer tags good for?
- What is the function of the DataMonger?
- In the rule section: what does e.g. "COPY_ID" relate to?
- Shawn, why did you say "... return mostly bioperl objects"? Which runnables do not, and what do they return?
- My pipeline should perform two blast queries, where the second one gets as input the filtered output of the first one. How can I filter on the bioperl objects directly, without using IO handling? Or, more generally: how can I pass the bioperl objects returned from a runnable to the runnable of the next analysis?

Thanks for your advice.

Joachim

From shawnh at stanford.edu Wed Sep 17 11:44:21 2003
From: shawnh at stanford.edu (Shawn Hoon)
Date: Wed Sep 17 11:39:27 2003
Subject: [Bioperl-pipeline] Plant-Biotech pipeline
In-Reply-To: <200309171428.h8HESwQ28519@mailgate5.cinetic.de>

On Wednesday, September 17, 2003, at 7:28 AM, Joachim H. Strach wrote:

> I took a closer look at genome_annotation_pipeline.xml:
> - What are the transformer tags good for?

A transformer defines modules that may be used to operate on data inputs and outputs before they are passed to iohandlers. This includes filtering and other data transformation operations. Input transformers are applied after fetching from iohandlers, while output transformers are applied before storing to the db.

For example, in a two-stage blast which writes the result to the database both times, applying filters when fetching the input and when writing the output, the flow is:

  input -> |filter transformer| -> blast 1 (analysis 1)
        -> |filter transformer| -> store to db
        -> |filter transformer| -> blast 2 (analysis 2)
        -> |filter transformer| -> store to db

Previously, biopipe required that all results be written to a database before being fetched again for the next analysis. This is so that if, say, analysis 2 fails, one would not need to rerun the first. But sometimes the first analysis is some simple operation that runs fast, and we don't want to bother with storing its results. So I have recently committed some relatively new code for doing iohandler chaining.
The flow for the same analysis is slightly different:

  input -> |filter transformer| -> blast 1
        -> |filter transformer| -> blast 2
        -> |filter transformer| -> store to db

So for this case, if blast 2 fails, we have to go back and rerun blast 1. I don't think I have committed the xml for this; I will do so when I get back from a dept retreat this week. See this mail:
http://bioperl.org/pipermail//bioperl-pipeline/2003-August/000387.html

> - What is the function of the DataMonger?

This is a 'special' runnable that is used to set up analyses. Say you want to align cDNAs to genomes. What you may want to do is run est2genome of the cDNA on the section of the genome where its hits are found via blast. You wouldn't want to pass the entire chromosome, for example, to est2genome. Instead you need to figure out the region where the blast hits, do some padding, and pass the slice of the genome together with the cDNA to the next analysis. So you would figure out the hit region and pass the start, end, and strand coordinates to the est2genome input iohandler. Into the DataMonger we plug a Bio::Pipeline::InputCreate, which covers various 'hacky' modules that set up jobs very specifically, according to how your analysis requires its inputs. This is to reconcile the fact that, a lot of the time, the database adaptor modules do not return what you want to feed directly into an analysis.

> - In the rule section: what does e.g. "COPY_ID" relate to?

Once a job is finished, the PipelineManager will look up what it should do next with regard to this job. For COPY_ID it will reuse the same input id for the next analysis, but it may map the input iohandler to a new one. For example, in RepeatMasker->Blast both analyses use the same input, say sequence_1, but the fetching of the sequence for blast (via ensembl) would use fetch_repeatmasked_seq, while RepeatMasker would fetch the unmasked seq as its input. So there is a reuse of the input id and a change of the input iohandler. See bioperl-pipeline/xml/README.

> - Shawn, why did you say "... return mostly bioperl objects"? Which
> runnables do not, and what do they return?

Uhm, okay, you got me. All committed runnables return bioperl objects. However, sometimes we do write specific runnables that return ensembl objects (in genome annotation) or other objects that we use for our own data schemas... not things which we are proud of, and we do not commit them yet... :)

> - My pipeline should perform two blast queries, where the second one
> gets as input the filtered output of the first one. How can I filter
> on the bioperl objects directly, without using IO handling? Or, more
> generally: how can I pass the bioperl objects returned from a runnable
> to the runnable of the next analysis?

Ah, the iohandler chaining described previously would be the way. I will commit some examples this weekend.

cheers,

shawn
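The slicing step Shawn describes for the DataMonger/InputCreate case can be sketched with plain bioperl calls. File names and the padding value are illustrative, and a real InputCreate module would create pipeline jobs from the coordinates rather than write a file:

#!/usr/bin/perl
# Sketch only: pad the best blast hit region on the genome and cut out
# the sub-sequence that an est2genome job would receive together with
# the cdna. All inputs are hypothetical.
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SearchIO;

my $pad = 1000;   # bases of genomic context on each side of the hit

my $genome = Bio::SeqIO->new(-file   => 'chromosome_I.fa',
                             -format => 'fasta')->next_seq;

my $result = Bio::SearchIO->new(-file   => 'cdna_vs_genome.bls',
                                -format => 'blast')->next_result;
my $hit    = $result->next_hit
    or die 'no hit for ', $result->query_name;

# Pad the hit region, clamped to the chromosome ends.
my $start = $hit->start('hit') - $pad;
$start = 1 if $start < 1;
my $end = $hit->end('hit') + $pad;
$end = $genome->length if $end > $genome->length;

# trunc() returns a new Bio::Seq covering just the padded slice.
my $slice = $genome->trunc($start, $end);
Bio::SeqIO->new(-file   => '>slice_for_est2genome.fa',
                -format => 'fasta')->write_seq($slice);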
From strach.joachim at web.de Fri Sep 19 08:00:36 2003
From: strach.joachim at web.de (Joachim H. Strach)
Date: Fri Sep 19 07:58:49 2003
Subject: [Bioperl-pipeline] Plant-Biotech pipeline 3
Message-ID: <200309191200.h8JC0ZQ25487@mailgate5.cinetic.de>

Hi,

after reflecting on your answers, I have further questions. I always refer to the genome_annotation_pipeline because it has an appropriate complexity.

First of all, have a look at http://www.cosmoss.org/bm/pics/genome_annotation_pipeline.gif. I visualized the pipeline as I understood it. It would be helpful to dynamically visualize the pipeline while creating it with the biopipe_designer, wouldn't it?

1. What is the meaning of !XXX!? Is e.g. !INPUT! in iohandler id="2" the output from Monger id="1"? Does !OUTPUT! refer to the output of the current analysis? What about !INPUTOBJ!, !UNTRANSFORMED_INPUTOBJ!, !ANALYSIS!, ...

2. What's the idea behind the iohandler mapping?

3. The data monger ...
- does it always set up data for the following analysis? E.g. analysis id="6" has a different iohandler and, in my opinion, no apparent link to the monger.
- does it always return jobs, or does it return what the setup_xxx functions return?

Back to my pipeline:

4. How is the following meant to be realized:
- storing blast results (SearchIO) in a MySQL BioSQL database
- filtering these results
- using the filtered results as input to a new blast (SeqI)
Especially: which functions and packages could I use? I have trouble identifying suitable functions for my tasks. What is already there, and what do I have to write myself? I did not find a package or function which provides filtering and converting of bioperl objects.

Thanks again,

Joachim
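One possible shape for the middle leg of item 4 — filtering the first-round report by e-value and writing the surviving hit segments as FASTA input for the second blast. The cutoff, file names, and gap-stripping are illustrative assumptions, and in Biopipe this logic would sit in a transformer or iohandler rather than a standalone script:

#!/usr/bin/perl
# Sketch only: keep hits whose best HSP passes an e-value cutoff and
# emit their (ungapped) hit segments as FASTA for a second blast round.
use strict;
use warnings;
use Bio::SearchIO;
use Bio::SeqIO;
use Bio::Seq;

my $cutoff = 1e-10;   # illustrative filter threshold

my $in  = Bio::SearchIO->new(-format => 'blast', -file => 'round1.bls');
my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>round2_in.fa');

while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        my $hsp = $hit->next_hsp or next;    # best HSP only
        next if $hsp->evalue > $cutoff;      # the filter step
        # Strip alignment gaps so the segment is blastable again.
        (my $seq = $hsp->hit_string) =~ tr/-//d;
        $out->write_seq(Bio::Seq->new(-display_id => $hit->name,
                                      -seq        => $seq));
    }
}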