[Bioperl-pipeline] phylip_tree_pipeline

matthieu CONTE m_conte at hotmail.com
Tue Dec 2 08:50:52 EST 2003


Hi,
Thanks a lot for your help, I managed to run my blast pipe!
I'm now working with phylip_tree_pipeline.xml. I want to modify your 
pipeline into a pipeline for orthologue prediction between 
Arabidopsis thaliana and Oryza sativa (using the pipeline you developed and 
adding the SDI and DoRIO software, both written in Java, to the pipe).

The starting files are multifasta files containing protein sequences from 
At and Os for each PFAM family.

e.g. the PF003514.fa file contains proteins from At/Os belonging to this PFAM 
family.


But, in order to run the pipeline, I want to open a file containing all 
the PFAM families common to Os and At, to tell the pipe by a prefix where 
the files are.

e.g. the file PFAM_COMMUN.txt looks like this:

PF00001
PF00003
...

and so on

and, for example, the multifasta file for PF00001 is stored under the directory

PF00001
    fasta_PF00001
    PF00001.fa

So it's probably just a matter of how to tell biopipe where to find the 
file for each PFAM number contained in the PFAM_COMMUN.txt file.

But we don't really know how to encode this in XML (probably with a special I/O 
handler?).
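Conceptually, the lookup described above is just: read one PFAM accession per line from PFAM_COMMUN.txt and map it to <rootdir>/<PFxxxxx>/<PFxxxxx>.fa. A minimal Python sketch of that mapping (the function names and the `root` argument are illustrative assumptions; in biopipe this logic would presumably live in a custom input-create module or I/O handler):

```python
import os

def read_family_ids(list_file):
    """Return the PFAM accessions listed one per line in a file
    like PFAM_COMMUN.txt, skipping blank lines."""
    with open(list_file) as fh:
        return [line.strip() for line in fh if line.strip()]

def family_fasta_path(root, pfam_id):
    """Map a PFAM accession to its multifasta file, following the
    <root>/<PF id>/<PF id>.fa layout described above."""
    return os.path.join(root, pfam_id, pfam_id + ".fa")

# Example: every family file the pipeline would need to open.
# paths = [family_fasta_path("/data/pfam", pf)
#          for pf in read_family_ids("PFAM_COMMUN.txt")]
```

The pipeline would then iterate over these paths, one job per PFAM family.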

Thanks in advance




Matthieu CONTE
M. Sc. in Bioinformatics from SIB

00 33 06.68.90.28.70
m_conte at hotmail.com





>From: Shawn Hoon <shawnh at stanford.edu>
>To: "matthieu CONTE" <m_conte at hotmail.com>
>CC: bioperl-pipeline at bioperl.org
>Subject: Re: [Bioperl-pipeline] Still working with biopipe....
>Date: Thu, 27 Nov 2003 10:32:08 -0800
>
>
>On Wednesday, November 26, 2003, at 5:26AM, matthieu CONTE wrote:
>
>>I still have problems with my pipeline's blast program...
>>
>>"
>>Creating biopipe
>>   Loading Schema...
>>Reading Data_setup xml   : /home/conte/xml/newhope.xml
>>Doing DBAdaptor and IOHandler setup
>>Doing Transformers..
>>Doing Pipeline Flow Setup
>>Doing Analysis..
>>Doing Rules
>>Doing Job Setup...
>>Loading of pipeline biopipe completed
>>2 analysis found.
>>Running test and setup..
>>
>>//////////// Analysis Test ////////////
>>Checking Analysis 1 DataMonger
>>-------------------- WARNING ---------------------
>>MSG: Skipping test for DataMonger
>>---------------------------------------------------
>>ok
>>Checking Analysis 2 Blast ok
>>Fetching Jobs...
>>Fetched 1 incomplete jobs
>>Running job /tmp//6/biopipe_DataMonger.1069852072.635.out 
>>/tmp//6/biopipe_DataMonger.1069852072.635.err
>>"
>>I think there is a problem with the method "get_Seq_by_id".
>>I wrote a little bioperl program to test the combination of 
>>"get_all_ids" and "get_Seq_by_id"; it just returns the ids, not the 
>>sequences.
>>Only the method "seq" gives the sequences, but I didn't manage to use it in 
>>my XML code!
>>
>>
>
>
>>use Bio::DB::Fasta;
>>
>>my $index_file_name='oriz_mfasta.txt';
>>
>>my $inx=Bio::DB::Fasta->new($index_file_name);
>>my @ids=$inx->get_all_ids();
>>foreach (@ids)
>>{
>>my $seq=$inx->seq($_);
>>
>>  print "$_!!!!$seq\n";
>>
>>}
>>
>
>The xml should be doing the following snippet:
>
>>use Bio::DB::Fasta;
>>
>>my $index_file_name='oriz_mfasta.txt';
>>
>>my $inx=Bio::DB::Fasta->new($index_file_name);
>>my @ids=$inx->get_all_ids();
>>foreach (@ids)
>>{
>>my $seq=$inx->get_Seq_by_id($_);  # a Bio::SeqI object, not a string
>>
>>  print "$_!!!!", $seq->seq, "\n";  # method calls don't interpolate inside strings
>>
>>}
>
>Since the wrappers take in a Bio::SeqI object.
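To make the distinction concrete: `seq($id)` returns the raw sequence string directly, while `get_Seq_by_id($id)` returns a sequence object whose `seq` accessor yields the string, which is what wrappers expecting a Bio::SeqI need. A tiny Python analogy of the two access styles (purely illustrative, not the Bio::DB::Fasta implementation):

```python
class SeqRecord:
    """Stands in for a Bio::SeqI-like object: carries an id and a sequence."""
    def __init__(self, seq_id, sequence):
        self.id = seq_id
        self._sequence = sequence

    def seq(self):
        return self._sequence


class FastaIndex:
    """Stands in for an indexed fasta database's two access styles."""
    def __init__(self, records):
        self._by_id = {i: SeqRecord(i, s) for i, s in records}

    def get_all_ids(self):
        return list(self._by_id)

    def seq(self, seq_id):
        # like $inx->seq($id): the raw string
        return self._by_id[seq_id].seq()

    def get_Seq_by_id(self, seq_id):
        # like $inx->get_Seq_by_id($id): an object; call .seq() on it
        return self._by_id[seq_id]
```

So printing `get_Seq_by_id`'s result directly shows an object reference, not the sequence; you have to call its `seq` method.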
>  Are you using the biopipe bundle? The example xml in 
>bioperl-pipeline/xml/examples/blast_db_flat.xml
>is for blasting while using Bio::DB::Fasta to fetch input sequences.
>That should work; let me know if it doesn't. All you need to change is the 
>rootdir parameter to point to the right directory. Once that works, you can 
>modify it accordingly for your purpose.
>
>
>
>shawn
>
>>
>>
>>Matthieu CONTE
>>M. Sc. in Bioinformatics from SIB
>>
>>00 33 06.68.90.28.70
>>m_conte at hotmail.com
>>
>>
>>
>>
>>
>>>From: Shawn Hoon <shawnh at stanford.edu>
>>>To: "matthieu CONTE" <m_conte at hotmail.com>
>>>CC: bioperl-pipeline at bioperl.org
>>>Subject: Re: [Bioperl-pipeline] Still working with biopipe....
>>>Date: Fri, 21 Nov 2003 10:01:22 -0800
>>>
>>>Hi Mattieu,
>>>	Thanks for trying this out.
>>>are you using the bioperl-pipeline bundle from the website? It looks like 
>>>it and that is the correct version to use.
>>>There are some problems with your xml file; mainly, you need to put in the 
>>>attribute ids for iohandler/analysis etc.
>>>I have attached a corrected version of your xml file.  You should 
>>>look at bioperl-pipeline/xml/examples/xml/blast_db_flat.xml
>>>in the examples directory and use that as a template.
>>>
>>>
>>>hope that helps
>>>
>>>shawn
>>><< test.xml >>
>>>
>>>
>>>On Friday, November 21, 2003, at 7:44AM, matthieu CONTE wrote:
>>>
>>>>Hi,
>>>>Still working with biopipe....
>>>>I'm now trying to create a pipeline de novo to find orthologues between 
>>>>Oryza sativa (Os) and Arabidopsis thaliana (At) by BBMH (best blast 
>>>>mutual hit) (before developing something more efficient and more 
>>>>complicated!).
>>>>So I started with a simple blast of an Os protein against an At multifasta 
>>>>protein file, using Bio::DB::Fasta and all the bioperl methods needed (and 
>>>>looping over all the Os proteins instead of a massive blast with a chunk of 
>>>>Os proteins). I would like to take a sequence from oriz_mfasta.txt (using 
>>>>the get_Seq_by_id function), blast it against arabido_mfasta.txt, and so 
>>>>on for all the Oryza sequences. This is the first step. But it's not 
>>>>working! Probably because the function of all the XML code I am working 
>>>>with is not really clear to me (especially the <datamonger> tag!).
>>>>
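The best-blast-mutual-hit step itself reduces to computing reciprocal best hits from the two blast directions. A short Python sketch of that reduction (the tuple format and tie handling are assumptions; scores are taken as "higher is better", e.g. bit scores):

```python
def best_hits(hits):
    """hits: iterable of (query, subject, score).
    Keep each query's top-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def mutual_best_hits(os_vs_at, at_vs_os):
    """Pairs (os_id, at_id) that are each other's best hit
    in both blast directions."""
    forward = best_hits(os_vs_at)   # Os query -> best At subject
    reverse = best_hits(at_vs_os)   # At query -> best Os subject
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)
```

The two input lists would come from parsing the blast reports of Os-vs-At and At-vs-Os respectively.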
>>>>You will find the code and the biopipe output below.
>>>>Thanks in advance
>>>>
>>>>
>>>>
>>>><pipeline_setup>
>>>>
>>>><!-- FILES  -->
>>>><global
>>>>         rootdir="/home/conte/test_blast"
>>>>         datadir="$rootdir/datahope"
>>>>         workdir="$rootdir/blasthope"
>>>>         inputfile="$datadir/oriz_mfasta.txt"
>>>>         blastpath = ""
>>>>         blast_param1="-p blastp -e 1e-5"
>>>>         blastdb1="$datadir/arabido_mfasta.txt"
>>>>         resultdir1="$rootdir/resulthope/analysis1"
>>>>/>
>>>><pipeline_flow_setup>
>>>><!--CALL  MODULES  -->
>>>>  <database_setup>
>>>>    <streamadaptor>
>>>>      <module>Bio::Pipeline::Dumper</module>
>>>>    </streamadaptor>
>>>>    <streamadaptor>
>>>>      <module>Bio::DB::Fasta</module>
>>>>    </streamadaptor>
>>>>   </database_setup>
>>>>
>>>><!-- IOHANDLER PICK UP iDs-->
>>>>     <iohandler_setup>
>>>>    <iohandler>
>>>>     <adaptor_id>2</adaptor_id>
>>>>     <adaptor_type>STREAM</adaptor_type>
>>>>     <iohandler_type>INPUT</iohandler_type>
>>>>     <method>
>>>>       <name>new</name>
>>>>       <rank>1</rank>
>>>>       <argument>
>>>>         <value>$inputfile</value>
>>>>       </argument>
>>>>     </method>
>>>>     <method>
>>>>       <name>get_Seq_by_id</name>
>>>>     <argument>
>>>>     <value>INPUT</value>
>>>>     </argument>
>>>>       <rank>2</rank>
>>>>     </method>
>>>>   </iohandler>
>>>>
>>>>    <iohandler>
>>>>     <adaptor_id>2</adaptor_id>
>>>>     <adaptor_type>STREAM</adaptor_type>
>>>>     <iohandler_type>INPUT</iohandler_type>
>>>>    <method>
>>>>       <name>new</name>
>>>>       <rank>1</rank>
>>>>       <argument>
>>>>           <value>$inputfile</value>
>>>>       </argument>
>>>>    </method>
>>>>    <method>
>>>>       <name>get_all_ids</name>
>>>>       <rank>2</rank>
>>>>    </method>
>>>>   </iohandler>
>>>>
>>>><!-- PARAMETRES OUTPUT (DUMPER) -->
>>>>   <iohandler>
>>>>     <adaptor_id>1</adaptor_id>
>>>>     <adaptor_type>STREAM</adaptor_type>
>>>>     <iohandler_type>OUTPUT</iohandler_type>
>>>>     <method>
>>>>       <name>new</name>
>>>>       <rank>1</rank>
>>>>       <argument>
>>>>         <tag>-dir</tag>
>>>>         <value>$resultdir1</value>
>>>>         <type>SCALAR</type>
>>>>         <rank>1</rank>
>>>>       </argument>
>>>>       <argument>
>>>>         <tag>-module</tag>
>>>>         <value>generic</value>
>>>>         <type>SCALAR</type>
>>>>         <rank>1</rank>
>>>>       </argument>
>>>>       <argument>
>>>>         <tag>-prefix</tag>
>>>>         <type>SCALAR</type>
>>>>         <value>INPUT</value>
>>>>         <rank>2</rank>
>>>>       </argument>
>>>>       <argument>
>>>>         <tag>-format</tag>
>>>>         <type>SCALAR</type>
>>>>         <value>gff</value>
>>>>         <rank>3</rank>
>>>>       </argument>
>>>>       <argument>
>>>>         <tag>-file_suffix</tag>
>>>>         <type>SCALAR</type>
>>>>         <value>gff</value>
>>>>         <rank>4</rank>
>>>>       </argument>
>>>>     </method>
>>>>     <method>
>>>>       <name>dump</name>
>>>>       <rank>2</rank>
>>>>       <argument>
>>>>        <value>OUTPUT</value>
>>>>         <type>ARRAY</type>
>>>>         <rank>1</rank>
>>>>       </argument>
>>>>      </method>
>>>>     </iohandler>
>>>>  </iohandler_setup>
>>>>
>>>><!-- ANALYSIS -->
>>>>    <analysis>
>>>>     <data_monger>
>>>>       <initial></initial>
>>>>       <input>
>>>>         <name>protein_ids</name>
>>>>         <iohandler>1</iohandler>
>>>>       </input>
>>>>       <input_create>
>>>>          <module>setup_initial</module>
>>>>          <rank>1</rank>
>>>>          <argument>
>>>>               <tag>protein_ids</tag>
>>>>               <value>2</value>
>>>>           </argument>
>>>>        </input_create>
>>>></data_monger>
>>>><input_iohandler></input_iohandler>
>>>>   </analysis>
>>>>
>>>><!-- BLAST-->
>>>>   <analysis>
>>>>     <logic_name>Blast</logic_name>
>>>>     <runnable>Bio::Pipeline::Runnable::Blast</runnable>
>>>>     <db>family</db>
>>>>     <db_file>$blastdb1</db_file>
>>>>     <program>blastall</program>
>>>>
>>>><!-- BLASTPATH-->
>>>>     <program_file>$blastpath</program_file>
>>>>     <analysis_parameters>$blast_param1</analysis_parameters>
>>>>     <runnable_parameters>-formatdb 1 -result_dir  
>>>>$resultdir1</runnable_parameters>
>>>>
>>>>     <input_iohandler></input_iohandler>
>>>>
>>>>     <output_iohandler></output_iohandler>
>>>>   </analysis>
>>>>
>>>><!-- RULES -->
>>>><rule>
>>>>     <current_analysis_id>1</current_analysis_id>
>>>>     <next_analysis_id>2</next_analysis_id>
>>>>     <action>NOTHING</action>
>>>>
>>>></rule>
>>>>
>>>></pipeline_flow_setup>
>>>><job_setup>
>>>></job_setup>
>>>>
>>>></pipeline_setup>
>>>>
>>>>
>>>>And I obtain:
>>>>Creating biopipe
>>>>  Loading Schema...
>>>>Reading Data_setup xml   : /home/conte/xml/newhope.xml
>>>>Doing DBAdaptor and IOHandler setup
>>>>Doing Pipeline Flow Setup
>>>>Doing Analysis..
>>>>
>>>>------------- EXCEPTION  -------------
>>>>MSG: Need to store analysis first
>>>>STACK Bio::Pipeline::SQL::JobAdaptor::store  
>>>>/usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/SQL/ 
>>>>JobAdaptor.pm:459
>>>>STACK Bio::Pipeline::XMLImporter::_create_initial_input_and_job  
>>>>/usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/ 
>>>>XMLImporter.pm:837
>>>>STACK Bio::Pipeline::XMLImporter::run  
>>>>/usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/ 
>>>>XMLImporter.pm:484
>>>>STACK toplevel PipelineManager:120
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>Matthieu CONTE
>>>>M. Sc. in Bioinformatics from SIB
>>>>CIRAD
>>>>00 33 06.68.90.28.70
>>>>m_conte at hotmail.com
>>>>
>>>>
>>>>_______________________________________________
>>>>bioperl-pipeline mailing list
>>>>bioperl-pipeline at bioperl.org
>>>>http://bioperl.org/mailman/listinfo/bioperl-pipeline
>>
>>
>>
>



