[Bioperl-pipeline] Still working with biopipe....

Shawn Hoon shawnh at stanford.edu
Thu Nov 27 13:32:08 EST 2003


On Wednesday, November 26, 2003, at 5:26AM, matthieu CONTE wrote:

> I still have a problem with my pipeline's BLAST program...
>
> "
> Creating biopipe
>   Loading Schema...
> Reading Data_setup xml   : /home/conte/xml/newhope.xml
> Doing DBAdaptor and IOHandler setup
> Doing Transformers..
> Doing Pipeline Flow Setup
> Doing Analysis..
> Doing Rules
> Doing Job Setup...
> Loading of pipeline biopipe completed
> 2 analysis found.
> Running test and setup..
>
> //////////// Analysis Test ////////////
> Checking Analysis 1 DataMonger
> -------------------- WARNING ---------------------
> MSG: Skipping test for DataMonger
> ---------------------------------------------------
> ok
> Checking Analysis 2 Blast ok
> Fetching Jobs...
> Fetched 1 incomplete jobs
> Running job /tmp//6/biopipe_DataMonger.1069852072.635.out 
> /tmp//6/biopipe_DataMonger.1069852072.635.err
> "
> I think there is a problem with the method "get_Seq_by_id".
> I wrote a little bioperl program to test combining "get_all_ids" with
> "get_Seq_by_id"; it just returns the ids and not the sequences.
> Only the "seq" method gives the sequences, but I didn't manage to use
> it in my XML code!
>
>


> use strict;
> use warnings;
> use Bio::DB::Fasta;
>
> my $index_file_name = 'oriz_mfasta.txt';
>
> # Index the multi-FASTA file, then fetch every sequence by id
> my $inx = Bio::DB::Fasta->new($index_file_name);
> my @ids = $inx->get_all_ids();
> foreach my $id (@ids) {
>     my $seq = $inx->seq($id);    # returns the raw sequence string
>     print "$id!!!!$seq\n";
> }
>

The xml should be doing the equivalent of the following snippet:

> use Bio::DB::Fasta;
>
> my $index_file_name = 'oriz_mfasta.txt';
>
> my $inx = Bio::DB::Fasta->new($index_file_name);
> my @ids = $inx->get_all_ids();
> foreach my $id (@ids) {
>     # get_Seq_by_id returns a Bio::Seq object, not a string
>     my $seq = $inx->get_Seq_by_id($id);
>     # a method call does not interpolate inside a double-quoted string,
>     # so call $seq->seq outside the quotes
>     print "$id!!!!", $seq->seq, "\n";
> }

This is because the wrappers take in a Bio::SeqI object rather than a
raw string.
  Are you using the biopipe bundle? The example xml in
bioperl-pipeline/xml/examples/blast_db_flat.xml
blasts sequences fetched with Bio::DB::Fasta as input, which is exactly
what you want. All you need to change is the rootdir parameter so it
points to the right directory. Once that works, you can modify it for
your purpose. Let me know if it doesn't work.
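The string-vs-object distinction above is easy to trip over. As a
language-neutral illustration, here is a small self-contained Python
sketch (the FastaIndex and SeqRecord classes are hypothetical stand-ins
mimicking the Bio::DB::Fasta API, not Biopython) showing the three
access styles: get_all_ids returns plain id strings, seq returns a raw
sequence string, and get_Seq_by_id returns a record object that you
must ask for its .seq.

```python
class SeqRecord:
    """Minimal analogue of a Bio::SeqI object: it carries an id and a seq."""
    def __init__(self, seq_id, seq):
        self.id = seq_id
        self.seq = seq


class FastaIndex:
    """Hypothetical stand-in for Bio::DB::Fasta over an in-memory FASTA string."""
    def __init__(self, fasta_text):
        self._records = {}
        current_id = None
        for line in fasta_text.splitlines():
            if line.startswith(">"):
                # header line: the id is the first word after '>'
                current_id = line[1:].split()[0]
                self._records[current_id] = []
            elif current_id is not None:
                self._records[current_id].append(line.strip())

    def get_all_ids(self):
        # returns plain id strings -- printing these shows ids, not sequences
        return list(self._records)

    def seq(self, seq_id):
        # returns the raw sequence string
        return "".join(self._records[seq_id])

    def get_Seq_by_id(self, seq_id):
        # returns a record object; callers must ask it for .seq
        return SeqRecord(seq_id, self.seq(seq_id))


fasta = ">os1 rice protein\nMKTA\nYIAK\n>os2\nGAVL\n"
inx = FastaIndex(fasta)
for seq_id in inx.get_all_ids():
    record = inx.get_Seq_by_id(seq_id)
    print(seq_id, record.seq)   # os1 MKTAYIAK / os2 GAVL
```

The pipeline's blast wrapper wants the equivalent of the record object,
which is why the iohandler chain must end in get_Seq_by_id rather than
get_all_ids or seq.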



shawn

>
>
> Matthieu CONTE
> M. Sc. in Bioinformatics from SIB
>
> 00 33 06.68.90.28.70
> m_conte at hotmail.com
>
>
>
>
>
>> From: Shawn Hoon <shawnh at stanford.edu>
>> To: "matthieu CONTE" <m_conte at hotmail.com>
>> CC: bioperl-pipeline at bioperl.org
>> Subject: Re: [Bioperl-pipeline] Still working with biopipe....
>> Date: Fri, 21 Nov 2003 10:01:22 -0800
>>
>> Hi Matthieu,
>> 	Thanks for trying this out.
>> Are you using the bioperl-pipeline bundle from the website? It looks
>> like it, and that is the correct version to use.
>> There are some problems with your xml file; mainly, you need to put
>> in the attribute ids for the iohandler/analysis elements etc.
>> I have attached a corrected version of your xml file. You should
>> look at bioperl-pipeline/xml/examples/xml/blast_db_flat.xml
>> in the examples directory and use that as a template.
>>
>>
>> hope that helps
>>
>> shawn
>> << test.xml >>
>>
>>
>> On Friday, November 21, 2003, at 7:44AM, matthieu CONTE wrote:
>>
>>> Hi,
>>> Still working with biopipe....
>>> I'm now trying to create, de novo, a pipeline to find orthologues
>>> between Oryza sativa (Os) and Arabidopsis thaliana (At) by BBMH
>>> (best blast mutual hit), before developing something more efficient
>>> and more complicated!
>>> So I started with a simple blast of one Os protein against the At
>>> multifasta protein file, using Bio::DB::Fasta and all the bioperl
>>> methods needed (looping over all the Os proteins instead of one
>>> massive blast with a chunk of Os proteins). I would like to take a
>>> sequence from oriz_mfasta.txt (using the get_Seq_by_id function),
>>> blast it against arabido_mfasta.txt, and so on for all the
>>> sequences of Oryza. This is the first step. But it's not working!
>>> Probably because the function of all the XML code I am working with
>>> is not really clear to me (especially the <datamonger> tag!).
>>>
>>> You will find the code and the biopipe output below.
>>> Thanks in advance
>>>
>>>
>>>
>>> <pipeline_setup>
>>>
>>> <!-- FILES  -->
>>> <global
>>>         rootdir="/home/conte/test_blast"
>>>         datadir="$rootdir/datahope"
>>>         workdir="$rootdir/blasthope"
>>>         inputfile="$datadir/oriz_mfasta.txt"
>>>         blastpath = ""
>>>         blast_param1="-p blastp -e 1e-5"
>>>         blastdb1="$datadir/arabido_mfasta.txt"
>>>         resultdir1="$rootdir/resulthope/analysis1"
>>> />
>>> <pipeline_flow_setup>
>>> <!--CALL  MODULES  -->
>>>  <database_setup>
>>>    <streamadaptor>
>>>      <module>Bio::Pipeline::Dumper</module>
>>>    </streamadaptor>
>>>    <streamadaptor>
>>>      <module>Bio::DB::Fasta</module>
>>>    </streamadaptor>
>>>   </database_setup>
>>>
>>> <!-- IOHANDLER PICK UP iDs-->
>>>     <iohandler_setup>
>>>    <iohandler>
>>>     <adaptor_id>2</adaptor_id>
>>>     <adaptor_type>STREAM</adaptor_type>
>>>     <iohandler_type>INPUT</iohandler_type>
>>>     <method>
>>>       <name>new</name>
>>>       <rank>1</rank>
>>>       <argument>
>>>         <value>$inputfile</value>
>>>       </argument>
>>>     </method>
>>>     <method>
>>>       <name>get_Seq_by_id</name>
>>>     <argument>
>>>     <value>INPUT</value>
>>>     </argument>
>>>       <rank>2</rank>
>>>     </method>
>>>   </iohandler>
>>>
>>>    <iohandler>
>>>     <adaptor_id>2</adaptor_id>
>>>     <adaptor_type>STREAM</adaptor_type>
>>>     <iohandler_type>INPUT</iohandler_type>
>>>    <method>
>>>       <name>new</name>
>>>       <rank>1</rank>
>>>       <argument>
>>>           <value>$inputfile</value>
>>>       </argument>
>>>    </method>
>>>    <method>
>>>       <name>get_all_ids</name>
>>>       <rank>2</rank>
>>>    </method>
>>>   </iohandler>
>>>
>>> <!-- PARAMETRES OUTPUT (DUMPER) -->
>>>   <iohandler>
>>>     <adaptor_id>1</adaptor_id>
>>>     <adaptor_type>STREAM</adaptor_type>
>>>     <iohandler_type>OUTPUT</iohandler_type>
>>>     <method>
>>>       <name>new</name>
>>>       <rank>1</rank>
>>>       <argument>
>>>         <tag>-dir</tag>
>>>         <value>$resultdir1</value>
>>>         <type>SCALAR</type>
>>>         <rank>1</rank>
>>>       </argument>
>>>       <argument>
>>>         <tag>-module</tag>
>>>         <value>generic</value>
>>>         <type>SCALAR</type>
>>>         <rank>1</rank>
>>>       </argument>
>>>       <argument>
>>>         <tag>-prefix</tag>
>>>         <type>SCALAR</type>
>>>         <value>INPUT</value>
>>>         <rank>2</rank>
>>>       </argument>
>>>       <argument>
>>>         <tag>-format</tag>
>>>         <type>SCALAR</type>
>>>         <value>gff</value>
>>>         <rank>3</rank>
>>>       </argument>
>>>       <argument>
>>>         <tag>-file_suffix</tag>
>>>         <type>SCALAR</type>
>>>         <value>gff</value>
>>>         <rank>4</rank>
>>>       </argument>
>>>     </method>
>>>     <method>
>>>       <name>dump</name>
>>>       <rank>2</rank>
>>>       <argument>
>>>        <value>OUTPUT</value>
>>>         <type>ARRAY</type>
>>>         <rank>1</rank>
>>>       </argument>
>>>      </method>
>>>     </iohandler>
>>>  </iohandler_setup>
>>>
>>> <!-- ANALYSIS -->
>>>    <analysis>
>>>     <data_monger>
>>>       <initial></initial>
>>>       <input>
>>>         <name>protein_ids</name>
>>>         <iohandler>1</iohandler>
>>>       </input>
>>>       <input_create>
>>>          <module>setup_initial</module>
>>>          <rank>1</rank>
>>>          <argument>
>>>               <tag>protein_ids</tag>
>>>               <value>2</value>
>>>           </argument>
>>>        </input_create>
>>> </data_monger>
>>> <input_iohandler></input_iohandler>
>>>   </analysis>
>>>
>>> <!-- BLAST-->
>>>   <analysis>
>>>     <logic_name>Blast</logic_name>
>>>     <runnable>Bio::Pipeline::Runnable::Blast</runnable>
>>>     <db>family</db>
>>>     <db_file>$blastdb1</db_file>
>>>     <program>blastall</program>
>>>
>>> <!-- BLASTPATH-->
>>>     <program_file>$blastpath</program_file>
>>>     <analysis_parameters>$blast_param1</analysis_parameters>
>>>     <runnable_parameters>-formatdb 1 -result_dir  
>>> $resultdir1</runnable_parameters>
>>>
>>>     <input_iohandler></input_iohandler>
>>>
>>>     <output_iohandler></output_iohandler>
>>>   </analysis>
>>>
>>> <!-- RULES -->
>>> <rule>
>>>     <current_analysis_id>1</current_analysis_id>
>>>     <next_analysis_id>2</next_analysis_id>
>>>     <action>NOTHING</action>
>>>
>>> </rule>
>>>
>>> </pipeline_flow_setup>
>>> <job_setup>
>>> </job_setup>
>>>
>>> </pipeline_setup>
>>>
>>>
>>> And I obtain:
>>> Creating biopipe
>>>  Loading Schema...
>>> Reading Data_setup xml   : /home/conte/xml/newhope.xml
>>> Doing DBAdaptor and IOHandler setup
>>> Doing Pipeline Flow Setup
>>> Doing Analysis..
>>>
>>> ------------- EXCEPTION  -------------
>>> MSG: Need to store analysis first
>>> STACK Bio::Pipeline::SQL::JobAdaptor::store /usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/SQL/JobAdaptor.pm:459
>>> STACK Bio::Pipeline::XMLImporter::_create_initial_input_and_job /usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/XMLImporter.pm:837
>>> STACK Bio::Pipeline::XMLImporter::run /usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/XMLImporter.pm:484
>>> STACK toplevel PipelineManager:120
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Matthieu CONTE
>>> M. Sc. in Bioinformatics from SIB
>>> CIRAD
>>> 00 33 06.68.90.28.70
>>> m_conte at hotmail.com
>>>
>>>
>>> _______________________________________________
>>> bioperl-pipeline mailing list
>>> bioperl-pipeline at bioperl.org
>>> http://bioperl.org/mailman/listinfo/bioperl-pipeline
>
>
>



