[Bioperl-pipeline] phylip_tree_pipeline
matthieu CONTE
m_conte at hotmail.com
Tue Dec 2 08:50:52 EST 2003
Hi,
Thanks a lot for your help, I managed to run my blast pipeline!
I'm now working with phylip_tree_pipeline.xml. I want to modify your
pipeline to develop a pipeline for orthologue prediction between
Arabidopsis thaliana and Oryza sativa (using the pipeline you developed and
adding the SDI and DoRIO software, both written in Java, to the pipe).
The starting files are multifasta files containing protein sequences from
At and Os for each PFAM family,
i.e. the PF003514.fa file contains proteins from At/Os belonging to that PFAM
family.
But, in order to run the pipeline, I want to open a file containing all
the PFAM families in common between Os/At, to indicate by a prefix to the
pipe where the files are.
i.e. the file PFAM_COMMUN.txt looks like this:
PF00001
PF00003
...
and so on;
and, for example, the multifasta file for PF00001 is stored under the
directory PF00001, i.e. at the path
PF00001/fasta_PF00001/PF00001.fa
So it's probably just a matter of how to indicate to biopipe where to find
the file for each PFAM number contained in the PFAM_COMMUN.txt file.
But we don't really know how to encode this in XML (probably with a special
I/O handler?).
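For what it's worth, the mapping itself is easy to express in Perl outside biopipe. A minimal sketch, assuming the directory layout described above (PFxxxxx/fasta_PFxxxxx/PFxxxxx.fa); the root directory value and the family list here are illustrative:

```perl
use strict;
use warnings;

# Given a PFAM accession, return the path of its multifasta file under
# $rootdir, following the layout described above (an assumption):
#   $rootdir/PF00001/fasta_PF00001/PF00001.fa
sub pfam_fasta_path {
    my ( $rootdir, $pfam ) = @_;
    return "$rootdir/$pfam/fasta_$pfam/$pfam.fa";
}

# In the real pipeline this list would be read from PFAM_COMMUN.txt,
# one accession per line; two hard-coded families stand in for it here.
my @families = qw(PF00001 PF00003);
for my $pfam (@families) {
    print pfam_fasta_path( '/home/conte/pfam', $pfam ), "\n";
    # -> /home/conte/pfam/PF00001/fasta_PF00001/PF00001.fa  (etc.)
}
```

An input-create module or I/O handler would then only need to apply this accession-to-path mapping when fetching each family's file.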
Thanks in advance
Matthieu CONTE
M. Sc. in Bioinformatics from SIB
00 33 06.68.90.28.70
m_conte at hotmail.com
>From: Shawn Hoon <shawnh at stanford.edu>
>To: "matthieu CONTE" <m_conte at hotmail.com>
>CC: bioperl-pipeline at bioperl.org
>Subject: Re: [Bioperl-pipeline] Still working with biopipe....
>Date: Thu, 27 Nov 2003 10:32:08 -0800
>
>
>On Wednesday, November 26, 2003, at 5:26AM, matthieu CONTE wrote:
>
>>I still have a problem with my pipeline's blast program...
>>
>>"
>>Creating biopipe
>> Loading Schema...
>>Reading Data_setup xml : /home/conte/xml/newhope.xml
>>Doing DBAdaptor and IOHandler setup
>>Doing Transformers..
>>Doing Pipeline Flow Setup
>>Doing Analysis..
>>Doing Rules
>>Doing Job Setup...
>>Loading of pipeline biopipe completed
>>2 analysis found.
>>Running test and setup..
>>
>>//////////// Analysis Test ////////////
>>Checking Analysis 1 DataMonger
>>-------------------- WARNING ---------------------
>>MSG: Skipping test for DataMonger
>>---------------------------------------------------
>>ok
>>Checking Analysis 2 Blast ok
>>Fetching Jobs...
>>Fetched 1 incomplete jobs
>>Running job /tmp//6/biopipe_DataMonger.1069852072.635.out
>>/tmp//6/biopipe_DataMonger.1069852072.635.err
>>"
>>I think there is a problem with the method "get_Seq_by_id".
>>I made a little bioperl program to test the combination of "get_all_ids"
>>and "get_Seq_by_id"; it just returns the ids and not the sequences.
>>Only the method "seq" gives the sequences, but I didn't manage to use it
>>in my XML code!...
>>
>>
>
>
>>use Bio::DB::Fasta;
>>
>>my $index_file_name = 'oriz_mfasta.txt';
>>
>>my $inx = Bio::DB::Fasta->new($index_file_name);
>>my @ids = $inx->get_all_ids();
>>foreach (@ids) {
>>    my $seq = $inx->seq($_);    # seq() returns the raw sequence string
>>    print "$_!!!!$seq\n";
>>}
>>
>
>The xml should be doing the equivalent of the following snippet:
>
>>use Bio::DB::Fasta;
>>
>>my $index_file_name = 'oriz_mfasta.txt';
>>
>>my $inx = Bio::DB::Fasta->new($index_file_name);
>>my @ids = $inx->get_all_ids();
>>foreach (@ids) {
>>    my $seq = $inx->get_Seq_by_id($_);    # returns a Bio::SeqI object
>>    # note: method calls do not interpolate inside double quotes
>>    print $_, '!!!!', $seq->seq, "\n";
>>}
>
>since the wrappers take in a Bio::SeqI object (which is what
>get_Seq_by_id returns, whereas seq returns a plain string).
> Are you using the biopipe bundle? The example xml in
>bioperl-pipeline/xml/examples/blast_db_flat.xml
>is for blasting while using Bio::DB::Fasta to fetch input sequences.
>That should work; let me know if it doesn't. All you need to change is the
>rootdir parameter to point to the right directory. That should work; then
>you can modify it accordingly for your purpose.
>
>
>
>shawn
>
>>
>>
>>Matthieu CONTE
>>M. Sc. in Bioinformatics from SIB
>>
>>00 33 06.68.90.28.70
>>m_conte at hotmail.com
>>
>>
>>
>>
>>
>>>From: Shawn Hoon <shawnh at stanford.edu>
>>>To: "matthieu CONTE" <m_conte at hotmail.com>
>>>CC: bioperl-pipeline at bioperl.org
>>>Subject: Re: [Bioperl-pipeline] Still working with biopipe....
>>>Date: Fri, 21 Nov 2003 10:01:22 -0800
>>>
>>>Hi Matthieu,
>>> Thanks for trying this out.
>>>Are you using the bioperl-pipeline bundle from the website? It looks like
>>>it, and that is the correct version to use.
>>>There are some problems with your xml file; mainly, you need to put in the
>>>attribute ids for iohandler/analysis etc.
>>>I have attached a corrected version of your xml file. You should try to
>>>look at bioperl-pipeline/xml/examples/xml/blast_db_flat.xml
>>>in the examples directory and use that as a template.
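[The attribute ids referred to above follow the pattern used in the example
xml files. A minimal sketch, with ids illustrative and element contents
elided; the corrected test.xml attachment itself is not preserved in the
archive:]

```xml
<!-- Each iohandler and analysis carries an explicit id attribute;
     other elements then refer to it by that number. -->
<iohandler id="1">
  <!-- adaptor/method elements as before -->
</iohandler>

<analysis id="1">
  <!-- logic_name, runnable, etc. -->
  <input_iohandler id="1"/>
</analysis>
```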
>>>
>>>
>>>hope that helps
>>>
>>>shawn
>>><< test.xml >>
>>>
>>>
>>>On Friday, November 21, 2003, at 7:44AM, matthieu CONTE wrote:
>>>
>>>>Hi,
>>>>Still working with biopipe...
>>>>I'm now trying to create de novo a pipeline to find orthologues between
>>>>Oryza sativa (Os) and Arabidopsis thaliana (At) by BBMH (best blast
>>>>mutual hit) (before developing something more efficient and more
>>>>complicated!).
>>>>So I started with a simple blast of a protein from Os against the At
>>>>multifasta proteins, using Bio::DB::Fasta and all the bioperl methods
>>>>needed (looping over all the Os proteins instead of a massive blast
>>>>with a chunk of Os proteins). I would like to take a sequence from
>>>>oriz_mfasta.txt (using the get_Seq_by_id function) and blast it against
>>>>arabido_mfasta.txt, and so on for all the Oryza sequences. This is the
>>>>first step. But it's not working!!! Probably because the function of
>>>>all the XML code I am working with is not really clear to me
>>>>(especially the <datamonger> tag!).
>>>>
>>>>You will find the code and the biopipe output below.
>>>>Thanks in advance
>>>>
>>>>
>>>>
>>>><pipeline_setup>
>>>>
>>>><!-- FILES -->
>>>><global
>>>> rootdir="/home/conte/test_blast"
>>>> datadir="$rootdir/datahope"
>>>> workdir="$rootdir/blasthope"
>>>> inputfile="$datadir/oriz_mfasta.txt"
>>>> blastpath = ""
>>>> blast_param1="-p blastp -e 1e-5"
>>>> blastdb1="$datadir/arabido_mfasta.txt"
>>>> resultdir1="$rootdir/resulthope/analysis1"
>>>>/>
>>>><pipeline_flow_setup>
>>>><!--CALL MODULES -->
>>>> <database_setup>
>>>> <streamadaptor>
>>>> <module>Bio::Pipeline::Dumper</module>
>>>> </streamadaptor>
>>>> <streamadaptor>
>>>> <module>Bio::DB::Fasta</module>
>>>> </streamadaptor>
>>>> </database_setup>
>>>>
>>>><!-- IOHANDLER TO PICK UP IDs -->
>>>> <iohandler_setup>
>>>> <iohandler>
>>>> <adaptor_id>2</adaptor_id>
>>>> <adaptor_type>STREAM</adaptor_type>
>>>> <iohandler_type>INPUT</iohandler_type>
>>>> <method>
>>>> <name>new</name>
>>>> <rank>1</rank>
>>>> <argument>
>>>> <value>$inputfile</value>
>>>> </argument>
>>>> </method>
>>>> <method>
>>>> <name>get_Seq_by_id</name>
>>>> <argument>
>>>> <value>INPUT</value>
>>>> </argument>
>>>> <rank>2</rank>
>>>> </method>
>>>> </iohandler>
>>>>
>>>> <iohandler>
>>>> <adaptor_id>2</adaptor_id>
>>>> <adaptor_type>STREAM</adaptor_type>
>>>> <iohandler_type>INPUT</iohandler_type>
>>>> <method>
>>>> <name>new</name>
>>>> <rank>1</rank>
>>>> <argument>
>>>> <value>$inputfile</value>
>>>> </argument>
>>>> </method>
>>>> <method>
>>>> <name>get_all_ids</name>
>>>> <rank>2</rank>
>>>> </method>
>>>> </iohandler>
>>>>
>>>><!-- OUTPUT PARAMETERS (DUMPER) -->
>>>> <iohandler>
>>>> <adaptor_id>1</adaptor_id>
>>>> <adaptor_type>STREAM</adaptor_type>
>>>> <iohandler_type>OUTPUT</iohandler_type>
>>>> <method>
>>>> <name>new</name>
>>>> <rank>1</rank>
>>>> <argument>
>>>> <tag>-dir</tag>
>>>> <value>$resultdir1</value>
>>>>                 <type>SCALAR</type>
>>>> <rank>1</rank>
>>>> </argument>
>>>> <argument>
>>>> <tag>-module</tag>
>>>> <value>generic</value>
>>>>                 <type>SCALAR</type>
>>>> <rank>1</rank>
>>>> </argument>
>>>> <argument>
>>>> <tag>-prefix</tag>
>>>>                 <type>SCALAR</type>
>>>> <value>INPUT</value>
>>>> <rank>2</rank>
>>>> </argument>
>>>> <argument>
>>>> <tag>-format</tag>
>>>>                 <type>SCALAR</type>
>>>> <value>gff</value>
>>>> <rank>3</rank>
>>>> </argument>
>>>> <argument>
>>>> <tag>-file_suffix</tag>
>>>>                 <type>SCALAR</type>
>>>> <value>gff</value>
>>>> <rank>4</rank>
>>>> </argument>
>>>> </method>
>>>> <method>
>>>> <name>dump</name>
>>>> <rank>2</rank>
>>>> <argument>
>>>> <value>OUTPUT</value>
>>>>                 <type>ARRAY</type>
>>>> <rank>1</rank>
>>>> </argument>
>>>> </method>
>>>> </iohandler>
>>>> </iohandler_setup>
>>>>
>>>><!-- ANALYSIS -->
>>>> <analysis>
>>>> <data_monger>
>>>> <initial></initial>
>>>> <input>
>>>> <name>protein_ids</name>
>>>> <iohandler>1</iohandler>
>>>> </input>
>>>> <input_create>
>>>> <module>setup_initial</module>
>>>> <rank>1</rank>
>>>> <argument>
>>>> <tag>protein_ids</tag>
>>>> <value>2</value>
>>>> </argument>
>>>> </input_create>
>>>></data_monger>
>>>><input_iohandler></input_iohandler>
>>>> </analysis>
>>>>
>>>><!-- BLAST-->
>>>> <analysis>
>>>> <logic_name>Blast</logic_name>
>>>> <runnable>Bio::Pipeline::Runnable::Blast</runnable>
>>>> <db>family</db>
>>>> <db_file>$blastdb1</db_file>
>>>> <program>blastall</program>
>>>>
>>>><!-- BLASTPATH-->
>>>> <program_file>$blastpath</program_file>
>>>> <analysis_parameters>$blast_param1</analysis_parameters>
>>>> <runnable_parameters>-formatdb 1 -result_dir
>>>>$resultdir1</runnable_parameters>
>>>>
>>>> <input_iohandler></input_iohandler>
>>>>
>>>> <output_iohandler></output_iohandler>
>>>> </analysis>
>>>>
>>>><!-- RULES -->
>>>><rule>
>>>> <current_analysis_id>1</current_analysis_id>
>>>> <next_analysis_id>2</next_analysis_id>
>>>>       <action>NOTHING</action>
>>>>
>>>></rule>
>>>>
>>>></pipeline_flow_setup>
>>>><job_setup>
>>>></job_setup>
>>>>
>>>></pipeline_setup>
>>>>
>>>>
>>>>And I obtain:
>>>>
>>>>Creating biopipe
>>>> Loading Schema...
>>>>Reading Data_setup xml : /home/conte/xml/newhope.xml
>>>>Doing DBAdaptor and IOHandler setup
>>>>Doing Pipeline Flow Setup
>>>>Doing Analysis..
>>>>
>>>>------------- EXCEPTION -------------
>>>>MSG: Need to store analysis first
>>>>STACK Bio::Pipeline::SQL::JobAdaptor::store
>>>>/usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/SQL/
>>>>JobAdaptor.pm:459
>>>>STACK Bio::Pipeline::XMLImporter::_create_initial_input_and_job
>>>>/usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/
>>>>XMLImporter.pm:837
>>>>STACK Bio::Pipeline::XMLImporter::run
>>>>/usr/local/ActivePerl-5.8/lib/site_perl/5.8.0/Bio/Pipeline/
>>>>XMLImporter.pm:484
>>>>STACK toplevel PipelineManager:120
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>Matthieu CONTE
>>>>M. Sc. in Bioinformatics from SIB
>>>>CIRAD
>>>>00 33 06.68.90.28.70
>>>>m_conte at hotmail.com
>>>>
>>>>
>>>>_______________________________________________
>>>>bioperl-pipeline mailing list
>>>>bioperl-pipeline at bioperl.org
>>>>http://bioperl.org/mailman/listinfo/bioperl-pipeline
>>
>>
>>
>