[Bioperl-l] exonerate parser in bioperl-live fails when protein2dna comparison is performed

Wed Aug 15 21:16:27 UTC 2007

I can confirm this with bioperl-live.  Bio::SearchIO::exonerate docs  
indicate protein2genome and est2genome model output is supported but  
doesn't specifically indicate that it can parse any other output.   
You can add an enhancement request to bugzilla indicating this  
deficiency or, if you are inclined, add the functionality yourself  
and donate the code.

chris

On Aug 15, 2007, at 11:05 AM, Tania Oh wrote:

> Dear All,
>
> I was trying to use the Bio::SearchIO::Alignment::Exonerate module
> to run exonerate and parse its output. I've noticed that the
> parser, which is actually Bio::SearchIO::exonerate, works when the
> model used in exonerate is --model est2genome. When I ran exonerate
> with --model protein2dna, the parser was unable to
> parse the HSPs.
>
>
> Below is a simple piece of code I used for testing the output from
> exonerate:
>
> use Bio::SearchIO;
> use strict;
>
> my $searchio = Bio::SearchIO->new(
>     -file   => 'test_data/exonerate.output.dontwork',
>     -format => 'exonerate',
> );
>
>   while( my $r = $searchio->next_result ) {
>           while(my $hit = $r->next_hit){
>                   while(my $hsp = $hit->next_hsp){
>                           print $hsp->start. "\t". $hsp->end. "\n";
>                   }
>           }
>
>     print $r->query_name, "\n";
>   }
>
>
> There are 2 files attached to show the examples of using either the  
> est2genome or protein2dna model:
> 1. exonerate.output.works  - produced from the command line:
> exonerate -q exonerate_cdna.fa -t exonerate_genomic.fa --model  
> est2genome --bestn 1 > exonerate.output.works
>
> 2. exonerate.output.dontwork - produced from the command line:
> exonerate -q test_aa.fa -t test_cds.fa --model protein2dna >  
> exonerate.output.dontwork
>
>
> Line 239 in Bio::SearchIO::exonerate (cut and pasted below):
>
> elsif(  s/^vulgar:\s+(\S+)\s+         # query sequence id
>                  (\d+)\s+(\d+)\s+([\-\+])\s+   # query start-end-strand
>                  (\S+)\s+                      # target sequence id
>                  (\d+)\s+(\d+)\s+([\-\+])\s+   # target start-end-strand
>                  (\d+)\s+                      # score
>                  //ox ) {
>
> parses the vulgar line of an --model est2genome exonerate output  
> well. An example of the (complex) vulgar line which I've truncated  
> for readability is:
> vulgar: MUSSPSYN 3 1279 + 4.143962167-143965267 28 3074 + 6137 M 8  
> 8 G 0 1 M 231 231 5 0 2 I 0 253 3 0
>
> whereas the vulgar line I've obtained from a --model protein2dna  
> exonerate output is much simpler and the parser fails to pick it up:
> vulgar: SJCHGC00851 0 204 . SJCHGC00851 2 614 + 1059 M 204 612
>
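One visible difference between the two vulgar lines quoted above is the query strand field: the est2genome line has "+", while the protein2dna line has ".", and the strand class ([\-\+]) in the quoted pattern accepts only "+" or "-". The following is a minimal standalone sketch, not the actual Bio::SearchIO::exonerate code, that assumes the only change needed for the header portion is to let the strand class also accept ".". The sub name parse_vulgar_header is hypothetical and introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the nine header fields of a vulgar line,
# or an empty list if the line does not match.
# Assumption: relative to the pattern quoted from the parser, the only
# change is that both strand classes also accept a literal dot.
sub parse_vulgar_header {
    my ($line) = @_;
    return $line =~ m{^vulgar:\s+(\S+)\s+          # query sequence id
                      (\d+)\s+(\d+)\s+([-+.])\s+   # query start, end, strand
                      (\S+)\s+                     # target sequence id
                      (\d+)\s+(\d+)\s+([-+.])\s+   # target start, end, strand
                      (\d+)\s+                     # score
                     }x;
}

# The two (truncated) example lines quoted in this thread.
my @lines = (
    'vulgar: MUSSPSYN 3 1279 + 4.143962167-143965267 28 3074 + 6137 M 8 8 G 0 1',
    'vulgar: SJCHGC00851 0 204 . SJCHGC00851 2 614 + 1059 M 204 612',
);

for my $line (@lines) {
    my @fields = parse_vulgar_header($line);
    if (@fields) {
        print join("\t", @fields), "\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

With the wider strand class both example lines match. Whether the rest of the parser handles a "." strand or the protein2dna coordinate scaling correctly is a separate question that this sketch does not address.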
> Has anyone encountered this situation before? I've not changed the
> parser, as exonerate is widely used for its est2genome model, and
> thought I'd run it past the list to see if there is a
> workaround.
>
> many thanks in advance,
> tania
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign