[Bioperl-l] exonerate parser in bioperl-live fails when protein2dna comparison is performed

Wed Aug 15 16:05:15 UTC 2007

Dear All,

I was trying to use the Bio::SearchIO::Alignment::Exonerate module to  
run and parse my exonerate output. But I've noticed that the parser  
which is actually Bio::SearchIO::Exonerate works if the model used in  
Exonerate is --model est2genome. I used exonerate with the model -- 
model protein2dna and the parser was unable to parse the hsps.

Below is a simple of code I used for testing the output from exonerate:

use Bio::SearchIO;
use strict;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exonerate.output.works
Type: application/octet-stream
Size: 6056 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070815/e4e43d75/attachment-0008.obj>
-------------- next part --------------
my $searchio = Bio::SearchIO->new(-file => 'test_data/ 
exonerate.output.dontwork
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exonerate.output.dontwork
Type: application/octet-stream
Size: 3283 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070815/e4e43d75/attachment-0009.obj>
-------------- next part --------------
',
                                    -format => 'exonerate');

   while( my $r = $searchio->next_result ) {
           while(my $hit = $r->next_hit){
                   while(my $hsp = $hit->next_hsp){
                           print $hsp->start. "\t". $hsp->end. "\n";
                   }
           }

     print $r->query_name, "\n";
   }

There are 2 files attached to show the examples of using either the  
est2genome or protein2dna model:
1. exonerate.output.works  - produced from the command line:
exonerate -q exonerate_cdna.fa -t exonerate_genomic.fa --model  
est2genome --bestn 1 > exonerate.output.works

2. exonerate.output.dontwork - produced from the command line:
exonerate -q test_aa.fa -t test_cds.fa --model protein2dna >  
exonerate.output.dontwork

Line 239 in Bio::searchIO::exonerate (cut and pasted below)

elsif(  s/^vulgar:\s+(\S+)\s+         # query sequence id
                  (\d+)\s+(\d+)\s+([\-\+])\s+   # query start-end-strand
                  (\S+)\s+                      # target sequence id
                  (\d+)\s+(\d+)\s+([\-\+])\s+   # target start-end- 
strand
                  (\d+)\s+                      # score
                  //ox ) {

parses the vulgar line of an --model est2genome exonerate output  
well. An example of the (complex) vulgar line which I've truncated  
for readability is:
vulgar: MUSSPSYN 3 1279 + 4.143962167-143965267 28 3074 + 6137 M 8 8  
G 0 1 M 231 231 5 0 2 I 0 253 3 0

whereas the vulgar line I've obtained from a --model protein2dna  
exonerate output is much simpler and the parser fails to pick it up:
vulgar: SJCHGC00851 0 204 . SJCHGC00851 2 614 + 1059 M 204 612

Has anyone encountered this situation before? I've not changed the  
parser as exonerate is widely used for it's est2genome model, and  
thought I'd run it pass the list to see if there is a work around  
solution.

many thanks in advance,
tania