[Bioperl-l] exonerate parser in bioperl-live fails when protein2dna comparison is performed
Tania Oh
tania.oh at brasenose.oxford.ac.uk
Wed Aug 15 16:05:15 UTC 2007
Dear All,
I was trying to use the Bio::SearchIO::Alignment::Exonerate module to
run and parse my exonerate output. But I've noticed that the parser
which is actually Bio::SearchIO::Exonerate works if the model used in
Exonerate is --model est2genome. I used exonerate with the model --
model protein2dna and the parser was unable to parse the hsps.
Below is a simple of code I used for testing the output from exonerate:
use Bio::SearchIO;
use strict;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exonerate.output.works
Type: application/octet-stream
Size: 6056 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070815/e4e43d75/attachment-0008.obj>
-------------- next part --------------
my $searchio = Bio::SearchIO->new(-file => 'test_data/
exonerate.output.dontwork
-------------- next part --------------
A non-text attachment was scrubbed...
Name: exonerate.output.dontwork
Type: application/octet-stream
Size: 3283 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070815/e4e43d75/attachment-0009.obj>
-------------- next part --------------
',
-format => 'exonerate');
while( my $r = $searchio->next_result ) {
while(my $hit = $r->next_hit){
while(my $hsp = $hit->next_hsp){
print $hsp->start. "\t". $hsp->end. "\n";
}
}
print $r->query_name, "\n";
}
There are 2 files attached to show the examples of using either the
est2genome or protein2dna model:
1. exonerate.output.works - produced from the command line:
exonerate -q exonerate_cdna.fa -t exonerate_genomic.fa --model
est2genome --bestn 1 > exonerate.output.works
2. exonerate.output.dontwork - produced from the command line:
exonerate -q test_aa.fa -t test_cds.fa --model protein2dna >
exonerate.output.dontwork
Line 239 in Bio::searchIO::exonerate (cut and pasted below)
elsif( s/^vulgar:\s+(\S+)\s+ # query sequence id
(\d+)\s+(\d+)\s+([\-\+])\s+ # query start-end-strand
(\S+)\s+ # target sequence id
(\d+)\s+(\d+)\s+([\-\+])\s+ # target start-end-
strand
(\d+)\s+ # score
//ox ) {
parses the vulgar line of an --model est2genome exonerate output
well. An example of the (complex) vulgar line which I've truncated
for readability is:
vulgar: MUSSPSYN 3 1279 + 4.143962167-143965267 28 3074 + 6137 M 8 8
G 0 1 M 231 231 5 0 2 I 0 253 3 0
whereas the vulgar line I've obtained from a --model protein2dna
exonerate output is much simpler and the parser fails to pick it up:
vulgar: SJCHGC00851 0 204 . SJCHGC00851 2 614 + 1059 M 204 612
Has anyone encountered this situation before? I've not changed the
parser as exonerate is widely used for it's est2genome model, and
thought I'd run it pass the list to see if there is a work around
solution.
many thanks in advance,
tania
More information about the Bioperl-l
mailing list