[Bioperl-l] Problems with Bio::SearchIO
Dan Bolser
dan.bolser at gmail.com
Fri Nov 7 14:27:20 UTC 2008
Hi,
I'm new to BioPerl, so apologies if I'm doing something wrong. Please
just let me know what I'm doing wrong or what you need.
I downloaded the latest BioPerl via SVN (revision 14980), and did the,
perl Build.PL
./Build test
./Build install
(note that I have lib::local installed, pointing things at ~/perl5)
I ignored the failed tests, as-per the "everyone who is l88t does
this" instructions ;-)
I am trying to parse the output of a "blastall -p blastn" job
(blastall 2.2.18) using either the default format or tabular format. I
copied the "search_overview.PLS" script found here:
http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/graphics/search_overview.PLS
My code looks like this:
my $parser =
new Bio::SearchIO(
-format => 'blasttable',
-file => 'my.blast.m8.out' );
warn "parsed (", $parser->result_count(), ")\n";
But what I see (using either default or tabular format) is:
./visualizeBlastHits.plx
Use of uninitialized value in warn at ./visualizeBlastHits.plx line
34, <DATA> line 192.
parsed ()
Looking closer I found that $parser->result_count() only gets set
after calling $parser->next_result. Any way to force this? In some
Perl objects I've seen a 'parse' method that kicks the object into
(silently) calling all its get methods. Is there an equivalent (but
apparently undocumented) method? Actually, I think it should kick
itself when called... or not? Certainly the docs do not suggest that
is won't return a the number of results ("Function: Gets the number of
Blast results that have been parsed.") So I think this is a bug.
Actually I just checked against version 1.4 and "result_count()" fails
with the default, the XML or the tabular format results (all three
representing the same query against the same database with the same
blast version). Note that trying with the XML format in the latest
version of BioPerl results in the outright failure:
------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SearchIO::result_count" is not implemented
by package Bio::SearchIO::blastxml.
This is not your fault - author of Bio::SearchIO::blastxml should be blamed!
STACK: Error::throw
STACK: Bio::Root::Root::throw
/homes/dbolser/perl5/lib/perl5/Bio/Root/Root.pm:357
STACK: Bio::Root::RootI::throw_not_implemented
/homes/dbolser/perl5/lib/perl5/Bio/Root/RootI.pm:680
STACK: Bio::SearchIO::result_count
/homes/dbolser/perl5/lib/perl5/Bio/SearchIO.pm:410
STACK: ./visualizeBlastHits.plx:35
----------------------------------------------------------------
While parsing and comparing the different formats (and versions) I
noticed a couple of other problems that I could not find reported
anywhere, so I'll report them here.
The appropriate parts of the code are:
while( my $r = $parser->next_result ) {
print $r->query_length, "\n";
while(my $h = $r->next_hit ) {
print $h->length, "\n";
my $p = $h->hsp('best');
print $h->significance, "\t", $h->score, "\t", $h->bits, "\n";
}
}
It seems that both the "$r->query_length" and the "$h->length" are
missing when using the tabular format blast results (using either
BioPerl version). I get confused here because there is also some
undocumented behavior used in the above script that means a 'hit'
(apparently) returns the start and end point of the two most extreme
HSPs (and the length of that range).
Secondly, I found in all but the default format (using either BioPerl
version), the "$p->score" is missing (set to an empty string). It
shows up fine using the default format. The significance and the
bit-score show up fine... or at least they show up... The values look
wrong now I come to check. e.g. the score is equal to the bit-score
when using the default format with version 1.4, and is often smaller
than the bit-score using the latest version (or is $h->bits not the
bit score?)
The closest hits in the mailing list that I could find to these probemes were:
http://lists.open-bio.org/pipermail/bioperl-l/2002-May/007936.html
http://lists.open-bio.org/pipermail/bioperl-l/2002-September/009586.html
but I don't think that they are relevant.
Since it comes up here, how is the 'best' HSP defined? it isn't
documented as far as I can tell.
About the documentation... looking here:
http://search.cpan.org/~birney/bioperl-1.4/Bio/SearchIO.pm
Several of the structured methods 'blocks' are followed by a "See
Bio::..." link to other pages in CPAN. However the 'next_result'
method is followed by a link to
http://search.cpan.org/~birney/bioperl-1.4/Bio/Root/RootI.pm - I think
it should be a link to
http://search.cpan.org/~birney/bioperl-1.4/Bio/Search/Result/ResultI.pm
Also, it would be nice (especially for noobs) if the full list of
accepted format codes were given on that page. The current text "#
format can be 'fasta', 'blast', 'exonerate', ..." is extremely
frustrating for a beginner "... what?!". I now realize that each
format code is matched by a "Bio::SearchIO::formatcode" module, but I
didn't know that from reading the above.
While I'm at it, on page
http://search.cpan.org/~birney/bioperl-1.4/Bio/Search/Hit/HitI.pm -
the phrase "Equivalent to raw_score()" appearing under the heading
"score" is a broken link. In fact every "See also : $this->method()"
type link on that page is broken (there are about 25). Also the link
to "See also : BUGS" is broken.
> User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.
Thank you for your participation! I hope the above can help in some
way, and I hope its not down to me making trivial mistakes! If these
look like genuine bugs, should I report them on RT?
Out of interest, I did get some fail while testing, specifically (or
perhaps coincidentally) some related to SearchIO...
./Build test verbose=1 > test.results.dump &> test.results.dump
...
ok 83 - HSP num_conserved not implemented
ok 84 - HSP num_identical not implemented
# Failed test 'HSP seq_inds'
# at t/RNA_SearchIO.t line 152.
# got: '77'
# expected: '67'
ok 85 - HSP percent_identity not implemented
ok 86 - HSP cigar_string not implemented
...
ok 114 - HSP rank
not ok 115 - HSP seq_inds
ok 116 - HSP significance
ok 117 - HSP end
...
ok 147 - HSP frame
ok 148 - HSP gaps
# Failed test 'HSP seq_inds'
# at t/RNA_SearchIO.t line 218.
# got: '79'
# expected: '69'
ok 149 - The object isa Bio::Align::AlignI
ok 150 - HSP hit isa Bio::SeqFeature::Similarity
...
ok 161 - HSP rank
not ok 162 - HSP seq_inds
ok 163 - HSP significance
ok 164 - HSP end
...
ok 170 - HSP meta
ok 171 - HSP strand
# Failed test 'HSP seq_inds'
# at t/RNA_SearchIO.t line 285.
# got: '76'
# expected: '64'
ok 172 - HSP meta gap bug
ok 173 - HSP meta
...
ok 250 - HSP rank
not ok 251 - HSP seq_inds
# Failed test 'HSP seq_inds'
# at t/RNA_SearchIO.t line 475.
# got: '76'
# expected: '64'
ok 252 - HSP significance
ok 253 - HSP end
...
ok 288 - HSP range
ok 289 - HSP rank
not ok 290 - HSP seq_inds
# Failed test 'HSP seq_inds'
# at t/RNA_SearchIO.t line 541.
# got: '55'
# expected: '31'
ok 291 - HSP significance
ok 292 - HSP end
...
ok 336 - HSP rank
not ok 337 - HSP seq_inds
# Failed test 'HSP seq_inds'
# at t/RNA_SearchIO.t line 608.
# got: '79'
# expected: '69'
ok 338 - HSP significance
ok 339 - HSP end
...
ok 492 - HSP custom_score
ok 493
ok 494
ok 495 - HSP strand
# Looks like you failed 6 tests of 495.
Dubious, test returned 6 (wstat 1536, 0x600)
Failed 6/495 subtests
t/RandDistFunctions.............
1..5
ok 1 - use Bio::Tools::RandomDistFunctions;
ok 2
ok 3
...
t/SearchDist....................
1..0 # Skip The optional module Bio::Ext::Align (or dependencies
thereof) was not installed
skipped: The optional module Bio::Ext::Align (or dependencies thereof)
was not installed
t/SearchIO......................
1..1812
ok 1 - use Bio::SearchIO;
ok 2 - use Bio::SearchIO::Writer::HitTableWriter;
ok 3 - use Bio::SearchIO::Writer::HTMLResultWriter;
ok 4 - The object isa Bio::Search::Result::ResultI
ok 5 - database_name()
ok 6 - query_name()
ok 7
ok 8
ok 9
ok 10
...
ok 564
ok 565
ok 566
not ok 567
# Failed test at t/SearchIO.t line 775.
# got: '0.5918'
# expected: '0.5955'
not ok 568
# Failed test at t/SearchIO.t line 776.
# got: '0.6100'
# expected: '0.6139'
not ok 569
# Failed test at t/SearchIO.t line 777.
# got: '0.5940'
# expected: '0.5977'
ok 570
ok 571
not ok 572
# Failed test at t/SearchIO.t line 780.
# got: '158'
# expected: '159'
ok 573
ok 574
ok 575
...
ok 737
not ok 738
# Failed test at t/SearchIO.t line 996.
# got: '51.59'
# expected: '51.67'
not ok 739
# Failed test at t/SearchIO.t line 997.
# got: '0.5235'
# expected: '0.5244'
not ok 740
# Failed test at t/SearchIO.t line 998.
# got: '0.5718'
# expected: '0.5728'
ok 741
ok 742
not ok 743
# Failed test at t/SearchIO.t line 1001.
# got: '677'
# expected: '678'
ok 744
ok 745
ok 746
--------------------- WARNING ---------------------
MSG: In sequence 5X_1895.fa residue count gives end value 5611.
Overriding value [5623] with value 5611 for Bio::LocatableSeq::end().
...
etc.
Test Summary Report
-------------------
t/RNA_SearchIO (Wstat: 1536 Tests: 495 Failed: 6)
Failed tests: 76, 115, 162, 251, 290, 337
Non-zero exit status: 6
t/SearchIO (Wstat: 2048 Tests: 1812 Failed: 8)
Failed tests: 567-569, 572, 738-740, 743
Non-zero exit status: 8
t/singlet (Wstat: 512 Tests: 4 Failed: 2)
Failed tests: 3-4
Non-zero exit status: 2
Files=259, Tests=14572, 131 wallclock secs ( 3.47 usr 0.90 sys +
94.53 cusr 8.24 csys = 107.14 CPU)
Result: FAIL
Failed 3/259 test programs. 16/14572 subtests failed.
Dan.
P.S. I've also been attacking the wiki, so please undo any mess that I
may have made there.
--
http://network.nature.com/profile/dan
More information about the Bioperl-l
mailing list