[BioRuby] blast -m7 (xml) and multiple queries

Ben Woodcroft donttrustben at gmail.com
Sat Jun 28 04:26:14 UTC 2008


Hi,

I seem to have run across a bug in the bioruby blast report parser, in
that it isn't able to handle reports that span multiple query
sequences. My code for parsing is

require 'bio'

Bio::Blast.reports(ARGF) do |report|
  puts "Hits for " + report.query_def + " against " + report.db
  report.each do |hit|
    hit.each do |hsp|
      # one tab-separated line per HSP
      puts [
        report.query_def,
        hit.accession,
        hsp.query_from,
        hsp.query_to,
        hsp.hit_from,
        hsp.hit_to,
        hsp.evalue,
        hit.target_def
      ].join("\t")
    end
  end
end

When I run this on a blast xml output with 2 queries (1st has 10 hits
and 2nd has 7), I get 8 hits shown, which is somewhat confusing. The
query sequences are somewhat similar, so they have some hits in common
- perhaps this sort of explains the number 8.

I'm using bioruby from git
http://github.com/bioruby/bioruby/commit/a61b16163d3ca74f3f7c8d8e8f03f5f8c68dee60

Using the newest blast (2.2.18).

Is this easy to fix? Is there a workaround?



A partial answer:
According to http://rubyforge.org//tracker/index.php?func=detail&aid=20272&group_id=769&atid=3037
this is an unopened, unfixed bug, caused by a change in the NCBI XML
schema. I can work around it by reblasting with the legacy flag -V.
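
To check the expected hit counts independently of the BioRuby parser, the
XML can also be read directly with REXML. This is only a minimal sketch: it
assumes the newer output layout, where all queries sit in a single
<BlastOutput> document and each query is one <Iteration> element. The
method name hits_per_query is made up for illustration and is not part of
BioRuby.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of BioRuby): count the hits reported for each
# query in one BLAST XML document. Assumes every query is an <Iteration>
# element under BlastOutput/BlastOutput_iterations.
def hits_per_query(xml)
  doc = REXML::Document.new(xml)
  counts = {}
  doc.elements.each('BlastOutput/BlastOutput_iterations/Iteration') do |iter|
    query = iter.elements['Iteration_query-def']
    name = query ? query.text : '(unknown query)'
    # Each <Hit> under <Iteration_hits> is one database hit for this query.
    counts[name] = iter.get_elements('Iteration_hits/Hit').size
  end
  counts
end
```

Calling hits_per_query(File.read("blast.xml")) returns a hash from query
definition to hit count, which can be compared with what the report parser
prints for the same file.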



Thanks in advance,
ben


