[BioRuby] Bio::Blat::Report

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Thu Sep 4 03:52:56 UTC 2008


On Wed, 3 Sep 2008 17:48:07 +0200
Davide Rambaldi <davide.rambaldi at ifom-ieo-campus.it> wrote:

> Hi again, sorry for all these e-mails,
> 
> I noticed a change in the report object (add_line method) after commit:
> http://github.com/bioruby/bioruby/commit/88b2fb24dddcd2d5d0715e8274eda1b1ebac0abd
> 
> +      # Adds a line to the entry if the given line is regarded as
> +      # a part of the current entry.
> +      # If the current entry (self) is empty, or the line has the same
> +      # query name, the line is added and returns self.
> +      # Otherwise, returns false (the line is not added).
> +      def add_line(line)
> +        if /\A\s*\z/ =~ line then
> +          return @hits.empty? ? self : false
> +        end
> +        hit = Hit.new(line.chomp)
> +        if @hits.empty? or @hits.first.query.name == hit.query.name then
> +          @hits.push hit
> +          return self
> +        else
> +          return false
> +        end
>         end
> 
> 
> So now, if there is more than one query_id in the input file, it will
> be automatically split into different reports, right?

Yes, in combination with Bio::FlatFile.

The behavior was changed after this commit:
http://github.com/bioruby/bioruby/commit/88b2fb24dddcd2d5d0715e8274eda1b1ebac0abd

This change is somewhat incompatible with the previous behavior,
but it improves speed and memory usage.
In addition, some people requested it:
http://lists.open-bio.org/pipermail/bioruby-ja/2007-August/000137.html
(Mailing list written in Japanese)

Note that this can give wrong results for data in which different
query sequences with the same name appear consecutively: their hits
end up in the same report.

> That's cool (I have developed a method in my blat analyzer to group
> hits by id that I can remove now).
> 
> The only issue I see: what happens with an input whose lines are swapped?
> I don't believe it is a common case anyway: blat psl results are ordered
> by query name,
> but it can happen if you change the order of the psl lines.

When the parser detects a change in the query name,
it starts a new report object.

Note that the Bio::Blat::Report parser only supports files
directly generated by the blat program, without post-modification.
Using modified data is at your own risk.
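
To illustrate what a swapped line does, here is a minimal sketch in plain
Ruby that does not use BioRuby at all. It only imitates the "same query name
as the previous line" rule of add_line. It assumes the query name (qName) is
the 10th tab-separated column of a PSL line; split_by_query and fake_psl are
hypothetical helper names, and blank lines are simply skipped here.

```ruby
# Sketch only: plain Ruby, no BioRuby required.
# Assumption: the query name (qName) is the 10th tab-separated PSL column.
QNAME_COLUMN = 9

# Hypothetical helper: splits PSL lines into runs of consecutive lines
# that share a query name, and returns [query_name, number_of_lines] pairs.
def split_by_query(lines)
  lines.reject { |l| l.strip.empty? }
       .slice_when { |a, b| a.split("\t")[QNAME_COLUMN] != b.split("\t")[QNAME_COLUMN] }
       .map { |run| [run.first.split("\t")[QNAME_COLUMN], run.size] }
end

# Hypothetical helper: builds a dummy 21-column line with only qName filled in.
def fake_psl(qname)
  (Array.new(9, "0") + [qname] + Array.new(11, "0")).join("\t")
end

ordered = %w[q1 q1 q2].map { |n| fake_psl(n) }
swapped = %w[q1 q2 q1].map { |n| fake_psl(n) }

p split_by_query(ordered)  # => [["q1", 2], ["q2", 1]]
p split_by_query(swapped)  # => [["q1", 1], ["q2", 1], ["q1", 1]]
```

In the swapped case the two q1 lines end up in separate groups, which
corresponds to two separate report objects for the same query.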

> consider this script:
> 
> #!/usr/local/bin/ruby -w
> require 'bio'
> 
> Bio::FlatFile.open(Bio::Blat::Report,ARGF).each do |report|
>   puts "object id: " + report.object_id.to_s  + " hits: " +  
> report.hits.size.to_s + " query name:" + report.query_id
> end
> 
> Before the commit it gave only one object, and (as reported in the doc)
> only the first query name.
> 
> now with this test file:

If you really want the old behavior,

  str = File.read(filename)
  obj = Bio::Blat::Report.new(str)

obj is a single Bio::Blat::Report object that may contain
hits for multiple queries.
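
With a single report like that, hits can be grouped by query name
regardless of line order. The following is only a sketch: FakeHit is a
stand-in so that it runs without BioRuby, hits_by_query is a hypothetical
helper name, and it assumes each real hit object responds to query_id
(hit.query.name, as used in add_line, could be used in the block instead).

```ruby
# Stand-in for a hit object so the sketch runs without BioRuby.
FakeHit = Struct.new(:query_id, :score)

# Hypothetical helper: groups hits by query name, whatever their order.
# With a real report, one would presumably pass obj.hits here.
def hits_by_query(hits)
  hits.group_by { |hit| hit.query_id }
end

hits = [FakeHit.new("q1", 10), FakeHit.new("q2", 20), FakeHit.new("q1", 30)]
hits_by_query(hits).each do |name, group|
  puts "#{name}: #{group.size} hit(s)"
end
```

This keeps all hits in memory at once, so it gives up the speed and
memory benefit of reading entry by entry.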

-- 
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org


