[BioRuby] Bio::Blat::Report

Wed Sep 3 15:48:07 UTC 2008

Hi again, and sorry for all these e-mails,

I noticed a change in the Report object (add_line method) after this commit:
http://github.com/bioruby/bioruby/commit/88b2fb24dddcd2d5d0715e8274eda1b1ebac0abd

+      # Adds a line to the entry if the given line is regarded as
+      # a part of the current entry.
+      # If the current entry (self) is empty, or the line has the same
+      # query name, the line is added and returns self.
+      # Otherwise, returns false (the line is not added).
+      def add_line(line)
+        if /\A\s*\z/ =~ line then
+          return @hits.empty? ? self : false
+        end
+        hit = Hit.new(line.chomp)
+        if @hits.empty? or @hits.first.query.name == hit.query.name then
+          @hits.push hit
+          return self
+        else
+          return false
+        end
        end

So now, if there is more than one query_id in the input file, it will
automatically be split into separate reports, right?
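
As a rough sketch of that rule (this is not BioRuby code: the method name
split_into_reports is hypothetical, and plain query-name strings stand in
for the real Hit objects), the decision add_line makes for each line can be
modeled like this. Blank-line handling is left out.

```ruby
# Simplified model of the add_line rule from the diff: a line joins the
# current report only if the report is empty or its first hit has the
# same query name; otherwise a new report is started.
def split_into_reports(query_names)
  reports = []
  query_names.each do |name|
    if reports.empty? || reports.last.first != name
      reports << [name]      # different query name: start a new report
    else
      reports.last << name   # same query name: add to the current report
    end
  end
  reports
end

# Made-up query names, already ordered by query name.
names = %w[query1 query1 query2 query3 query3]
split_into_reports(names).each do |report|
  puts "hits: #{report.size} query name:#{report.first}"
end
```

Note that the comparison only looks at the report currently being built, so
only consecutive lines can end up in the same report.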

That's cool (I had written a method in my blat analyzer to group
hits by id, which I can remove now).

The only issue I see: what happens with an input where the lines are swapped?
I don't think this is a common case, since blat psl results are ordered
by query name, but it can happen if you change the order of the psl lines.

Consider this script:

#!/usr/local/bin/ruby -w
require 'bio'

Bio::FlatFile.open(Bio::Blat::Report,ARGF).each do |report|
  puts "object id: #{report.object_id} hits: #{report.hits.size} query name:#{report.query_id}"
end

Before the commit it gave only one object and (as stated in the docs)
only the first query name.

Now, with this test file:

-------------- next part --------------

3 lines of psl output with 3 different query names:

output:

object id: 277400 hits: 1 query name:query1
object id: 274620 hits: 1 query name:query2
object id: 271910 hits: 1 query name:query3

But with a psl file like this one:

-------------- next part --------------

where we have 3 query names (2 hits each) and the lines are not in order:

object id: 277400 hits: 1 query name:query1
object id: 274620 hits: 1 query name:query2
object id: 272010 hits: 1 query name:query1
object id: 269350 hits: 1 query name:query3
object id: 266640 hits: 1 query name:query2
object id: 263930 hits: 1 query name:query3

If I sort the lines again by query name:

-------------- next part --------------

object id: 277400 hits: 2 query name:query1
object id: 273590 hits: 2 query name:query2
object id: 269800 hits: 2 query name:query3

So hits are not grouped correctly if the lines are unsorted (but I guess it is faster).
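
One possible workaround, sketched under some assumptions: reorder the psl
lines by query name before parsing them. The helper names
group_psl_lines_by_query and toy_line are hypothetical (not part of BioRuby),
the input is assumed to be tab-separated psl data lines with the header
already removed, and qName is taken from the 10th column as in the standard
psl layout. The toy lines are made up and only carry a query name and a tag.

```ruby
# Index of the qName field in a tab-separated psl data line (10th column).
QNAME_COLUMN = 9

# Hypothetical helper: returns the lines reordered so that hits of the same
# query are adjacent. Ruby's sort_by is not guaranteed to be stable, so the
# original index is added to the sort key to keep the input order of the
# hits within each query.
def group_psl_lines_by_query(lines)
  lines.each_with_index
       .sort_by { |line, index| [line.split("\t")[QNAME_COLUMN], index] }
       .map(&:first)
end

# Builds a made-up 21-field psl-like line for illustration only.
def toy_line(qname, tag)
  fields = Array.new(21, "0")
  fields[QNAME_COLUMN] = qname
  fields[13] = tag # tName column, used here only to tell the lines apart
  fields.join("\t")
end

unsorted = [
  toy_line("query1", "a"), toy_line("query2", "b"), toy_line("query1", "c"),
  toy_line("query3", "d"), toy_line("query2", "e"), toy_line("query3", "f")
]

group_psl_lines_by_query(unsorted).each do |line|
  fields = line.split("\t")
  puts "#{fields[QNAME_COLUMN]} #{fields[13]}"
end
```

The reordered lines could then be written to a file and passed to the script
above. This costs a full sort of the input, which the line-by-line check in
add_line avoids.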

Sorry for my bad English and for this long mail.

best regards

Davide Rambaldi,
Bioinformatics PhD student.
-----------------------------------------------------
Bioinformatic Group IFOM-IEO Campus
Via Adamello 16, Milano
I-20139 Italy

[t] +39 02574303 066
[e] davide.rambaldi at ifom-ieo-campus.it
[i] http://ciccarelli.group.ifom-ieo-campus.it/fcwiki/DavideRambaldi (homepage)
[i] http://www.semm.it             (PhD school)
[i] http://www.btbs.unimib.it/     (Master)

-----------------------------------------------------