From k at dev.open-bio.org Fri Feb 1 22:36:00 2008
From: k at dev.open-bio.org (Katayama Toshiaki)
Date: Sat, 02 Feb 2008 03:36:00 +0000
Subject: [BioRuby-cvs] bioruby ChangeLog,1.80,1.81
Message-ID: <200802020336.m123a0gr029664@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby
In directory dev.open-bio.org:/tmp/cvs-serv29660

Modified Files:
    ChangeLog
Log Message:
    * lib/bio/shell/rails/vendor/plugins/

      The 'generators' directory is moved under the 'bioruby' subdirectory
      so that 'bioruby --rails' command can work with Rails 2.x series
      in addition to the Rails 1.2.x series.

Index: ChangeLog
===================================================================
RCS file: /home/repository/bioruby/bioruby/ChangeLog,v
retrieving revision 1.80
retrieving revision 1.81
diff -C2 -d -r1.80 -r1.81
*** ChangeLog 9 Jan 2008 17:22:39 -0000 1.80
--- ChangeLog 2 Feb 2008 03:35:58 -0000 1.81
***************
*** 1,4 ****
--- 1,12 ----
  2008-01-10 Toshiaki Katayama
+     * lib/bio/shell/rails/vendor/plugins/
+
+       The 'generators' directory is moved under the 'bioruby' subdirectory
+       so that 'bioruby --rails' command can work with Rails 2.x series
+       in addition to the Rails 1.2.x series.
+
+ 2008-01-10 Toshiaki Katayama
+
+
      * lib/bio/io/hinv.rb

From k at dev.open-bio.org Fri Feb 1 22:54:50 2008
From: k at dev.open-bio.org (Katayama Toshiaki)
Date: Sat, 02 Feb 2008 03:54:50 +0000
Subject: [BioRuby-cvs] bioruby ChangeLog,1.81,1.82
Message-ID: <200802020354.m123soGS029686@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby
In directory dev.open-bio.org:/tmp/cvs-serv29682

Modified Files:
    ChangeLog
Log Message:
fix typo

Index: ChangeLog
===================================================================
RCS file: /home/repository/bioruby/bioruby/ChangeLog,v
retrieving revision 1.81
retrieving revision 1.82
diff -C2 -d -r1.81 -r1.82
*** ChangeLog 2 Feb 2008 03:35:58 -0000 1.81
--- ChangeLog 2 Feb 2008 03:54:48 -0000 1.82
***************
*** 1,3 ****
!
2008-01-10 Toshiaki Katayama
      * lib/bio/shell/rails/vendor/plugins/
--- 1,3 ----
! 2008-02-02 Toshiaki Katayama
      * lib/bio/shell/rails/vendor/plugins/

From ngoto at dev.open-bio.org Mon Feb 11 21:13:34 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Tue, 12 Feb 2008 02:13:34 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/appl/blast format0.rb,1.25,1.26
Message-ID: <200802120213.m1C2DX5m009903@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/appl/blast
In directory dev.open-bio.org:/tmp/cvs-serv9861/lib/bio/appl/blast

Modified Files:
    format0.rb
Log Message:
* Bug fixes reported by Shuji Shigenobu.
  * Failed to parse query length for long query (>= 10000 letters)
    because comma is inserted for digit separator by blastall.
  * Failed to parse e-value for some BLASTX results

Index: format0.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/appl/blast/format0.rb,v
retrieving revision 1.25
retrieving revision 1.26
diff -C2 -d -r1.25 -r1.26
*** format0.rb 27 Dec 2007 17:28:57 -0000 1.25
--- format0.rb 12 Feb 2008 02:13:31 -0000 1.26
***************
*** 265,270 ****
        q << sc.scan(/.*/)
        sc.skip(/\s*^ ?/)
!     end until !sc.rest or r = sc.skip(/ *\( *(\d+) *letters *\)\s*\z/)
!     @query_len = sc[1].to_i if r
      @query_def = q.join(' ')
    end
--- 265,270 ----
        q << sc.scan(/.*/)
        sc.skip(/\s*^ ?/)
!     end until !sc.rest or r = sc.skip(/ *\( *([\,\d]+) *letters *\)\s*\z/)
!     @query_len = sc[1].delete(',').to_i if r
      @query_def = q.join(' ')
    end
***************
*** 969,973 ****
      while sc.rest?
        sc.skip(/\s*/)
!       if sc.skip(/Expect(?:\(\d\))? *\= *([e\-\.\d]+)/) then
          ev = sc[1].to_s
          ev = '1' + ev if ev[0] == ?e
--- 969,973 ----
      while sc.rest?
        sc.skip(/\s*/)
!       if sc.skip(/Expect(?:\(\d+\))?
*\= *([e\-\.\d]+)/) then
          ev = sc[1].to_s
          ev = '1' + ev if ev[0] == ?e

From ngoto at dev.open-bio.org Tue Feb 12 00:32:25 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Tue, 12 Feb 2008 05:32:25 +0000
Subject: [BioRuby-cvs] bioruby ChangeLog,1.82,1.83
Message-ID: <200802120532.m1C5WP5M011183@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby
In directory dev.open-bio.org:/tmp/cvs-serv11163

Modified Files:
    ChangeLog
Log Message:
ChangeLog for lib/bio/appl/blast/format0.rb from 1.25 to 1.26.

Index: ChangeLog
===================================================================
RCS file: /home/repository/bioruby/bioruby/ChangeLog,v
retrieving revision 1.82
retrieving revision 1.83
diff -C2 -d -r1.82 -r1.83
*** ChangeLog 2 Feb 2008 03:54:48 -0000 1.82
--- ChangeLog 12 Feb 2008 05:32:23 -0000 1.83
***************
*** 1,2 ****
--- 1,11 ----
+ 2008-02-12 Naohisa Goto
+
+     * lib/bio/appl/blast/format0.rb
+
+       Fixed bugs: Failed to parse query length for long query
+       (>= 10000 letters) as comma is inserted for digit separator
+       by blastall; Failed to parse e-value for some BLASTX results.
+       Thanks to Shuji Shigenobu who reported the bugs and sent patches.
+
  2008-02-02 Toshiaki Katayama

From ngoto at dev.open-bio.org Wed Feb 13 05:28:20 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Wed, 13 Feb 2008 10:28:20 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58,0.58.2.1
Message-ID: <200802131028.m1DASKHe017196@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv17175/lib/bio

Modified Files:
      Tag: BRANCH-biohackathon2008
    sequence.rb
Log Message:
Added a new class method Bio::Sequence.read(str).
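
A minimal sketch of the dispatch that the log message above describes for Bio::Sequence.read(str, format = nil): use the given format class if there is one, otherwise guess a class from the text, then convert the parsed entry with to_biosequence. ToyFasta, toy_guess and toy_read are hypothetical stand-ins written for this note, not BioRuby classes or methods; the committed method relies on Bio::FlatFile::AutoDetect for the guessing step.

```ruby
# Hypothetical stand-in for a flat-file entry class (not part of BioRuby).
class ToyFasta
  def initialize(str)
    @str = str
  end

  # Returns a plain Hash here; the real method returns a Bio::Sequence.
  def to_biosequence
    header, *body = @str.lines.map(&:chomp)
    { :entry_id => header.sub(/\A>/, '').split.first, :seq => body.join }
  end
end

# Hypothetical format guesser: only recognizes FASTA-like text.
def toy_guess(str)
  str.start_with?('>') ? ToyFasta : nil
end

# Same shape as the read method: explicit format wins, otherwise guess.
def toy_read(str, format = nil)
  klass = format || toy_guess(str)
  raise ArgumentError, 'unknown format' unless klass
  klass.new(str).to_biosequence
end

p toy_read(">id1 demo\nacgt\nttga\n")
```

Passing a class as the second argument skips the guessing step, which mirrors the optional _format_ argument in the committed method.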
Index: sequence.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v
retrieving revision 0.58
retrieving revision 0.58.2.1
diff -C2 -d -r0.58 -r0.58.2.1
*** sequence.rb 5 Apr 2007 23:35:39 -0000 0.58
--- sequence.rb 13 Feb 2008 10:28:16 -0000 0.58.2.1
***************
*** 334,337 ****
--- 334,356 ----
      @moltype = AA
    end
+
+   # Create a new Bio::Sequence object from a formatted string
+   # (GenBank, EMBL, fasta format, etc.)
+   #
+   #   s = Bio::Sequence.read(str)
+   # ---
+   # *Arguments*:
+   # * (required) _str_: string
+   # * (optional) _format_: format specification (class or nil)
+   # *Returns*:: Bio::Sequence object
+   def self.read(str, format = nil)
+     if format then
+       klass = format
+     else
+       klass = Bio::FlatFile::AutoDetect.default.guess(str)
+     end
+     obj = klass.new(str)
+     obj.to_biosequence
+   end
  end # Sequence

From ngoto at dev.open-bio.org Wed Feb 13 22:13:48 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Thu, 14 Feb 2008 03:13:48 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4,1.4.2.1
Message-ID: <200802140313.m1E3Dm2s019722@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/sequence
In directory dev.open-bio.org:/tmp/cvs-serv19681/lib/bio/sequence

Modified Files:
      Tag: BRANCH-biohackathon2008
    format.rb
Log Message:
* lib/bio/sequence.rb
  * changed to include Format module
* lib/bio/sequence/format.rb
  * fixed bug: incorrect refactoring

Index: format.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v
retrieving revision 1.4
retrieving revision 1.4.2.1
diff -C2 -d -r1.4 -r1.4.2.1
*** format.rb 5 Apr 2007 23:35:41 -0000 1.4
--- format.rb 14 Feb 2008 03:13:46 -0000 1.4.2.1
***************
*** 18,23 ****
  module Bio
-   autoload :Sequence, 'bio/sequence'
-
  class Sequence
--- 18,21 ----
***************
*** 127,131 ****
    def format_qualifiers(qualifiers,
indent, width)
!     qualifiers.each do |qualifier|
        q = qualifier.qualifier
        v = qualifier.value.to_s
--- 125,129 ----
    def format_qualifiers(qualifiers, indent, width)
!     qualifiers.collect do |qualifier|
        q = qualifier.qualifier
        v = qualifier.value.to_s
***************
*** 134,138 ****
          lines = wrap('/' + q, width)
        elsif q == 'translation'
!         lines = fold('/' + q + '=' + val, width)
        else
          if v[/\D/]
--- 132,136 ----
          lines = wrap('/' + q, width)
        elsif q == 'translation'
!         lines = fold('/' + q + '=' + v, width)
        else
          if v[/\D/]
***************
*** 141,149 ****
            v = '"' + v + '"'
          end
!         lines = wrap('/' + q + '=' + val, width)
        end
!       return lines.gsub(/^/, indent)
!     end
    end
--- 139,148 ----
            v = '"' + v + '"'
          end
!         lines = wrap('/' + q + '=' + v, width)
        end
!       lines.gsub!(/^/, indent)
!       lines
!     end.join
    end

From ngoto at dev.open-bio.org Wed Feb 13 22:13:48 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Thu, 14 Feb 2008 03:13:48 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.1,0.58.2.2
Message-ID: <200802140313.m1E3DmsN019717@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv19681/lib/bio

Modified Files:
      Tag: BRANCH-biohackathon2008
    sequence.rb
Log Message:
* lib/bio/sequence.rb
  * changed to include Format module
* lib/bio/sequence/format.rb
  * fixed bug: incorrect refactoring

Index: sequence.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v
retrieving revision 0.58.2.1
retrieving revision 0.58.2.2
diff -C2 -d -r0.58.2.1 -r0.58.2.2
*** sequence.rb 13 Feb 2008 10:28:16 -0000 0.58.2.1
--- sequence.rb 14 Feb 2008 03:13:46 -0000 0.58.2.2
***************
*** 71,74 ****
--- 71,75 ----
    autoload :Generic, 'bio/sequence/generic'
    autoload :Format, 'bio/sequence/format'
+   include Format
    # Create a new Bio::Sequence object
***************
*** 149,153 ****
    # *Returns*:: String object
    def output(style)
-     extend
Bio::Sequence::Format
      case style
      when :fasta
--- 150,153 ----

From ngoto at dev.open-bio.org Wed Feb 13 22:32:16 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Thu, 14 Feb 2008 03:32:16 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4.2.1,1.4.2.2
Message-ID: <200802140332.m1E3WGAu019905@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/sequence
In directory dev.open-bio.org:/tmp/cvs-serv19885/lib/bio/sequence

Modified Files:
      Tag: BRANCH-biohackathon2008
    format.rb
Log Message:
in wrap(), the last "\n" should be added for non-empty string

Index: format.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v
retrieving revision 1.4.2.1
retrieving revision 1.4.2.2
diff -C2 -d -r1.4.2.1 -r1.4.2.2
*** format.rb 14 Feb 2008 03:13:46 -0000 1.4.2.1
--- format.rb 14 Feb 2008 03:32:14 -0000 1.4.2.2
***************
*** 170,174 ****
      end
      result << left if left
!     return result.join("\n")
    end
--- 170,176 ----
      end
      result << left if left
!     result_string = result.join("\n")
!     result_string << "\n" unless result_string.empty?
!     return result_string
    end

From ngoto at dev.open-bio.org Thu Feb 14 03:51:47 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Thu, 14 Feb 2008 08:51:47 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/db/genbank genbank.rb,0.40,0.40.2.1
Message-ID: <200802140851.m1E8plsw023607@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/genbank
In directory dev.open-bio.org:/tmp/cvs-serv23587/lib/bio/db/genbank

Modified Files:
      Tag: BRANCH-biohackathon2008
    genbank.rb
Log Message:
added new method Bio::GenBank#to_biosequence.
Index: genbank.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/db/genbank/genbank.rb,v
retrieving revision 0.40
retrieving revision 0.40.2.1
diff -C2 -d -r0.40 -r0.40.2.1
*** genbank.rb 5 Apr 2007 23:35:40 -0000 0.40
--- genbank.rb 14 Feb 2008 08:51:45 -0000 0.40.2.1
***************
*** 126,129 ****
--- 126,157 ----
    end
+   # converts Bio::GenBank to Bio::Sequence
+   # ---
+   # *Arguments*:
+   # *Returns*:: Bio::Sequence object
+   def to_biosequence
+     sequence = Bio::Sequence.new(seq)
+     sequence.entry_id = self.entry_id
+
+     sequence.primary_accession = self.accession
+     sequence.secondary_accessions = self.accessions - [ self.accession ]
+
+     sequence.molecule_type = self.natype
+     sequence.division = self.division
+     sequence.topology = self.circular
+
+     sequence.sequence_version = self.version
+     seq.date_created = nil #????
+     sequence.date_modified = self.date
+
+     sequence.keywords = self.keywords
+     sequence.species = self.organism
+     sequence.classification = self.taxonomy
+     sequence.organnella = nil # not used
+     sequence.comments = self.comment
+     sequence.references = self.references
+     return sequence
+   end
+
  end # GenBank
  end # Bio

From ngoto at dev.open-bio.org Thu Feb 14 21:18:24 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Fri, 15 Feb 2008 02:18:24 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4.2.2,1.4.2.3
Message-ID: <200802150218.m1F2IOnH025723@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/sequence
In directory dev.open-bio.org:/tmp/cvs-serv25703/lib/bio/sequence

Modified Files:
      Tag: BRANCH-biohackathon2008
    format.rb
Log Message:
special character in regexp should be escaped

Index: format.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v
retrieving revision 1.4.2.2
retrieving revision 1.4.2.3
diff -C2 -d -r1.4.2.2 -r1.4.2.3
*** format.rb 14 Feb
2008 03:32:14 -0000 1.4.2.2
--- format.rb 15 Feb 2008 02:18:21 -0000 1.4.2.3
***************
*** 157,161 ****
      line = nil
      width.downto(1) do |i|
!       if left[i..i] == ' ' or /[,;]/ =~ left[(i-1)..(i-1)] then
          line = left[0..(i-1)].sub(/ +\z/, '')
          left = left[i..-1].sub(/\A +/, '')
--- 157,161 ----
      line = nil
      width.downto(1) do |i|
!       if left[i..i] == ' ' or /[\,\;]/ =~ left[(i-1)..(i-1)] then
          line = left[0..(i-1)].sub(/ +\z/, '')
          left = left[i..-1].sub(/\A +/, '')

From ngoto at dev.open-bio.org Thu Feb 14 22:23:25 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Fri, 15 Feb 2008 03:23:25 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.2,0.58.2.3
Message-ID: <200802150323.m1F3NP8b025922@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv25902/lib/bio

Modified Files:
      Tag: BRANCH-biohackathon2008
    sequence.rb
Log Message:
In Bio::Sequence#method_missing, __send__ should be used instead of send.
When method is not found, error message is modified if it is caused by
method_missing.

Index: sequence.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v
retrieving revision 0.58.2.2
retrieving revision 0.58.2.3
diff -C2 -d -r0.58.2.2 -r0.58.2.3
*** sequence.rb 14 Feb 2008 03:13:46 -0000 0.58.2.2
--- sequence.rb 15 Feb 2008 03:23:23 -0000 0.58.2.3
***************
*** 97,101 ****
    # http://www.rubycentral.com/book/ref_c_object.html#Object.method_missing
    def method_missing(sym, *args, &block) #:nodoc:
!     @seq.send(sym, *args, &block)
    end
--- 97,119 ----
    # http://www.rubycentral.com/book/ref_c_object.html#Object.method_missing
    def method_missing(sym, *args, &block) #:nodoc:
!     begin
!       @seq.__send__(sym, *args, &block)
!     rescue NoMethodError => evar
!       lineno = __LINE__ - 2
!       file = __FILE__
!       bt_here = [ "#{file}:#{lineno}:in \`__send__\'",
!                   "#{file}:#{lineno}:in \`method_missing\'"
!                 ]
!
if bt_here == evar.backtrace[0, 2] then
!         bt = evar.backtrace[2..-1]
!         evar = NoMethodError.new("undefined method \`#{sym.to_s}\' for #{self.inspect}")
!         evar.set_backtrace(bt)
!       end
!       #p lineno
!       #p file
!       #p bt_here
!       #p evar.backtrace
!       raise(evar)
!     end
    end

From ngoto at dev.open-bio.org Thu Feb 14 22:33:53 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Fri, 15 Feb 2008 03:33:53 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.3,0.58.2.4
Message-ID: <200802150333.m1F3Xr5w025971@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv25951/lib/bio

Modified Files:
      Tag: BRANCH-biohackathon2008
    sequence.rb
Log Message:
changed to use original exception class instead of NoMethodError

Index: sequence.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v
retrieving revision 0.58.2.3
retrieving revision 0.58.2.4
diff -C2 -d -r0.58.2.3 -r0.58.2.4
*** sequence.rb 15 Feb 2008 03:23:23 -0000 0.58.2.3
--- sequence.rb 15 Feb 2008 03:33:51 -0000 0.58.2.4
***************
*** 107,111 ****
        if bt_here == evar.backtrace[0, 2] then
          bt = evar.backtrace[2..-1]
!         evar = NoMethodError.new("undefined method \`#{sym.to_s}\' for #{self.inspect}")
          evar.set_backtrace(bt)
        end
--- 107,111 ----
        if bt_here == evar.backtrace[0, 2] then
          bt = evar.backtrace[2..-1]
!
evar = evar.class.new("undefined method \`#{sym.to_s}\' for #{self.inspect}")
          evar.set_backtrace(bt)
        end

From aerts at dev.open-bio.org Thu Feb 14 23:49:39 2008
From: aerts at dev.open-bio.org (Jan Aerts)
Date: Fri, 15 Feb 2008 04:49:39 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/db/embl embl.rb,1.29,1.29.2.1
Message-ID: <200802150449.m1F4ndLY026633@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/embl
In directory dev.open-bio.org:/tmp/cvs-serv26608/db/embl

Modified Files:
      Tag: BRANCH-biohackathon2008
    embl.rb
Log Message:
Added functionality to convert a Bio::EMBL object into a full-blown
Bio::Sequence object that contains features, references and other
additional information.

Index: embl.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/embl.rb,v
retrieving revision 1.29
retrieving revision 1.29.2.1
diff -C2 -d -r1.29 -r1.29.2.1
*** embl.rb 5 Apr 2007 23:35:40 -0000 1.29
--- embl.rb 15 Feb 2008 04:49:37 -0000 1.29.2.1
***************
*** 3,7 ****
  #
  #
! # Copyright:: Copyright (C) 2001-2007 Mitsuteru C. Nakao
  # License:: The Ruby License
  #
--- 3,9 ----
  #
  #
! # Copyright:: Copyright (C) 2001-2007
! # Mitsuteru C. Nakao
! # Jan Aerts
  # License:: The Ruby License
  #
***************
*** 121,124 ****
--- 123,130 ----
    alias molecule_type molecule
+   def topology
+     id_line('TOPOLOGY')
+   end
+
    # returns DIVISION in the ID line.
    # * Bio::EMBL#division -> String
***************
*** 222,227 ****
    #
    # Bio::EMBLDB#ref
!
!
    ##
    # DR Line; defabases cross-regerence (>=0)
--- 228,233 ----
    #
    # Bio::EMBLDB#ref
!
!
    ##
    # DR Line; defabases cross-regerence (>=0)
***************
*** 356,366 ****
    # bb Line; (blanks) sequence data (>=1)
    def seq
!     Sequence::NA.new( fetch('').gsub(/ /,'').gsub(/\d+/,'') )
    end
    alias naseq seq
    alias ntseq seq
!
    # // Line; termination line (end; 1/entry)
    ### private methods
--- 362,392 ----
    # bb Line; (blanks) sequence data (>=1)
    def seq
!
Bio::Sequence::NA.new( fetch('').gsub(/ /,'').gsub(/\d+/,'') )
    end
    alias naseq seq
    alias ntseq seq
!
    # // Line; termination line (end; 1/entry)
+   def to_biosequence
+     bio_seq = Bio::Sequence.new(self.seq)
+     bio_seq.entry_id = self.entry_id
+     bio_seq.primary_accession = self.accessions[0]
+     bio_seq.secondary_accessions = self.accessions[1,-1]
+     bio_seq.molecule_type = self.molecule_type
+     bio_seq.definition = self.description
+     bio_seq.topology = self.topology
+     bio_seq.date_created = self.dt['created']
+     bio_seq.date_modified = self.dt['updated']
+     bio_seq.division = self.division
+     bio_seq.sequence_version = self.version
+     bio_seq.keywords = self.keywords
+     bio_seq.species = self.os(0)[0]['os'] + ' ' + self.os(0)[0]['name']
+     bio_seq.classification = self.oc
+     bio_seq.references = self.references
+     bio_seq.features = self.ft
+
+     return bio_seq
+   end
    ### private methods
***************
*** 401,402 ****
--- 427,443 ----
  end # module Bio
+
+ if __FILE__ == $0
+   require '../../../bio'
+   require 'yaml'
+
+   prefix = 'FT '
+   indent = prefix + ' ' * 16
+   fwidth = 80 - indent.length
+
+   parser = Bio::FlatFile.auto('/home/aertsj/LocalDocuments/hackathon/aj224122.embl')
+   parser.each do |entry|
+     # entry.ref
+     puts entry.to_biosequence.output(:embl)
+   end
+ end
\ No newline at end of file

From aerts at dev.open-bio.org Thu Feb 14 23:49:39 2008
From: aerts at dev.open-bio.org (Jan Aerts)
Date: Fri, 15 Feb 2008 04:49:39 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.4,0.58.2.5
Message-ID: <200802150449.m1F4ndul026630@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv26608

Modified Files:
      Tag: BRANCH-biohackathon2008
    sequence.rb
Log Message:
Added functionality to convert a Bio::EMBL object into a full-blown
Bio::Sequence object that contains features, references and other
additional information.
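
One detail in the to_biosequence method added to embl.rb above may be worth checking: it takes the secondary accessions with accessions[1,-1]. In Ruby, Array#[start, length] returns nil when the length is negative, whereas a range such as [1..-1] returns the elements from index 1 through the last one. The sketch below only illustrates that language behavior with a made-up accession list (only AJ224122 appears in this archive); it is not a statement about what the committed code was meant to do.

```ruby
# Made-up accession list for illustration; only AJ224122 appears in this archive.
accessions = ['AJ224122', 'X00001', 'X00002']

primary = accessions[0]

# Array#[start, length] with a negative length returns nil.
by_length = accessions[1, -1]

# A range ending at -1 selects index 1 through the last element.
by_range = accessions[1..-1]

# Subtracting the primary accession, as the genbank.rb diff does,
# gives the same list for this input.
by_minus = accessions - [primary]

p by_length   # nil
p by_range    # ["X00001", "X00002"]
p by_minus    # ["X00001", "X00002"]
```

With a single-element list, the range form returns an empty Array rather than nil, which is usually easier for callers to handle.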
Index: sequence.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v
retrieving revision 0.58.2.4
retrieving revision 0.58.2.5
diff -C2 -d -r0.58.2.4 -r0.58.2.5
*** sequence.rb 15 Feb 2008 03:33:51 -0000 0.58.2.4
--- sequence.rb 15 Feb 2008 04:49:37 -0000 0.58.2.5
***************
*** 13,16 ****
--- 13,17 ----
  #
+ require 'erb'
  require 'bio/sequence/compat'
***************
*** 73,76 ****
--- 74,79 ----
    include Format
+   attr_accessor :sequence_version, :topology, :molecule_type, :data_class, :division, :primary_accession, :secondary_accessions, :date_created, :date_modified, :species, :classification
+
    # Create a new Bio::Sequence object
    #
***************
*** 165,181 ****
    # ---
    # *Arguments*:
!   # * (required) _style_: :fasta, :genbank, *or* :embl
    # *Returns*:: String object
!   def output(style)
!     case style
!     when :fasta
!       format_fasta
!     when :gff
!       format_gff
!     when :genbank
!       format_genbank
!     when :embl
!       format_embl
!     end
    end
--- 168,176 ----
    # ---
    # *Arguments*:
!   # * (required) _format_: :fasta, :genbank, *or* :embl
    # *Returns*:: String object
!   def output(format = :fasta)
!     record_template = ERB.new(File.read(File.dirname(__FILE__) + "/db/#{format.to_s}/format.erb"))
!     record_template.result(binding)
    end
***************
*** 372,375 ****
--- 367,375 ----
    end
+
+   def accessions
+     return [@primary_accession, @secondary_accessions].flatten
+   end
+
  end # Sequence
***************
*** 380,510 ****
  if __FILE__ == $0
!   puts "== Test Bio::Sequence::NA.new"
!   p Bio::Sequence::NA.new('')
!   p na = Bio::Sequence::NA.new('atgcatgcATGCATGCAAAA')
!   p rna = Bio::Sequence::NA.new('augcaugcaugcaugcaaaa')
!
!   puts "\n== Test Bio::Sequence::AA.new"
!   p Bio::Sequence::AA.new('')
!   p aa = Bio::Sequence::AA.new('ACDEFGHIKLMNPQRSTVWYU')
!
!   puts "\n== Test Bio::Sequence#to_s"
!   p na.to_s
!   p aa.to_s
!
!   puts "\n== Test Bio::Sequence#subseq(2,6)"
!   p na
!   p na.subseq(2,6)
!
!
puts "\n== Test Bio::Sequence#[2,6]"
!   p na
!   p na[2,6]
!
!   puts "\n== Test Bio::Sequence#to_fasta('hoge', 8)"
!   puts na.to_fasta('hoge', 8)
!
!   puts "\n== Test Bio::Sequence#window_search(15)"
!   p na
!   na.window_search(15) {|x| p x}
!
!   puts "\n== Test Bio::Sequence#total({'a'=>0.1,'t'=>0.2,'g'=>0.3,'c'=>0.4})"
!   p na.total({'a'=>0.1,'t'=>0.2,'g'=>0.3,'c'=>0.4})
!
!   puts "\n== Test Bio::Sequence#composition"
!   p na
!   p na.composition
!   p rna
!   p rna.composition
!
!   puts "\n== Test Bio::Sequence::NA#splicing('complement(join(1..5,16..20))')"
!   p na
!   p na.splicing("complement(join(1..5,16..20))")
!   p rna
!   p rna.splicing("complement(join(1..5,16..20))")
!
!   puts "\n== Test Bio::Sequence::NA#complement"
!   p na.complement
!   p rna.complement
!   p Bio::Sequence::NA.new('tacgyrkmhdbvswn').complement
!   p Bio::Sequence::NA.new('uacgyrkmhdbvswn').complement
!
!   puts "\n== Test Bio::Sequence::NA#translate"
!   p na
!   p na.translate
!   p rna
!   p rna.translate
!
!   puts "\n== Test Bio::Sequence::NA#gc_percent"
!   p na.gc_percent
!   p rna.gc_percent
!
!   puts "\n== Test Bio::Sequence::NA#illegal_bases"
!   p na.illegal_bases
!   p Bio::Sequence::NA.new('tacgyrkmhdbvswn').illegal_bases
!   p Bio::Sequence::NA.new('abcdefghijklmnopqrstuvwxyz-!%#$@').illegal_bases
!
!   puts "\n== Test Bio::Sequence::NA#molecular_weight"
!   p na
!   p na.molecular_weight
!   p rna
!   p rna.molecular_weight
!
!   puts "\n== Test Bio::Sequence::NA#to_re"
!   p Bio::Sequence::NA.new('atgcrymkdhvbswn')
!   p Bio::Sequence::NA.new('atgcrymkdhvbswn').to_re
!   p Bio::Sequence::NA.new('augcrymkdhvbswn')
!   p Bio::Sequence::NA.new('augcrymkdhvbswn').to_re
!
!   puts "\n== Test Bio::Sequence::NA#names"
!   p na.names
!
!   puts "\n== Test Bio::Sequence::NA#pikachu"
!   p na.pikachu
!
!   puts "\n== Test Bio::Sequence::NA#randomize"
!   print "Orig : "; p na
!   print "Rand : "; p na.randomize
!   print "Rand : "; p na.randomize
!   print "Rand : "; p na.randomize.randomize
!   print "Block : "; na.randomize do |x| print x end; puts
!
!
print "Orig : "; p rna
!   print "Rand : "; p rna.randomize
!   print "Rand : "; p rna.randomize
!   print "Rand : "; p rna.randomize.randomize
!   print "Block : "; rna.randomize do |x| print x end; puts
!
!   puts "\n== Test Bio::Sequence::NA.randomize(counts)"
!   print "Count : "; p counts = {'a'=>10,'c'=>20,'g'=>30,'t'=>40}
!   print "Rand : "; p Bio::Sequence::NA.randomize(counts)
!   print "Count : "; p counts = {'a'=>10,'c'=>20,'g'=>30,'u'=>40}
!   print "Rand : "; p Bio::Sequence::NA.randomize(counts)
!   print "Block : "; Bio::Sequence::NA.randomize(counts) {|x| print x}; puts
!
!   puts "\n== Test Bio::Sequence::AA#codes"
!   p aa
!   p aa.codes
!
!   puts "\n== Test Bio::Sequence::AA#names"
!   p aa
!   p aa.names
!
!   puts "\n== Test Bio::Sequence::AA#molecular_weight"
!   p aa.subseq(1,20)
!   p aa.subseq(1,20).molecular_weight
!
!   puts "\n== Test Bio::Sequence::AA#randomize"
!   aaseq = 'MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA'
!   s = Bio::Sequence::AA.new(aaseq)
!   print "Orig : "; p s
!   print "Rand : "; p s.randomize
!   print "Rand : "; p s.randomize
!   print "Rand : "; p s.randomize.randomize
!   print "Block : "; s.randomize {|x| print x}; puts
!   puts "\n== Test Bio::Sequence::AA.randomize(counts)"
!   print "Count : "; p counts = s.composition
!   print "Rand : "; puts Bio::Sequence::AA.randomize(counts)
!   print "Block : "; Bio::Sequence::AA.randomize(counts) {|x| print x}; puts
  end
--- 380,404 ----
  if __FILE__ == $0
!   require 'bio'
!
seq = Bio::Sequence.new('aattaaaacgccacgcaaggcgattctaggaaatcaaaacgacacgaaatgtggggtgggtgtttgggtaggaaagacagttgtcaacatcagggatttggattgaatcaaaaaaaaagtccttagatttcataaaagctaatcacgcctcaaaactggggcctatctcttcttttttgtcgcttcctgtcggtccttctctatttcttctccaacccctcatttttgaatatttacataacaaaccgttttactttctttggtcaaaattagacccaaaattctatattagtttaagatatgtggtctgtaatttattgttgtattgatataaaaattagttataagcgattatatttttatgctcaagtaactggtgttagttaactatattccaccacgataacctgattacataaaatatgattttaatcattttagtaaaccatatcgcacgttggatgattaattttaacggtttaataacacgtgattaaattatttttagaatgattatttacaaacggaaaagctatatgtgacacaataactcgtgcagtattgttagtttgaaaagtgtatttggtttcttatatttggcctcgattttcagtttatgtgctttttacaaagttttattttcgttatctgtttaacgcgacatttgttgtatggctttaccgatttgagaataaaatcatattacctttatgtagccatgtgtggtgtaatatataataatggtccttctacgaaaaaagcagatcacaattgaaataaagggtgaaatttggtgtcccttttcttcgtcgaaataacagaactaaataaaagaaagtgttatagtatattacgtccgaagaataatccatattcctgaaatacagtcaacatattatatatttagtactttatataaagttaggaattaaatcatatgttttatcgaccatattaagt! cacaactttatcataaattaatctgtaattagaattccaagttcgccaccgaatttcgtaacctaatctacatataatagataaaatatatatatgtagagtaattatgatatctatgtatgtagtcatggtatatgaattttgaaattggcaaggtaacattgacggatcgtaacccaacaaataatattaattacaaaatgggtgggcgggaatagtatacaactcataattccactcactttttgtattattaggatatgaaataagagtaatcaacatgcataataaagatgtataatttcttcatcttaaaaaacataactacatggtttaatacacaattttaccttttatcaaaaaagtatttcacaattcactcgcaaattacgaaatgatggctagtgcttcaactccaaatttcgaatattttaaatcacgatgtgtagaaccttttatttactggatactaatcactagtttattgagccaaccaattagttaaatagaacaatcaatattatagccagatattttttcctttaaaaatatttaaaagaggggccagaaaagaaccagagagggaggccatgagacattattatcactagtcaaaaacaacaaaccctccttttgctttttcatataaattattatattttattttgcaggtttcttctcttcttcttcttcttcttcttcttcttcctcttggctgctttctttcatcatccataaagtgaaagctaacgcatagagagagccatatcgtcccaaaaaaagcaaaagtccaaaaaaaaacaactccaaaacattctctcttagctctttactctttagtttctctctctctctctgcctttctctttgttgaagttcatggatgctacgaagtggactcaggtacgtaaaaagatatctctctgctatatctgtttgtttgtagcttctccccgactctcacgctctctctctctctctctctctc! 
tttgtgtatctctctactcacataaatatatacatgtgtgtgtatgcatgtttatatgtatgtatgaaac cagtagtggttatacagatagtctatatagagatatcaatatgatgtgttttaatttagactttttatatatccgtttgaaacttccgaagttctcgaatggagttaaggaagttttgttctctacaagttcaatttttcttgtcattaattataaaactctgataactaatggataaaaaaggtatgctttgttagttaccttttgttcttggtgctcaggtcttaccatttttttcctaaattttaattagtctcctttctttaattaattttatgttaacgcactgacgatttaacgttaacaaaaaaacctagattctttttcttttcaatagagcataattattacttcaatttcatttatctcacactaaaccctaatcttggcgaaattccttttatatatataaatttaattaatttttccacaatcttggcggaattcaggactcggttttgcttgttattgttctctcttttaatttgacatggttagggaatacttaaagtatgtcttaattttatagggttttcaagaaatgataaacgtaaagccaatggagcaaatgatttctagcaccaacaacaacacaccgcaacaacaaccaacattcatcgccaccaacacaaggccaaacgccaccgcatccaatggtggctccggaggaaataccaacaacacggctacgatggaaactagaaaggcgaggccacaagagaaagtaaattgtccaagatgcaactcaacaaacacaaagttctgttattacaacaactacagtctcacgcaaccaagatacttctgcaaaggttgtcgaaggtattggaccgaaggtggctctcttcgtaacgtcccagtcggaggtagctcaagaaagaacaagagatcctctacacctttagcttcaccttctaatcccaaacttccagatctaaacccaccgattcttttctcaagccaaatccctaataagtcaaataaagatc! tcaacttgctatctttcccggtcatgcaagatcatcatcatcatggtatgtctcatttttttcatatgcccaagatagagaacaacaatacttcatcctcaatctatgcttcatcatctcctgtctcagctcttgagcttctaagatccaatggagtctcttcaagaggcatgaacacgttcttgcctggtcaaatgatggattcaaactcagtcctgtactcatctttagggtttccaacaatgcctgattacaaacagagtaataacaacctttcattctccattgatcatcatcaagggattggacataacaccatcaacagtaaccaaagagctcaagataacaatgatgacatgaatggagcaagtagggttttgttccctttttcagacatgaaagagctttcaagcacaacccaagagaagagtcatggtaataatacatattggaatgggatgttcagtaatacaggaggatcttcatggtgaaaaaaggttaaaaagagctcatgaactatcagctttcttctctttttctgtttttttctcctattttattatagtttttactttgatgatcttttgttttttctcacatggggaactttacttaaagttgtcagaacttagtttacagattgtctttttattccttctttctggttttccttttttcctttttttatcagtctttttaaaatatgtatttcataattgggtttgatcattcatatttattagtatcaaaatagagtctatgttcatgagggagtgttaaggggtgtgagggtagaagaataagtgaatacgggggcccg') ! seq.entry_id = 'AJ224122' ! seq.sequence_version = 3 ! seq.topology = 'linear' ! 
seq.molecule_type = 'genomic DNA'
!   seq.data_class = 'STD'
!   seq.division = 'PLN'
!   seq.primary_accession = 'AJ224122'
!   seq.secondary_accessions = []
!   seq.date_created = '27-FEB-1998 (Rel. 54, Created)'
!   seq.date_modified = '14-NOV-2006 (Rel. 89, Last updated, Version 6)'
!   seq.definition = 'Arabidopsis thaliana DAG1 gene'
!   seq.keywords = ['BBFa gene', 'transcription factor']
!   seq.species = 'Arabidopsis thaliana (thale cress)'
!   seq.classification = ['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta',
!                         'Spermatophyta', 'Magnoliophyta', 'eudicotyledons', 'core eudicotyledons', 'rosids',
!                         'eurosids II', 'Brassicales', 'Brassicaceae', 'Arabidopsis']
!   # puts seq.output(:embl)
!   puts seq.output(:fasta)
  end

From ngoto at dev.open-bio.org Fri Feb 15 00:29:52 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Fri, 15 Feb 2008 05:29:52 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.5,0.58.2.6
Message-ID: <200802150529.m1F5Tqn1026874@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv26854/lib/bio

Modified Files:
      Tag: BRANCH-biohackathon2008
    sequence.rb
Log Message:
bugfix in Bio::Sequence.read: mistaken method name

Index: sequence.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v
retrieving revision 0.58.2.5
retrieving revision 0.58.2.6
diff -C2 -d -r0.58.2.5 -r0.58.2.6
*** sequence.rb 15 Feb 2008 04:49:37 -0000 0.58.2.5
--- sequence.rb 15 Feb 2008 05:29:50 -0000 0.58.2.6
***************
*** 361,365 ****
        klass = format
      else
!       klass = Bio::FlatFile::AutoDetect.default.guess(str)
      end
      obj = klass.new(str)
--- 361,365 ----
        klass = format
      else
!
klass = Bio::FlatFile::AutoDetect.default.autodetect(str)
      end
      obj = klass.new(str)

From aerts at dev.open-bio.org Mon Feb 18 10:43:29 2008
From: aerts at dev.open-bio.org (Jan Aerts)
Date: Mon, 18 Feb 2008 15:43:29 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/db/fasta - New directory
Message-ID: <200802181543.m1IFhTLc011233@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/fasta
In directory dev.open-bio.org:/tmp/cvs-serv11213/fasta

Log Message:
Directory /home/repository/bioruby/bioruby/lib/bio/db/fasta added to the repository
--> Using per-directory sticky tag `BRANCH-biohackathon2008'

From aerts at dev.open-bio.org Mon Feb 18 10:44:41 2008
From: aerts at dev.open-bio.org (Jan Aerts)
Date: Mon, 18 Feb 2008 15:44:41 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.24,1.24.2.1
Message-ID: <200802181544.m1IFifJc011281@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv11261

Modified Files:
      Tag: BRANCH-biohackathon2008
    reference.rb
Log Message:
Added export method to EMBL format.

Index: reference.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/reference.rb,v
retrieving revision 1.24
retrieving revision 1.24.2.1
diff -C2 -d -r1.24 -r1.24.2.1
*** reference.rb 5 Apr 2007 23:35:39 -0000 1.24
--- reference.rb 18 Feb 2008 15:44:39 -0000 1.24.2.1
***************
*** 2,8 ****
  # = bio/reference.rb - Journal reference classes
  #
! # Copyright:: Copyright (C) 2001, 2006
  # Toshiaki Katayama ,
! # Ryan Raaum
  # License:: The Ruby License
  #
--- 2,9 ----
  # = bio/reference.rb - Journal reference classes
  #
! # Copyright:: Copyright (C) 2001, 2006, 2008
  # Toshiaki Katayama ,
! # Ryan Raaum ,
! # Jan Aerts
  # License:: The Ruby License
  #
***************
*** 79,82 ****
--- 80,89 ----
    # Affiliations in an Array.
attr_reader :affiliations + + # Sequence number in EMBL/GenBank records + attr_reader :embl_gb_record_number + + # Position in a sequence that this reference refers to + attr_reader :sequence_position # Create a new Bio::Reference object from a Hash of values. *************** *** 130,133 **** --- 137,144 ---- @url = hash['url'] @mesh = hash['mesh'] + @embl_gb_record_number = hash['embl_gb_record_number'] || nil + @sequence_position = hash['sequence_position'] || [] + @comments = hash['comments'] || [] + @xrefs = hash['xrefs'] || [] @affiliations = hash['affiliations'] @authors = [] if @authors.empty? *************** *** 171,174 **** --- 182,187 ---- def format(style = nil, option = nil) case style + when 'embl' + return embl when 'endnote' return endnote *************** *** 246,249 **** --- 259,298 ---- end + # Returns reference formatted in the EMBL style. + # + # # ref is a Bio::Reference object + # puts ref.embl + # + # RP 1-1859 + # RX PUBMED; 1907511. + # RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.; + # RT "Nucleotide and derived amino acid sequence of the cyanogenic + # RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)"; + # RL Plant Mol. Biol. 17(2):209-219(1991). + def embl + lines = Array.new + if ! @embl_gb_record_number.nil? + lines << "RN [#{@embl_gb_record_number}]" + end + if @comments != [] + @comments.each do |c| + lines << "RC #{c}" + end + end + if @sequence_position != '' + lines << "RP #{@sequence_position}" + end + if ! @xrefs.nil? + @xrefs.each do |x| + lines << "RX #{x}" + end + end + lines << @authors.join(', ').wrap(80, 'RA ') + ';' unless @authors.nil? + lines << (@title == '' ? 'RT ;' : ('"' + @title + '"').wrap(80, 'RT ') + ';') + lines << @journal.wrap(80, 'RL ') unless @journal == '' + lines << "XX" + return lines.join("\n") + end + # Returns reference formatted in the bibitem style # *************** *** 542,546 **** # class References ! 
# Array of Bio::Reference objects attr_accessor :references --- 591,596 ---- # class References ! include Enumerable ! # Array of Bio::Reference objects attr_accessor :references From k at dev.open-bio.org Mon Feb 18 22:36:54 2008 From: k at dev.open-bio.org (Katayama Toshiaki) Date: Tue, 19 Feb 2008 03:36:54 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io ncbirest.rb, NONE, 1.1 pubmed.rb, 1.23, 1.24 Message-ID: <200802190336.m1J3as4O012327@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory dev.open-bio.org:/tmp/cvs-serv12321 Modified Files: pubmed.rb Added Files: ncbirest.rb Log Message: * NCBI E-Utilities (REST) functionality is separated to ncbirest.rb and pubmed.rb is changed to utilize the Bio::NCBI::REST class for esearch and efetch. You can now search and retrieve any database in any format that NCBI supports by E-Utilities through the Bio::NCBI::REST interface (currently, only esearch and efetch methods are implemented). Index: pubmed.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/pubmed.rb,v retrieving revision 1.23 retrieving revision 1.24 diff -C2 -d -r1.23 -r1.24 *** pubmed.rb 12 Dec 2007 13:53:26 -0000 1.23 --- pubmed.rb 19 Feb 2008 03:36:52 -0000 1.24 *************** *** 2,6 **** # = bio/io/pubmed.rb - NCBI Entrez/PubMed client module # ! # Copyright:: Copyright (C) 2001, 2007 Toshiaki Katayama # Copyright:: Copyright (C) 2006 Jan Aerts # License:: The Ruby License --- 2,6 ---- # = bio/io/pubmed.rb - NCBI Entrez/PubMed client module # ! # Copyright:: Copyright (C) 2001, 2007, 2008 Toshiaki Katayama # Copyright:: Copyright (C) 2006 Jan Aerts # License:: The Ruby License *************** *** 9,12 **** --- 9,13 ---- # + require 'bio/io/ncbirest' require 'bio/command' require 'cgi' unless defined?(CGI) *************** *** 69,95 **** # medline = Bio::MEDLINE.new(manuscript) # ! class PubMed ! ! 
# Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time ! # weekdays for any series of more than 100 requests. ! # -> Not implemented yet in BioRuby ! ! # Make no more than one request every 3 seconds. ! NCBI_INTERVAL = 3 ! @@last_access = nil ! ! private ! ! def ncbi_access_wait(wait = NCBI_INTERVAL) ! if @@last_access ! duration = Time.now - @@last_access ! if wait > duration ! sleep wait - duration ! end ! end ! @@last_access = Time.now ! end ! ! public # Search the PubMed database by given keywords using E-Utils and returns --- 70,74 ---- # medline = Bio::MEDLINE.new(manuscript) # ! class PubMed < Bio::NCBI::REST # Search the PubMed database by given keywords using E-Utils and returns *************** *** 100,136 **** # --- # *Arguments*: ! # * _id_: query string (required) ! # * _field_ ! # * _reldate_ ! # * _mindate_ ! # * _maxdate_ ! # * _datetype_ ! # * _retstart_ ! # * _retmax_ (default 100) ! # * _retmode_ ! # * _rettype_ # *Returns*:: array of PubMed IDs or a number of results def esearch(str, hash = {}) ! return nil if str.empty? ! ! serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" ! opts = { ! "retmax" => 100, ! "tool" => "bioruby", ! "db" => "pubmed", ! "term" => str ! } opts.update(hash) ! ! ncbi_access_wait ! ! response, = Bio::Command.post_form(serv, opts) ! result = response.body ! if opts['rettype'] == 'count' ! result = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i ! else ! result = result.scan(/<Id>(.*?)<\/Id>/m).flatten ! end ! return result end --- 79,98 ---- # --- # *Arguments*: ! # * _str_: query string (required) ! # * _hash_: hash of E-Utils options ! # * _retmode_: "xml", "html", ... ! # * _rettype_: "medline", ... ! # * _retmax_: integer (default 100) ! # * _retstart_: integer ! # * _field_ ! # * _reldate_ ! # * _mindate_ ! # * _maxdate_ ! # * _datetype_ # *Returns*:: array of PubMed IDs or a number of results def esearch(str, hash = {}) ! opts = { "db" => "pubmed" } opts.update(hash) ! 
super(str, opts) end *************** *** 142,168 **** # *Arguments*: # * _ids_: list of PubMed IDs (required) # *Returns*:: Array of MEDLINE formatted String def efetch(ids, hash = {}) ! return nil if ids.to_s.empty? ! ids = ids.join(",") if ids === Array ! ! serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" ! opts = { ! "tool" => "bioruby", ! "db" => "pubmed", ! "retmode" => "text", ! "rettype" => "medline", ! "id" => ids, ! } opts.update(hash) ! ! ncbi_access_wait ! ! response, = Bio::Command.post_form(serv, opts) ! result = response.body ! if opts["retmode"] == "text" ! result = result.split(/\n\n+/) ! end ! return result end --- 104,122 ---- # *Arguments*: # * _ids_: list of PubMed IDs (required) + # * _hash_: hash of E-Utils options + # * _retmode_: "xml", "html", ... + # * _rettype_: "medline", ... + # * _retmax_: integer (default 100) + # * _retstart_: integer + # * _field_ + # * _reldate_ + # * _mindate_ + # * _maxdate_ + # * _datetype_ # *Returns*:: Array of MEDLINE formatted String def efetch(ids, hash = {}) ! opts = { "db" => "pubmed", "rettype" => "medline" } opts.update(hash) ! 
super(ids, opts) end --- NEW FILE: ncbirest.rb --- # # = bio/io/ncbrest.rb - NCBI Entrez client module # # Copyright:: Copyright (C) 2008 Toshiaki Katayama # License:: The Ruby License # # $Id: ncbirest.rb,v 1.1 2008/02/19 03:36:52 k Exp $ # require 'bio/command' module Bio # == Description # # The Bio::NCBI::REST class provides REST client for the NCBI E-Utilities # # * Entrez utilities index: # http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html # * How to link: # http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinks.chapter.linkshelp # # == Usage # # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nuccore", "rettype"=>"gb"}) # Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml", "retmax"=>5}) # Bio::NCBI::REST.efetch("185041", {"db"=>"nuccore", "rettype"=>"gb"}) # Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"}) # class NCBI class REST # Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time # weekdays for any series of more than 100 requests. # -> Not implemented yet in BioRuby # Make no more than one request every 3 seconds. NCBI_INTERVAL = 3 @@last_access = nil private def ncbi_access_wait(wait = NCBI_INTERVAL) if @@last_access duration = Time.now - @@last_access if wait > duration sleep wait - duration end end @@last_access = Time.now end public # Search the NCBI database by given keywords using E-Utils and returns # an array of entry IDs. # # For information on the possible arguments, see # # * http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html # # --- # *Arguments*: # * _str_: query string (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} # * _db_: "nuccore", "pubmed", ... # * _retmode_: "xml", "html", ... # * _rettype_: "gb", "medline", "count", ... 
# * _retmax_: integer (default 100) # * _retstart_: integer # * _field_ # * _reldate_ # * _mindate_ # * _maxdate_ # * _datetype_ # *Returns*:: array of entry IDs or a number of results def esearch(str, hash = {}) return nil if str.empty? serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" opts = { "retmax" => 100, "tool" => "bioruby", "term" => str } opts.update(hash) ncbi_access_wait response, = Bio::Command.post_form(serv, opts) result = response.body if opts['rettype'] == 'count' result = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i else result = result.scan(/<Id>(.*?)<\/Id>/m).flatten end return result end # Retrieve a database entry by given ID and using E-Utils (efetch) and # returns an array of entry string. Multiple IDs can be supplied. # --- # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} # * _db_: "nuccore", "pubmed", ... # * _retmode_: "xml", "html", ... # * _rettype_: "gb", "medline", "count",... # * _retmax_: integer (default 100) # * _retstart_: integer # * _field_ # * _reldate_ # * _mindate_ # * _maxdate_ # * _datetype_ # *Returns*:: Array of entry String def efetch(ids, hash = {}) return nil if ids.to_s.empty? 
ids = ids.join(",") if ids === Array serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" opts = { "tool" => "bioruby", "retmode" => "text", "id" => ids, } opts.update(hash) ncbi_access_wait response, = Bio::Command.post_form(serv, opts) result = response.body if opts["retmode"] == "text" result = result.split(/\n\n+/) end return result end def self.esearch(*args) self.new.esearch(*args) end def self.efetch(*args) self.new.efetch(*args) end end # REST end # NCBI end # Bio if __FILE__ == $0 gbopts = {"db"=>"nuccore", "rettype"=>"gb"} pmopts = {"db"=>"pubmed", "rettype"=>"medline"} count = {"rettype" => "count"} xml = {"retmode"=>"xml"} max = {"retmax"=>5} puts "=== class methods ===" puts "--- Search NCBI by E-Utils ---" puts Time.now puts "# count of 'tardigrada' in nuccore" puts Bio::NCBI::REST.esearch("tardigrada", gbopts.merge(count)) puts Time.now puts "# max 5 'tardigrada' entries in nuccore" puts Bio::NCBI::REST.esearch("tardigrada", gbopts.merge(max)) puts Time.now puts "# count of 'yeast kinase' in nuccore" puts Bio::NCBI::REST.esearch("yeast kinase", gbopts.merge(count)) puts Time.now puts "# max 5 'yeast kinase' entries in nuccore (XML)" puts Bio::NCBI::REST.esearch("yeast kinase", gbopts.merge(xml).merge(max)) puts Time.now puts "# count of 'genome&analysis|bioinformatics' in pubmed" puts Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(count)) puts Time.now puts "# max 5 'genome&analysis|bioinformatics' entries in pubmed (XML)" puts Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(xml).merge(max)) puts Time.now Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(max)).each do |x| puts "# each of 5 'genome&analysis|bioinformatics' entries in pubmed" puts x end puts "--- Retrieve NCBI entry by E-Utils ---" puts Time.now puts "# '185041' entry in nuccore" puts Bio::NCBI::REST.efetch("185041", gbopts) puts Time.now puts "# 'J00231' entry in nuccore (XML)" 
puts Bio::NCBI::REST.efetch("J00231", gbopts.merge(xml)) puts Time.now puts "# 16381885 entry in pubmed" puts Bio::NCBI::REST.efetch(16381885, pmopts) puts Time.now puts "# '16381885' entry in pubmed" puts Bio::NCBI::REST.efetch("16381885", pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed" puts Bio::NCBI::REST.efetch([10592173, 14693808], pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed (XML)" puts Bio::NCBI::REST.efetch([10592173, 14693808], pmopts.merge(xml)) puts "=== instance methods ===" ncbi = Bio::NCBI::REST.new puts "--- Search NCBI by E-Utils ---" puts Time.now puts "# count of 'genome&analysis|bioinformatics' in pubmed" puts ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(count)) puts Time.now puts "# max 5 'genome&analysis|bioinformatics' entries in pubmed" puts ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(max)) puts Time.now ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts).each do |x| puts "# each 'genome&analysis|bioinformatics' entries in pubmed" puts x end puts "--- Retrieve NCBI entry by E-Utils ---" puts Time.now puts "# 16381885 entry in pubmed" puts ncbi.efetch(16381885, pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed" puts ncbi.efetch([10592173, 14693808], pmopts) end From k at dev.open-bio.org Mon Feb 18 23:42:16 2008 From: k at dev.open-bio.org (Katayama Toshiaki) Date: Tue, 19 Feb 2008 04:42:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io hinv.rb,1.1,1.2 Message-ID: <200802190442.m1J4gGQZ012425@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory dev.open-bio.org:/tmp/cvs-serv12421 Modified Files: hinv.rb Log Message: * hit2acc fixed Index: hinv.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/hinv.rb,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** hinv.rb 9 Jan 2008 17:18:18 -0000 1.1 
--- hinv.rb 19 Feb 2008 04:42:14 -0000 1.2 *************** *** 2,6 **** # = bio/io/hinv.rb - H-invDB web service (REST) client module # ! # Copyright:: Copyright (C) 2007 Toshiaki Katayama # License:: The Ruby License # --- 2,6 ---- # = bio/io/hinv.rb - H-invDB web service (REST) client module # ! # Copyright:: Copyright (C) 2008 Toshiaki Katayama # License:: The Ruby License # *************** *** 137,141 **** def initialize ! @url = BASE_URI + "hit2acc.php?hit=" end --- 137,141 ---- def initialize ! @url = BASE_URI + "hit2acc.php" end From k at dev.open-bio.org Mon Feb 18 23:49:37 2008 From: k at dev.open-bio.org (Katayama Toshiaki) Date: Tue, 19 Feb 2008 04:49:37 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io ncbirest.rb,1.1,1.2 Message-ID: <200802190449.m1J4nb9x012447@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory dev.open-bio.org:/tmp/cvs-serv12443 Modified Files: ncbirest.rb Log Message: * doc update Index: ncbirest.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/ncbirest.rb,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** ncbirest.rb 19 Feb 2008 03:36:52 -0000 1.1 --- ncbirest.rb 19 Feb 2008 04:49:35 -0000 1.2 *************** *** 1,4 **** # ! # = bio/io/ncbrest.rb - NCBI Entrez client module # # Copyright:: Copyright (C) 2008 Toshiaki Katayama --- 1,4 ---- # ! # = bio/io/ncbirest.rb - NCBI Entrez client module # # Copyright:: Copyright (C) 2008 Toshiaki Katayama *************** *** 16,26 **** # The Bio::NCBI::REST class provides REST client for the NCBI E-Utilities # ! # * Entrez utilities index: ! # http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html ! # * How to link: ! 
# http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinks.chapter.linkshelp # # == Usage # # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nuccore", "rettype"=>"gb"}) # Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml", "retmax"=>5}) --- 16,26 ---- # The Bio::NCBI::REST class provides REST client for the NCBI E-Utilities # ! # Entrez utilities index: ! # ! # * http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html # # == Usage # + # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nuccore", "rettype"=>"count"}) # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nuccore", "rettype"=>"gb"}) # Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml", "retmax"=>5}) *************** *** 64,69 **** # * _str_: query string (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} ! # * _db_: "nuccore", "pubmed", ... ! # * _retmode_: "xml", "html", ... # * _rettype_: "gb", "medline", "count", ... # * _retmax_: integer (default 100) --- 64,69 ---- # * _str_: query string (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} ! # * _db_: "nuccore", "nucleotide", "protein", "pubmed", ... ! # * _retmode_: "text", "xml", "html", ... # * _rettype_: "gb", "medline", "count", ... # * _retmax_: integer (default 100) *************** *** 100,109 **** # Retrieve a database entry by given ID and using E-Utils (efetch) and # returns an array of entry string. Multiple IDs can be supplied. # --- # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} ! # * _db_: "nuccore", "pubmed", ... ! # * _retmode_: "xml", "html", ... # * _rettype_: "gb", "medline", "count",... # * _retmax_: integer (default 100) --- 100,114 ---- # Retrieve a database entry by given ID and using E-Utils (efetch) and # returns an array of entry string. Multiple IDs can be supplied. 
+ # + # For information on the possible arguments, see + # + # * http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html + # # --- # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} ! # * _db_: "nuccore", "nucleotide", "protein", "pubmed", ... ! # * _retmode_: "text", "xml", "html", ... # * _rettype_: "gb", "medline", "count",... # * _retmax_: integer (default 100) From aerts at dev.open-bio.org Wed Feb 20 04:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby ChangeLog,1.83,1.83.2.1 Message-ID: <200802200956.m1K9uOcm015785@dev.open-bio.org> Update of /home/repository/bioruby/bioruby In directory dev.open-bio.org:/tmp/cvs-serv15755 Modified Files: Tag: BRANCH-biohackathon2008 ChangeLog Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. 
Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl Index: ChangeLog =================================================================== RCS file: /home/repository/bioruby/bioruby/ChangeLog,v retrieving revision 1.83 retrieving revision 1.83.2.1 diff -C2 -d -r1.83 -r1.83.2.1 *** ChangeLog 12 Feb 2008 05:32:23 -0000 1.83 --- ChangeLog 20 Feb 2008 09:56:21 -0000 1.83.2.1 *************** *** 1,2 **** --- 1,27 ---- + 2008-02-20 Jan Aerts + * lib/bio/db/fasta.rb + * lib/bio/db/fasta/format.erb + * test/unit/bio/db/test_fasta.rb + + Renamed #to_seq to #to_biosequence to reflect that same method in + embl.rb, genbank.rb and others. + + 2008-02-20 Jan Aerts + * lib/bio.rb + * lib/bio/db/embl/common.rb + * lib/bio/db/embl/embl.rb + * lib/bio/db/embl/format.erb + * lib/bio/sequence/common.rb + * lib/bio/sequence/format.rb + * test/unit/bio/db/embl/test_embl_to_bioseq.rb + + Fixed some bugs in importing EMBL files and added functionality to + export a Bio::Sequence to EMBL format. + + 2008-02-18 Jan Aerts + * lib/bio/reference.rb + + Added export method to EMBL format. 
+ 2008-02-12 Naohisa Goto From aerts at dev.open-bio.org Wed Feb 20 04:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/embl format.erb, NONE, 1.1.2.1 common.rb, 1.12, 1.12.2.1 embl.rb, 1.29.2.1, 1.29.2.2 Message-ID: <200802200956.m1K9uO6r015800@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/embl In directory dev.open-bio.org:/tmp/cvs-serv15755/lib/bio/db/embl Modified Files: Tag: BRANCH-biohackathon2008 common.rb embl.rb Added Files: Tag: BRANCH-biohackathon2008 format.erb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl --- NEW FILE: format.erb --- ID <%= entry_id %>; SV <%= sequence_version %>; <%= topology %>; <%= molecule_type %>; <%= data_class %>; <%= division %>; <%= seq.length %> BP. XX AC <%= accessions.reject{|a| a.nil?}.join('; ') + ';' %> XX DT <%= date_created %> DT <%= date_modified %> XX DE <%= definition %> XX KW <%= keywords.join('; ') %>. XX OS <%= species %> <%= classification.join('; ').wrap(80, 'OC ') %>. 
XX <%= references.collect{|ref| ref.format('embl')}.join("\n") %> XX FH Key Location/Qualifiers FH <% prefix = 'FT ' indent = prefix + ' ' * 16 fwidth = 80 - indent.length %><%= format_features(prefix, indent, fwidth) %>XX SQ Sequence <%= seq.length %> BP; <%= seq.composition.collect{|k,v| "#{v} #{k.upcase}"}.join('; ') + '; ' + (seq.gsub(/[ACTGactg]/, '').length.to_s ) + ' other;' %> <%= seq.format_embl %> // Index: embl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/embl.rb,v retrieving revision 1.29.2.1 retrieving revision 1.29.2.2 diff -C2 -d -r1.29.2.1 -r1.29.2.2 *** embl.rb 15 Feb 2008 04:49:37 -0000 1.29.2.1 --- embl.rb 20 Feb 2008 09:56:22 -0000 1.29.2.2 *************** *** 123,126 **** --- 123,130 ---- alias molecule_type molecule + def data_class + id_line('DATA_CLASS') + end + def topology id_line('TOPOLOGY') *************** *** 254,258 **** unless @data['FT'] @data['FT'] = Array.new - ary = Array.new in_quote = false @orig['FT'].each_line do |line| --- 258,261 ---- *************** *** 262,268 **** body = line[20,60].chomp # feature value (position, /qualifier=) if line =~ /^FT {3}(\S+)/ ! ary.push([ $1, body ]) # [ feature, position, /q="data", ... ] elsif body =~ /^ \// and not in_quote ! ary.last.push(body) # /q="data..., /q=data, /q if body =~ /=" / and body !~ /"$/ --- 265,271 ---- body = line[20,60].chomp # feature value (position, /qualifier=) if line =~ /^FT {3}(\S+)/ ! @data['FT'].push([ $1, body ]) # [ feature, position, /q="data", ... ] elsif body =~ /^ \// and not in_quote ! @data['FT'].last.push(body) # /q="data..., /q=data, /q if body =~ /=" / and body !~ /"$/ *************** *** 271,275 **** else ! ary.last.last << body # ...data..., ...data..." if body =~ /"$/ --- 274,278 ---- else ! @data['FT'].last.last << body # ...data..., ...data..." if body =~ /"$/ *************** *** 279,287 **** end ! ary.map! 
do |subary| parse_qualifiers(subary) end - @data['FT'] = Features.new(ary) end if block_given? --- 282,289 ---- end ! @data['FT'].map! do |subary| parse_qualifiers(subary) end end if block_given? *************** *** 373,378 **** bio_seq.entry_id = self.entry_id bio_seq.primary_accession = self.accessions[0] ! bio_seq.secondary_accessions = self.accessions[1,-1] bio_seq.molecule_type = self.molecule_type bio_seq.definition = self.description bio_seq.topology = self.topology --- 375,381 ---- bio_seq.entry_id = self.entry_id bio_seq.primary_accession = self.accessions[0] ! bio_seq.secondary_accessions = self.accessions[1,-1] || [] bio_seq.molecule_type = self.molecule_type + bio_seq.data_class = self.data_class bio_seq.definition = self.description bio_seq.topology = self.topology *************** *** 382,386 **** bio_seq.sequence_version = self.version bio_seq.keywords = self.keywords ! bio_seq.species = self.os(0)[0]['os'] + ' ' + self.os(0)[0]['name'] bio_seq.classification = self.oc bio_seq.references = self.references --- 385,389 ---- bio_seq.sequence_version = self.version bio_seq.keywords = self.keywords ! bio_seq.species = self.fetch('OS') bio_seq.classification = self.oc bio_seq.references = self.references *************** *** 435,439 **** indent = prefix + ' ' * 16 fwidth = 80 - indent.length ! parser = Bio::FlatFile.auto('/home/aertsj/LocalDocuments/hackathon/aj224122.embl') parser.each do |entry| --- 438,443 ---- indent = prefix + ' ' * 16 fwidth = 80 - indent.length ! ! 
# parser = Bio::FlatFile.auto('/home/aertsj/LocalDocuments/bioruby_biohackathon/bioruby/test/data/embl/AB090716.embl') parser = Bio::FlatFile.auto('/home/aertsj/LocalDocuments/hackathon/aj224122.embl') parser.each do |entry| Index: common.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/common.rb,v retrieving revision 1.12 retrieving revision 1.12.2.1 diff -C2 -d -r1.12 -r1.12.2.1 *** common.rb 5 Apr 2007 23:35:40 -0000 1.12 --- common.rb 20 Feb 2008 09:56:22 -0000 1.12.2.1 *************** *** 241,265 **** def ref unless @data['R'] ! ary = Array.new ! get('R').split(/\nRN /).each do |str| ! raw = {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '', ! 'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''} ! str = 'RN ' + str unless /^RN / =~ str ! str.split("\n").each do |line| ! if /^(R[NPXARLCTG]) (.+)/ =~ line ! raw[$1] += $2 + ' ' ! else ! raise "Invalid format in R lines, \n[#{line}]\n" end end ! raw.each_value {|v| ! v.strip! ! v.sub!(/^"/,'') ! v.sub!(/;$/,'') ! v.sub!(/"$/,'') ! } ! ary.push(raw) end - @data['R'] = ary end @data['R'] --- 241,305 ---- def ref unless @data['R'] ! @data['R'] = Array.new ! # Get the different references as 'blurbs' (the lines together) ! reference_blurbs = get('R').split(/\nRN /) ! reference_blurbs.each_index do |i| ! reference_blurbs[i] = 'RN ' + reference_blurbs[i] unless reference_blurbs[i] =~ /^RN / ! end ! ! # For each reference, we'll first create a hash that looks like below. ! # Suppose the input is: ! # RA name1, name2, name3 ! # RA name4 ! # RT some part of the title that ! # RT did not fit on one line ! # Then the hash looks like: ! # h = { ! # 'RA' => ["name1, name2, name3", "name4"], ! # 'RT' => ["some part of the title that", "did not fit on one line"] ! # } ! reference_blurbs.each do |rb| ! line_based_data = Hash.new ! rb.split(/\n/).each do |line| ! key, value = line.scan(/^(R[A-Z]) "?(\[?.*[A-Za-z0-9]\]?)/)[0] ! if line_based_data[key].nil? ! 
line_based_data[key] = Array.new end + line_based_data[key].push(value) end ! ! # Now we have to sanitize the hash: the authors should be kept in an ! # array, the title should be 1 string, ... So the hash should look like: ! # h = { ! # 'RA' => ["name1", "name2", "name3", "name4"], ! # 'RT' => 'some part of the title that did not fit on one line' ! # } ! line_based_data.keys.each do |key| ! if ['RC', 'RP', 'RT', 'RL'].include?(key) ! line_based_data[key] = line_based_data[key].join(' ') ! elsif ['RA', 'RX'].include?(key) ! sanitized_data = Array.new ! line_based_data[key].each do |v| ! sanitized_data.push(v.split(/\s*,\s*/)) ! end ! line_based_data[key] = sanitized_data.flatten ! elsif key == 'RN' ! line_based_data[key] = line_based_data[key][0].sub(/^\[/,'').sub(/\]$/,'').to_i ! end ! end ! ! # And put it in @data. @data in the end looks like this: ! # data = [ ! # { ! # 'RA' => ["name1", "name2", "name3", "name4"], ! # 'RT' => 'some part of the title that did not fit on one line' ! # }, ! # { ! # 'RA' => ["name1", "name2", "name3", "name4"], ! # 'RT' => 'some part of the title that did not fit on one line' ! # } ! # ] ! @data['R'].push(line_based_data) end end @data['R'] *************** *** 270,306 **** def references unless @data['references'] ! ary = self.ref.map {|ent| ! hash = Hash.new('') ! ent.each {|key, value| case key when 'RA' ! hash['authors'] = value.split(/, /) when 'RT' hash['title'] = value when 'RL' ! if value =~ /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/ ! hash['journal'] = $1 ! hash['volume'] = $2 ! hash['issue'] = $3 ! hash['pages'] = $4 ! hash['year'] = $5 ! else ! hash['journal'] = value ! end when 'RX' # PUBMED, MEDLINE ! value.split('.').each {|item| tag, xref = item.split(/; /).map {|i| i.strip } hash[ tag.downcase ] = xref } end ! } ! Reference.new(hash) ! } ! @data['references'] = References.new(ary) end @data['references'] end - # returns contents in the DR line. 
# * Bio::EMBLDB::Common#dr -> [ * ] --- 310,345 ---- def references unless @data['references'] ! @data['references'] = Array.new ! self.ref.each do |ref| ! hash = Hash.new ! ref.each do |key, value| case key + when 'RN' + hash['embl_gb_record_number'] = value + when 'RC' + hash['comments'] = value + when 'RX' + hash['xrefs'] = value + when 'RP' + hash['sequence_position'] = value when 'RA' ! hash['authors'] = value when 'RT' hash['title'] = value when 'RL' ! hash['journal'] = value when 'RX' # PUBMED, MEDLINE ! value.each {|item| tag, xref = item.split(/; /).map {|i| i.strip } hash[ tag.downcase ] = xref } end ! end ! @data['references'].push(Reference.new(hash)) ! end end @data['references'] end # returns contents in the DR line. # * Bio::EMBLDB::Common#dr -> [ * ] From aerts at dev.open-bio.org Wed Feb 20 04:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.6,0.58.2.7 Message-ID: <200802200956.m1K9uO8C015795@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv15755/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 sequence.rb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. 
Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.58.2.6 retrieving revision 0.58.2.7 diff -C2 -d -r0.58.2.6 -r0.58.2.7 *** sequence.rb 15 Feb 2008 05:29:50 -0000 0.58.2.6 --- sequence.rb 20 Feb 2008 09:56:22 -0000 0.58.2.7 *************** *** 371,375 **** return [@primary_accession, @secondary_accessions].flatten end ! end # Sequence --- 371,375 ---- return [@primary_accession, @secondary_accessions].flatten end ! end # Sequence From aerts at dev.open-bio.org Wed Feb 20 04:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.89,1.89.2.1 Message-ID: <200802200956.m1K9uOdN015790@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory dev.open-bio.org:/tmp/cvs-serv15755/lib Modified Files: Tag: BRANCH-biohackathon2008 bio.rb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. 
Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.89 retrieving revision 1.89.2.1 diff -C2 -d -r1.89 -r1.89.2.1 *** bio.rb 9 Jan 2008 17:18:17 -0000 1.89 --- bio.rb 20 Feb 2008 09:56:22 -0000 1.89.2.1 *************** *** 278,279 **** --- 278,310 ---- end + class String + def fold(width = 80) + self.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") + end + + def wrap(width = 80, prefix = '') + actual_width = width - prefix.length + result = [] + left = self.dup + while left and left.length > actual_width + line = nil + actual_width.downto(1) do |i| + if left[i..i] == ' ' or /[,;]/ =~ left[(i-1)..(i-1)] then + line = left[0..(i-1)].sub(/ +\z/, '') + left = left[i..-1].sub(/\A +/, '') + break + end + end + if line.nil? then + line = left[0..(actual_width-1)] + left = left[actual_width..-1] + end + result << line + end + result << left if left + result_string = result.join("\n#{prefix}") + result_string = prefix + result_string unless result_string.empty? + # result_string << "\n" unless result_string.empty? 
+ return result_string + end + end \ No newline at end of file From aerts at dev.open-bio.org Wed Feb 20 04:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence common.rb, 1.6, 1.6.2.1 format.rb, 1.4.2.3, 1.4.2.4 Message-ID: <200802200956.m1K9uOhl015806@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory dev.open-bio.org:/tmp/cvs-serv15755/lib/bio/sequence Modified Files: Tag: BRANCH-biohackathon2008 common.rb format.rb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl Index: format.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v retrieving revision 1.4.2.3 retrieving revision 1.4.2.4 diff -C2 -d -r1.4.2.3 -r1.4.2.4 *** format.rb 15 Feb 2008 02:18:21 -0000 1.4.2.3 --- format.rb 20 Feb 2008 09:56:22 -0000 1.4.2.4 *************** *** 31,106 **** # puts s.output(:embl) module Format - - # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any - # case, it would be difficult to successfully call this method outside - # its expected context). - # - # Output the FASTA format string of the sequence. - # - # UNFORTUNATLY, the current implementation of Bio::Sequence is incapable of - # using either the header or width arguments. So something needs to be - # changed... 
- # - # Currently, this method is used in Bio::Sequence#output like so, - # - # s = Bio::Sequence.new('atgc') - # puts s.output(:fasta) #=> "> \natgc\n" - # --- - # *Arguments*: - # * (optional) _header_: String (default nil) - # * (optional) _width_: Fixnum (default nil) - # *Returns*:: String object - def format_fasta(header = nil, width = nil) - header ||= "#{@entry_id} #{@definition}" - - ">#{header}\n" + - if width - @seq.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") - else - @seq.to_s + "\n" - end - end - - # Not yet implemented :) - # Remove the nodoc command after implementation! - # --- - # *Returns*:: String object - def format_gff #:nodoc: - raise NotImplementedError - end - - # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any - # case, it would be difficult to successfully call this method outside - # its expected context). - # - # Output the Genbank format string of the sequence. - # Used in Bio::Sequence#output. - # --- - # *Returns*:: String object - def format_genbank - prefix = ' ' * 5 - indent = prefix + ' ' * 16 - fwidth = 79 - indent.length - - format_features(prefix, indent, fwidth) - end - - # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any - # case, it would be difficult to successfully call this method outside - # its expected context). - # - # Output the EMBL format string of the sequence. - # Used in Bio::Sequence#output. - # --- - # *Returns*:: String object - def format_embl - prefix = 'FT ' - indent = prefix + ' ' * 16 - fwidth = 80 - indent.length - - format_features(prefix, indent, fwidth) - end - - private --- 31,34 ---- *************** *** 114,123 **** head = '' ! wrap(position, width).each_line do |line| result << head << line head = indent end ! result << format_qualifiers(feature.qualifiers, width) end return result --- 42,51 ---- head = '' ! (position).wrap(width).each_line do |line| result << head << line head = indent end ! 
result << format_qualifiers(feature.qualifiers, indent, width) end return result *************** *** 130,136 **** if v == true ! lines = wrap('/' + q, width) elsif q == 'translation' ! lines = fold('/' + q + '=' + v, width) else if v[/\D/] --- 58,64 ---- if v == true ! lines =('/' + q).wrap(width) elsif q == 'translation' ! lines = ('/' + q + '="' + v + '"').fold(width) else if v[/\D/] *************** *** 139,143 **** v = '"' + v + '"' end ! lines = wrap('/' + q + '=' + v, width) end --- 67,71 ---- v = '"' + v + '"' end ! lines = ('/' + q + '=' + v).wrap(width) end *************** *** 147,177 **** end - def fold(str, width) - str.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") - end - - def wrap(str, width) - result = [] - left = str.dup - while left and left.length > width - line = nil - width.downto(1) do |i| - if left[i..i] == ' ' or /[\,\;]/ =~ left[(i-1)..(i-1)] then - line = left[0..(i-1)].sub(/ +\z/, '') - left = left[i..-1].sub(/\A +/, '') - break - end - end - if line.nil? then - line = left[0..(width-1)] - left = left[width..-1] - end - result << line - end - result << left if left - result_string = result.join("\n") - result_string << "\n" unless result_string.empty? - return result_string - end end # Format --- 75,78 ---- Index: common.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/common.rb,v retrieving revision 1.6 retrieving revision 1.6.2.1 diff -C2 -d -r1.6 -r1.6.2.1 *** common.rb 27 Dec 2007 17:36:02 -0000 1.6 --- common.rb 20 Feb 2008 09:56:22 -0000 1.6.2.1 *************** *** 38,42 **** # puts dna.randomize module Common ! # Return sequence as # String[http://corelib.rubyonrails.org/classes/String.html]. --- 38,42 ---- # puts dna.randomize module Common ! # Return sequence as # String[http://corelib.rubyonrails.org/classes/String.html]. 
*************** *** 66,69 **** --- 66,86 ---- self.class.new(self) end + + def format_embl + output_lines = Array.new + counter = 0 + remainder = self.window_search(60,60) do |subseq| + counter += 60 + subseq.gsub!(/(.{10})/, '\1 ') + output_lines.push(' '*5 + subseq + counter.to_s.rjust(9)) + end + counter += remainder.length + remainder = (remainder.to_s + ' '*(60-remainder.length)) + remainder.gsub!(/(.{10})/, '\1 ') + output_lines.push(' '*5 + remainder + counter.to_s.rjust(9)) + return output_lines.join("\n") + end + + # Normalize the current sequence, removing all whitespace and From aerts at dev.open-bio.org Wed Feb 20 04:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/test/unit/bio/db/embl test_embl_to_bioseq.rb, NONE, 1.1.2.1 test_embl.rb, 1.5, 1.5.2.1 test_embl_rel89.rb, 1.2, 1.2.2.1 Message-ID: <200802200956.m1K9uOKd015812@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/test/unit/bio/db/embl In directory dev.open-bio.org:/tmp/cvs-serv15755/test/unit/bio/db/embl Modified Files: Tag: BRANCH-biohackathon2008 test_embl.rb test_embl_rel89.rb Added Files: Tag: BRANCH-biohackathon2008 test_embl_to_bioseq.rb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. 
Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl --- NEW FILE: test_embl_to_bioseq.rb --- # # test/unit/bio/db/embl/test_embl.rb - Unit test for Bio::EMBL # # Copyright:: Copyright (C) 2005, 2008 # Mitsuteru Nakao # Jan Aerts # License:: The Ruby License # # $Id: test_embl_to_bioseq.rb,v 1.1.2.1 2008/02/20 09:56:22 aerts Exp $ # require 'pathname' libpath = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 5, 'lib')).cleanpath.to_s $:.unshift(libpath) unless $:.include?(libpath) require 'test/unit' require 'bio' require 'bio/db/embl/embl' module Bio class TestEMBLToBioSequence < Test::Unit::TestCase def setup bioruby_root = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 5)).cleanpath.to_s input = File.open(File.join(bioruby_root, 'test', 'data', 'embl', 'AB090716.embl.rel89')).read embl_object = Bio::EMBL.new(input) embl_object.instance_eval { @data['OS'] = "Haplochromis sp. 'muzu rukwa'" } @bio_seq = embl_object.to_biosequence end def test_entry_id assert_equal('AB090716', @bio_seq.entry_id) end def test_primary_accession assert_equal('AB090716', @bio_seq.primary_accession) end def test_secondary_accessions assert_equal([], @bio_seq.secondary_accessions) end def test_molecule_type assert_equal('genomic DNA', @bio_seq.molecule_type) end def test_definition assert_equal("Haplochromis sp. 'muzu, rukwa' LWS gene for long wavelength-sensitive opsin, partial cds, specimen_voucher:specimen No. HT-9361.", @bio_seq.definition) end def test_topology assert_equal('linear', @bio_seq.topology) end def test_dates assert_equal('25-OCT-2002 (Rel. 
73, Created)', @bio_seq.date_created) assert_equal('14-NOV-2006 (Rel. 89, Last updated, Version 3)', @bio_seq.date_modified) end def test_division assert_equal('VRT', @bio_seq.division) end def test_sequence_version assert_equal(1, @bio_seq.sequence_version) end def test_keywords assert_equal([], @bio_seq.keywords) end def test_species assert_equal("Haplochromis sp. 'muzu, rukwa'", @bio_seq.species) end def test_classification assert_equal(['Eukaryota','Metazoa','Chordata','Craniata','Vertebrata','Euteleostomi','Actinopterygii','Neopterygii','Teleostei','Euteleostei','Neoteleostei','Acanthomorpha','Acanthopterygii','Percomorpha','Perciformes','Labroidei','Cichlidae','African cichlids','Pseudocrenilabrinae','Haplochromini','Haplochromis'], @bio_seq.classification) end def test_references assert_equal(2, @bio_seq.references.length) assert_equal(Bio::Reference, @bio_seq.references[0].class) end def test_features assert_equal(3, @bio_seq.features.length) assert_equal(Bio::Feature, @bio_seq.features[0].class) end end # To really test the Bio::EMBL to Bio::Sequence conversion, we need to test if # that Bio::Sequence can be made into a valid Bio::EMBL again. class TestEMBLToBioSequenceRoundTrip < Test::Unit::TestCase def setup bioruby_root = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 5)).cleanpath.to_s input = File.open(File.join(bioruby_root, 'test', 'data', 'embl', 'AB090716.embl.rel89')).read embl_object_1 = Bio::EMBL.new(input) embl_object_1.instance_eval { @data['OS'] = "Haplochromis sp. 
'muzu rukwa'" } @bio_seq_1 = embl_object_1.to_biosequence embl_object_2 = Bio::EMBL.new(@bio_seq_1.output(:embl)) @bio_seq_2 = embl_object_2.to_biosequence end def test_entry_id assert_equal('AB090716', @bio_seq_2.entry_id) end def test_primary_accession assert_equal('AB090716', @bio_seq_2.primary_accession) end def test_secondary_accessions assert_equal([], @bio_seq_2.secondary_accessions) end def test_molecule_type assert_equal('genomic DNA', @bio_seq_2.molecule_type) end def test_definition assert_equal("Haplochromis sp. 'muzu, rukwa' LWS gene for long wavelength-sensitive opsin, partial cds, specimen_voucher:specimen No. HT-9361.", @bio_seq_2.definition) end def test_topology assert_equal('linear', @bio_seq_2.topology) end def test_dates assert_equal('25-OCT-2002 (Rel. 73, Created)', @bio_seq_2.date_created) assert_equal('14-NOV-2006 (Rel. 89, Last updated, Version 3)', @bio_seq_2.date_modified) end def test_division assert_equal('VRT', @bio_seq_2.division) end def test_sequence_version assert_equal(1, @bio_seq_2.sequence_version) end def test_keywords assert_equal([], @bio_seq_2.keywords) end def test_species assert_equal("Haplochromis sp. 
'muzu, rukwa'", @bio_seq_2.species) end def test_classification assert_equal(['Eukaryota','Metazoa','Chordata','Craniata','Vertebrata','Euteleostomi','Actinopterygii','Neopterygii','Teleostei','Euteleostei','Neoteleostei','Acanthomorpha','Acanthopterygii','Percomorpha','Perciformes','Labroidei','Cichlidae','African cichlids','Pseudocrenilabrinae','Haplochromini','Haplochromis'], @bio_seq_2.classification) end def test_references assert_equal(2, @bio_seq_2.references.length) assert_equal(Bio::Reference, @bio_seq_2.references[0].class) end def test_features a assert_equal(3, @bio_seq_2.features.length) assert_equal(Bio::Feature, @bio_seq_2.features[0].class) end end end Index: test_embl_rel89.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/test/unit/bio/db/embl/test_embl_rel89.rb,v retrieving revision 1.2 retrieving revision 1.2.2.1 diff -C2 -d -r1.2 -r1.2.2.1 *** test_embl_rel89.rb 5 Apr 2007 23:35:43 -0000 1.2 --- test_embl_rel89.rb 20 Feb 2008 09:56:22 -0000 1.2.2.1 *************** *** 156,160 **** # Bio::EMBLDB::COMMON#references def test_references ! assert_equal(Bio::References, @obj.references.class) end --- 156,160 ---- # Bio::EMBLDB::COMMON#references def test_references ! assert_equal(Array, @obj.references.class) end *************** *** 169,173 **** def test_ft ! assert_equal(Bio::Features, @obj.ft.class) end --- 169,173 ---- def test_ft ! assert_equal(Array, @obj.ft.class) end *************** *** 179,183 **** def test_ft_accessor ! assert_equal('CDS', @obj.ft.features[1].feature) end --- 179,183 ---- def test_ft_accessor ! 
assert_equal('CDS', @obj.ft[1].feature) end Index: test_embl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/test/unit/bio/db/embl/test_embl.rb,v retrieving revision 1.5 retrieving revision 1.5.2.1 diff -C2 -d -r1.5 -r1.5.2.1 *** test_embl.rb 5 Apr 2007 23:35:43 -0000 1.5 --- test_embl.rb 20 Feb 2008 09:56:22 -0000 1.5.2.1 *************** *** 151,155 **** # Bio::EMBLDB::COMMON#references def test_references ! assert_equal(Bio::References, @obj.references.class) end --- 151,155 ---- # Bio::EMBLDB::COMMON#references def test_references ! assert_equal(Array, @obj.references.class) end *************** *** 164,168 **** def test_ft ! assert_equal(Bio::Features, @obj.ft.class) end --- 164,168 ---- def test_ft ! assert_equal(Array, @obj.ft.class) end *************** *** 174,178 **** def test_ft_accessor ! assert_equal('CDS', @obj.ft.features[1].feature) end --- 174,178 ---- def test_ft_accessor ! assert_equal('CDS', @obj.ft[1].feature) end From aerts at dev.open-bio.org Wed Feb 20 08:54:21 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 13:54:21 +0000 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.89.2.1,1.89.2.2 Message-ID: <200802201354.m1KDsL5F016175@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory dev.open-bio.org:/tmp/cvs-serv16153 Modified Files: Tag: BRANCH-biohackathon2008 bio.rb Log Message: Fixed bug in formatting features when exporting to EMBL. Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.89.2.1 retrieving revision 1.89.2.2 diff -C2 -d -r1.89.2.1 -r1.89.2.2 *** bio.rb 20 Feb 2008 09:56:22 -0000 1.89.2.1 --- bio.rb 20 Feb 2008 13:54:19 -0000 1.89.2.2 *************** *** 280,284 **** class String def fold(width = 80) ! self.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") end --- 280,284 ---- class String def fold(width = 80) ! 
self.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n").sub(/\n$/, '') end *************** *** 308,310 **** return result_string end ! end \ No newline at end of file --- 308,310 ---- return result_string end ! end From aerts at dev.open-bio.org Wed Feb 20 08:54:21 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 13:54:21 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4.2.4,1.4.2.5 Message-ID: <200802201354.m1KDsLWx016180@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory dev.open-bio.org:/tmp/cvs-serv16153/bio/sequence Modified Files: Tag: BRANCH-biohackathon2008 format.rb Log Message: Fixed bug in formatting features when exporting to EMBL. Index: format.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v retrieving revision 1.4.2.4 retrieving revision 1.4.2.5 diff -C2 -d -r1.4.2.4 -r1.4.2.5 *** format.rb 20 Feb 2008 09:56:22 -0000 1.4.2.4 --- format.rb 20 Feb 2008 13:54:19 -0000 1.4.2.5 *************** *** 47,51 **** --- 47,53 ---- end + result << "\n" result << format_qualifiers(feature.qualifiers, indent, width) + result << "\n" end return result *************** *** 62,66 **** lines = ('/' + q + '="' + v + '"').fold(width) else ! if v[/\D/] #v.delete!("\x00-\x1f\x7f-\xff") v.gsub!(/"/, '""') --- 64,68 ---- lines = ('/' + q + '="' + v + '"').fold(width) else ! if ( v[/\D/] or q == 'chromosome' ) #v.delete!("\x00-\x1f\x7f-\xff") v.gsub!(/"/, '""') *************** *** 72,76 **** lines.gsub!(/^/, indent) lines ! end.join end --- 74,78 ---- lines.gsub!(/^/, indent) lines ! 
end.join("\n") end From ngoto at dev.open-bio.org Wed Feb 20 12:04:49 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Wed, 20 Feb 2008 17:04:49 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.24.2.1,1.24.2.2 Message-ID: <200802201704.m1KH4nCF017912@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv17810/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 reference.rb Log Message: Bio::References#new is added not to create Bio::References instances anymore. New transitional module Bio::References::BackwardCompatibilityForBioReferences is added to help keeping backward compatibility. (The only reason why not to erase Bio::References class is to load Marshal.dump data.) Index: reference.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/reference.rb,v retrieving revision 1.24.2.1 retrieving revision 1.24.2.2 diff -C2 -d -r1.24.2.1 -r1.24.2.2 *** reference.rb 18 Feb 2008 15:44:39 -0000 1.24.2.1 --- reference.rb 20 Feb 2008 17:04:47 -0000 1.24.2.2 *************** *** 580,587 **** --- 580,593 ---- # = DESCRIPTION # + # This class is OBSOLETED, and will soon be removed. + # Instead of this class, an array is to be used. + # + # # A container class for Bio::Reference objects. # # = USAGE # + # This class should NOT be used. + # # refs = Bio::References.new # refs.append(Bio::Reference.new(hash)) *************** *** 591,596 **** # class References ! include Enumerable ! # Array of Bio::Reference objects attr_accessor :references --- 597,638 ---- # class References ! ! # module to keep backward compatibility with obsoleted Bio::References ! module BackwardCompatibilityForBioReferences #:nodoc: ! ! # Backward compatibility with Bio::References#references. ! # Now, references are stored in an array, and ! # you should change your code not to use this method. ! def references ! warn 'Bio::References is obsoleted. 
Now, references are stored in an array.' ! self ! end ! ! # Backward compatibility with Bio::References#append. ! # Now, references are stored in an array, and ! # you should change your code not to use this method. ! def append(reference) ! warn 'Bio::References is obsoleted. Now, references are stored in an array.' ! self.push(reference) if reference.is_a? Reference ! self ! end ! end #module BackwardCompatibilityForBioReferences ! ! # This method should not be used. ! # Only for backward compatibility of existing code. ! # ! # Since Bio::References is obsoleted, ! # Bio::References.new not returns Bio::References object, ! # but modifies given _ary_ and returns the _ary_. ! # ! # *Arguments*: ! # * (optional) __: Array of Bio::Reference objects ! # *Returns*:: the given array ! def self.new(ary = []) ! warn 'Bio::References is obsoleted. Some methods are added to given array to keep backward compatibility.' ! ary.extend(BackwardCompatibilityForBioReferences) ! ary ! end ! # Array of Bio::Reference objects attr_accessor :references From ngoto at dev.open-bio.org Fri Feb 22 09:26:18 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Fri, 22 Feb 2008 14:26:18 +0000 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.89.2.2,1.89.2.3 Message-ID: <200802221426.m1MEQI5W030582@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory dev.open-bio.org:/tmp/cvs-serv30562 Modified Files: Tag: BRANCH-biohackathon2008 bio.rb Log Message: reverted to 1.89 Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.89.2.2 retrieving revision 1.89.2.3 diff -C2 -d -r1.89.2.2 -r1.89.2.3 *** bio.rb 20 Feb 2008 13:54:19 -0000 1.89.2.2 --- bio.rb 22 Feb 2008 14:26:16 -0000 1.89.2.3 *************** *** 278,310 **** end - class String - def fold(width = 80) - self.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n").sub(/\n$/, '') - end - - def wrap(width = 80, prefix = '') - 
actual_width = width - prefix.length - result = [] - left = self.dup - while left and left.length > actual_width - line = nil - actual_width.downto(1) do |i| - if left[i..i] == ' ' or /[,;]/ =~ left[(i-1)..(i-1)] then - line = left[0..(i-1)].sub(/ +\z/, '') - left = left[i..-1].sub(/\A +/, '') - break - end - end - if line.nil? then - line = left[0..(actual_width-1)] - left = left[actual_width..-1] - end - result << line - end - result << left if left - result_string = result.join("\n#{prefix}") - result_string = prefix + result_string unless result_string.empty? - # result_string << "\n" unless result_string.empty? - return result_string - end - end --- 278,279 ---- From ngoto at dev.open-bio.org Fri Feb 22 09:30:46 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Fri, 22 Feb 2008 14:30:46 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4.2.5,1.4.2.6 Message-ID: <200802221430.m1MEUknT030652@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory dev.open-bio.org:/tmp/cvs-serv30611 Modified Files: Tag: BRANCH-biohackathon2008 format.rb Log Message: * fold() and wrap() are reverted * Bug fix in format_features() and format_qualifiers() * The content of 'translate' qualifier is now wrapped by double quote Index: format.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v retrieving revision 1.4.2.5 retrieving revision 1.4.2.6 diff -C2 -d -r1.4.2.5 -r1.4.2.6 *** format.rb 20 Feb 2008 13:54:19 -0000 1.4.2.5 --- format.rb 22 Feb 2008 14:30:44 -0000 1.4.2.6 *************** *** 31,34 **** --- 31,109 ---- # puts s.output(:embl) module Format + + # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any + # case, it would be difficult to successfully call this method outside + # its expected context). + # + # Output the FASTA format string of the sequence. 
+ # + # UNFORTUNATLY, the current implementation of Bio::Sequence is incapable of + # using either the header or width arguments. So something needs to be + # changed... + # + # Currently, this method is used in Bio::Sequence#output like so, + # + # s = Bio::Sequence.new('atgc') + # puts s.output(:fasta) #=> "> \natgc\n" + # --- + # *Arguments*: + # * (optional) _header_: String (default nil) + # * (optional) _width_: Fixnum (default nil) + # *Returns*:: String object + def format_fasta(header = nil, width = nil) + header ||= "#{@entry_id} #{@definition}" + + ">#{header}\n" + + if width + @seq.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") + else + @seq.to_s + "\n" + end + end + + #--- + + # Not yet implemented :) + # Remove the nodoc command after implementation! + # --- + # *Returns*:: String object + #def format_gff #:nodoc: + # raise NotImplementedError + #end + + # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any + # case, it would be difficult to successfully call this method outside + # its expected context). + # + # Output the Genbank format string of the sequence. + # Used in Bio::Sequence#output. + # --- + # *Returns*:: String object + #def format_genbank + # prefix = ' ' * 5 + # indent = prefix + ' ' * 16 + # fwidth = 79 - indent.length + # + # format_features(prefix, indent, fwidth) + #end + + # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any + # case, it would be difficult to successfully call this method outside + # its expected context). + # + # Output the EMBL format string of the sequence. + # Used in Bio::Sequence#output. + # --- + # *Returns*:: String object + #def format_embl + # prefix = 'FT ' + # indent = prefix + ' ' * 16 + # fwidth = 80 - indent.length + # + # format_features(prefix, indent, fwidth) + #end + + #+++ + private *************** *** 42,53 **** head = '' ! 
(position).wrap(width).each_line do |line| result << head << line head = indent end - result << "\n" result << format_qualifiers(feature.qualifiers, indent, width) - result << "\n" end return result --- 117,126 ---- head = '' ! wrap(position, width).each_line do |line| result << head << line head = indent end result << format_qualifiers(feature.qualifiers, indent, width) end return result *************** *** 60,80 **** if v == true ! lines =('/' + q).wrap(width) elsif q == 'translation' ! lines = ('/' + q + '="' + v + '"').fold(width) else ! if ( v[/\D/] or q == 'chromosome' ) #v.delete!("\x00-\x1f\x7f-\xff") v.gsub!(/"/, '""') v = '"' + v + '"' end ! lines = ('/' + q + '=' + v).wrap(width) end lines.gsub!(/^/, indent) lines ! end.join("\n") end end # Format --- 133,180 ---- if v == true ! lines = wrap('/' + q, width) elsif q == 'translation' ! lines = fold("/#{q}=\"#{v}\"", width) else ! if v[/\D/] or q == 'chromosome' #v.delete!("\x00-\x1f\x7f-\xff") v.gsub!(/"/, '""') v = '"' + v + '"' end ! lines = wrap('/' + q + '=' + v, width) end lines.gsub!(/^/, indent) lines ! end.join ! end ! ! def fold(str, width) ! str.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") end + def wrap(str, width) + result = [] + left = str.dup + while left and left.length > width + line = nil + width.downto(1) do |i| + if left[i..i] == ' ' or /[\,\;]/ =~ left[(i-1)..(i-1)] then + line = left[0..(i-1)].sub(/ +\z/, '') + left = left[i..-1].sub(/\A +/, '') + break + end + end + if line.nil? then + line = left[0..(width-1)] + left = left[width..-1] + end + result << line + end + result << left if left + result_string = result.join("\n") + result_string << "\n" unless result_string.empty? 
+ return result_string + end end # Format From ngoto at dev.open-bio.org Thu Feb 28 00:51:05 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Thu, 28 Feb 2008 05:51:05 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.24.2.2,1.24.2.3 Message-ID: <200802280551.m1S5p5eX020471@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv20451 Modified Files: Tag: BRANCH-biohackathon2008 reference.rb Log Message: @sequence_position should be nil if no information available Index: reference.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/reference.rb,v retrieving revision 1.24.2.2 retrieving revision 1.24.2.3 diff -C2 -d -r1.24.2.2 -r1.24.2.3 *** reference.rb 20 Feb 2008 17:04:47 -0000 1.24.2.2 --- reference.rb 28 Feb 2008 05:51:03 -0000 1.24.2.3 *************** *** 138,142 **** @mesh = hash['mesh'] @embl_gb_record_number = hash['embl_gb_record_number'] || nil ! @sequence_position = hash['sequence_position'] || [] @comments = hash['comments'] || [] @xrefs = hash['xrefs'] || [] --- 138,142 ---- @mesh = hash['mesh'] @embl_gb_record_number = hash['embl_gb_record_number'] || nil ! @sequence_position = hash['sequence_position'] || nil @comments = hash['comments'] || [] @xrefs = hash['xrefs'] || [] *************** *** 280,284 **** end end ! if @sequence_position != '' lines << "RP #{@sequence_position}" end --- 280,284 ---- end end ! if ! @sequence_position.nil? 
lines << "RP #{@sequence_position}" end From ngoto at dev.open-bio.org Thu Feb 28 00:54:53 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Thu, 28 Feb 2008 05:54:53 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/genbank common.rb,1.11,1.11.2.1 Message-ID: <200802280554.m1S5sr5Z020520@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/genbank In directory dev.open-bio.org:/tmp/cvs-serv20500/db/genbank Modified Files: Tag: BRANCH-biohackathon2008 common.rb Log Message: changed to parse sequence position and reference number in REFERENCES Index: common.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/genbank/common.rb,v retrieving revision 1.11 retrieving revision 1.11.2.1 diff -C2 -d -r1.11 -r1.11.2.1 *** common.rb 5 Apr 2007 23:35:40 -0000 1.11 --- common.rb 28 Feb 2008 05:54:51 -0000 1.11.2.1 *************** *** 141,144 **** --- 141,149 ---- subtag2array(ref).each do |field| case tag_get(field) + when /^\s*REFERENCE\s+(\d+)(\s+\(bases\s+(\d+)\s+to\s+(\d+)\))?/ + hash['embl_gb_record_number'] = $1.to_i + if $2 then + hash['sequence_position'] = "#{$3}-#{$4}" + end when /AUTHORS/ authors = truncate(tag_cut(field)) From pjotr at dev.open-bio.org Sat Feb 2 08:03:36 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Sat, 02 Feb 2008 13:03:36 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.13,1.14 Message-ID: <200802021303.m12D3PNX031194@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv31174 Modified Files: Tutorial.rd Log Message: Tabs in the Tutorial broke the rd parser - the Wiki will be fixed now. 
Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** Tutorial.rd 9 Jul 2007 12:28:07 -0000 1.13 --- Tutorial.rd 2 Feb 2008 13:03:23 -0000 1.14 *************** *** 1,2 **** --- 1,10 ---- + # This document is generated with a version of rd2html (part of Hiki) + # + # A possible test run could be from rdtool: + # + # ruby -I lib ./bin/rd2 ~/izip/cvs/opensource/bioruby/doc/Tutorial.rd + # + # A common problem is tabs in the text file! + =begin *************** *** 5,13 **** $Id$ ! Translated into English: Naohisa Goto ! Editor: PjotrPrins

! Copyright (C) 2001-2003 KATAYAMA Toshiaki , 2005-2007 Pjotr Prins, Naohisa Goto and others IMPORTANT NOTICE: This page is maintained in the BioRuby CVS --- 13,21 ---- $Id$ ! Translated into English: Naohisa Goto ! Editor: PjotrPrins

! Copyright (C) 2001-2003 KATAYAMA Toshiaki , 2005-2008 Pjotr Prins, Naohisa Goto and others IMPORTANT NOTICE: This page is maintained in the BioRuby CVS *************** *** 32,36 **** version it has with the ! % ruby -v command. Showing something like: --- 40,44 ---- version it has with the ! % ruby -v command. Showing something like: *************** *** 55,59 **** bioruby> puts seq atgcatgcaaaa ! bioruby> puts seq.complement ttttgcatgcat --- 63,67 ---- bioruby> puts seq atgcatgcaaaa ! bioruby> puts seq.complement ttttgcatgcat *************** *** 94,98 **** puts seq.complement.translate # translation of complemental strand ! counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')} p randomseq = Bio::Sequence::NA.randomize(counts) # reshuffle sequence with same freq. --- 102,106 ---- puts seq.complement.translate # translation of complemental strand ! counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')} p randomseq = Bio::Sequence::NA.randomize(counts) # reshuffle sequence with same freq. *************** *** 159,163 **** * Divide a genome sequence into sections of 10000bp and output FASTA formatted sequences. The 1000bp at the start and end of ! each subsequence overlapped. At the 3' end of the sequence the leftover subsequence shorter than 10000bp is also added --- 167,171 ---- * Divide a genome sequence into sections of 10000bp and output FASTA formatted sequences. The 1000bp at the start and end of ! each subsequence overlapped. At the 3' end of the sequence the leftover subsequence shorter than 10000bp is also added *************** *** 252,258 **** #!/usr/bin/env ruby ! require 'bio' ! ff = Bio::FlatFile.new(Bio::GenBank, ARGF) ff.each_entry do |gb| --- 260,266 ---- #!/usr/bin/env ruby ! require 'bio' ! ff = Bio::FlatFile.new(Bio::GenBank, ARGF) ff.each_entry do |gb| *************** *** 470,475 **** rebase = Bio::RestrictionEnzyme.rebase ! rebase.each do |enzyme_name, info| ! 
p enzyme_name end --- 478,483 ---- rebase = Bio::RestrictionEnzyme.rebase ! rebase.each do |enzyme_name, info| ! p enzyme_name end *************** *** 483,488 **** end end ! res.each do |frag| ! em = EnzymeMatch.new em.p_left = frag.p_left --- 491,496 ---- end end ! res.each do |frag| ! em = EnzymeMatch.new em.p_left = frag.p_left *************** *** 494,498 **** em.enzyme = ar_enz em.sequence = ar_seq ! p em end --- 502,506 ---- em.enzyme = ar_enz em.sequence = ar_seq ! p em end *************** *** 1160,1168 **** == Comparing BioProjects ! For a quick functional comparison of BioRuby, BioPerl, BioPython and Bioconductor (R) see (()) == Using BioRuby with R ! Using Ruby with R Pjotr wrote a section on SciRuby. See (()) == Using BioPerl or BioPython from Ruby --- 1168,1176 ---- == Comparing BioProjects ! For a quick functional comparison of BioRuby, BioPerl, BioPython and Bioconductor (R) see (()) == Using BioRuby with R ! Using Ruby with R Pjotr wrote a section on SciRuby. See (()) == Using BioPerl or BioPython from Ruby *************** *** 1182,1184 **** =end - --- 1190,1191 ---- From pjotr at dev.open-bio.org Sat Feb 2 09:02:03 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Sat, 02 Feb 2008 14:02:03 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.14,1.15 Message-ID: <200802021401.m12E1uuN031293@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv31273 Modified Files: Tutorial.rd Log Message: Updating tutorial Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** Tutorial.rd 2 Feb 2008 13:03:23 -0000 1.14 --- Tutorial.rd 2 Feb 2008 14:01:54 -0000 1.15 *************** *** 3,7 **** # A possible test run could be from rdtool: # ! 
# ruby -I lib ./bin/rd2 ~/izip/cvs/opensource/bioruby/doc/Tutorial.rd # # A common problem is tabs in the text file! --- 3,12 ---- # A possible test run could be from rdtool: # ! # ruby -I lib ./bin/rd2 ~/cvs/opensource/bioruby/doc/Tutorial.rd ! # ! # or with style sheet: ! # ! # ruby -I lib ./bin/rd2 -r rd/rd2html-lib.rb --with-c ! ss=bioruby.css ~/cvs/opensource/bioruby/doc/Tutorial.rd > ~/bioruby.html # # A common problem is tabs in the text file! *************** *** 9,39 **** =begin ! See the document in the CVS repository ./doc/(()) - for a potentially more up-to-date edition. This one was updated: ! ! $Id$ ! Translated into English: Naohisa Goto ! Editor: PjotrPrins

! Copyright (C) 2001-2003 KATAYAMA Toshiaki , 2005-2008 Pjotr Prins, Naohisa Goto and others ! IMPORTANT NOTICE: This page is maintained in the BioRuby CVS ! repository. Please edit the file there otherwise changes may get ! lost. See (()) for CVS and mailing list ! access. ! = BioRuby Tutorial == Introduction ! This is a tutorial for using Bioruby. For BioRuby you need to install ! Ruby and the BioRuby package on your computer. For each following the ! instruction on the respective websites. (EDITOR's NOTE: include URL's) ! ! (EDITOR's NOTE: describe rdoc use for individual classes) ! For further information on the Ruby language see the section 'Further ! reading' at the end. You can check whether Ruby is installed on your computer and what --- 14,40 ---- =begin ! = BioRuby Tutorial ! Editor: PjotrPrins

! * Copyright (C) 2001-2003 KATAYAMA Toshiaki ! * Copyright (C) 2005-2008 Pjotr Prins, Naohisa Goto and others ! The latest version resides in the CVS repository ./doc/(()). This one was updated: ! $Id$ ! in preparation for the (()) == Introduction ! This is a tutorial for using Bioruby. A basic knowledge of Ruby is required. ! If you want to know more about the programming langauge Ruby we recommend the ! excellent book (()) ! by Dave Thomas and Andy Hunt - some of it is online ! (()). ! For BioRuby you need to install ! Ruby and the BioRuby package on your computer. You can check whether Ruby is installed on your computer and what *************** *** 46,49 **** --- 47,61 ---- ruby 1.8.5 (2006-08-25) [powerpc-linux] + If you see no such thing you'll have to install Ruby using your installation + manager. For more information see the + (()) website. + + Once Ruby is works download and install Bioruby using the links on the + (()) website. + + A lot of BioRuby's documentation exists in the source code and unit tests. To + really dive in you will need the latest source code tree. The embedded rdoc + documentation can be viewed online at + (()). But first lets start! == Trying Bioruby *************** *** 52,56 **** following command ! $BIORUBY/bin/bioruby and you should see a prompt --- 64,68 ---- following command ! ./bin/bioruby and you should see a prompt *************** *** 93,97 **** puts seq.translate # translation (Bio::Sequence::AA object) puts seq.translate(2) # translation from frame 2 (default is frame 1) ! puts seq.translate(1,11) # using codon table No.11 (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi) p seq.translate.codes # shows three-letter codes (Array) --- 105,110 ---- puts seq.translate # translation (Bio::Sequence::AA object) puts seq.translate(2) # translation from frame 2 (default is frame 1) ! puts seq.translate(1,11) # using codon table No.11 ! 
# (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi) p seq.translate.codes # shows three-letter codes (Array) *************** *** 114,120 **** % ri File.open ! Nucleic acid sequence is an object of +Bio::Sequence::NA+ class, and ! amino acid sequence is an object of +Bio::Sequence::AA+ class. Shared ! methods are in the parent +Bio::Sequence+ class. As Bio::Sequence class inherits Ruby's String class, you can use --- 127,133 ---- % ri File.open ! Nucleic acid sequence is an object of Bio::Sequence::NA class, and ! amino acid sequence is an object of Bio::Sequence::AA class. Shared ! methods are in the parent Bio::Sequence class. As Bio::Sequence class inherits Ruby's String class, you can use *************** *** 297,303 **** end - (TRANSLATOR'S NOTE: Bio::DB.open have not been used so well.) - (EDITOR's NOTE: Test code) - Next, we are going to parse the GenBank 'features', which is normally very complicated: --- 310,313 ---- *************** *** 382,387 **** Databases in BioRuby are essentially accessed like that of GenBank ! with classes like Bio::GenBank, Bio::KEGG::GENES, ! (EDITOR's NOTE: include complete list) In many cases the Bio::DatabaseClass acts as a factory pattern --- 392,397 ---- Databases in BioRuby are essentially accessed like that of GenBank ! with classes like Bio::GenBank, Bio::KEGG::GENES. A full list can be found in ! the ./lib/bio/db directory of the BioRuby source tree. In many cases the Bio::DatabaseClass acts as a factory pattern *************** *** 1151,1160 **** == Further reading ! See the BioRuby in anger Wiki and the class documentation for more ! information on BioRuby. - The best book to get for understanding and getting productive with the - Ruby language is 'Programming Ruby' by Dave Thomas and Andy - Hunt. Strongly recommended! = APPENDIX --- 1161,1169 ---- == Further reading ! See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the ! source code and unit tests. 
To really dive in you will need the latest source ! code tree. The embedded rdoc documentation can be viewed online at ! (()). = APPENDIX *************** *** 1189,1191 **** --- 1198,1207 ---- carefully that come with each package. + == Modifying this page + + IMPORTANT NOTICE: This page is maintained in the BioRuby CVS + repository. Please edit the file there otherwise changes may get + lost. See (()) for CVS and mailing list + access. + =end From pjotr at dev.open-bio.org Sat Feb 2 09:15:19 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Sat, 02 Feb 2008 14:15:19 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.15,1.16 Message-ID: <200802021415.m12EFAqB031346@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv31326 Modified Files: Tutorial.rd Log Message: Modified tutorial Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** Tutorial.rd 2 Feb 2008 14:01:54 -0000 1.15 --- Tutorial.rd 2 Feb 2008 14:15:08 -0000 1.16 *************** *** 115,120 **** puts seq.complement.translate # translation of complemental strand ! counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')} ! p randomseq = Bio::Sequence::NA.randomize(counts) # reshuffle sequence with same freq. The p, print and puts methods are standard Ruby ways of outputting to --- 115,122 ---- puts seq.complement.translate # translation of complemental strand ! # reshuffle sequence with same frequencies: ! counts = {'a'=>seq.count('a'),'c'=>seq.count('c'), ! 'g'=>seq.count('g'),'t'=>seq.count('t')} ! p randomseq = Bio::Sequence::NA.randomize(counts) The p, print and puts methods are standard Ruby ways of outputting to *************** *** 265,269 **** print ">#{gb.accession} " # Accession puts gb.definition # Definition ! 
puts gb.naseq # Nucleic acid sequence (Bio::Sequence::NA object) end --- 267,272 ---- print ">#{gb.accession} " # Accession puts gb.definition # Definition ! puts gb.naseq # Nucleic acid sequence ! # (Bio::Sequence::NA object) end *************** *** 387,391 **** aaseq.splicing('21..119') - (EDITOR's NOTE: why use STRINGs here?) === More databases --- 390,393 ---- *************** *** 494,498 **** and cut a sequence with an enzyme follow up with: ! res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0}, {:view_ranges => true}) if res.kind_of? Symbol #error err = Err.find_by_code(res.to_s) --- 496,501 ---- and cut a sequence with an enzyme follow up with: ! res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0}, ! {:view_ranges => true}) if res.kind_of? Symbol #error err = Err.find_by_code(res.to_s) *************** *** 529,534 **** fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/). First, you must prepare your FASTA-formatted database sequence file ! target.pep and FASTA-formatted query.pep. (TRANSLATOR'S NOTE: I think ! we should provide sample data to readers.) #!/usr/bin/env ruby --- 532,536 ---- fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/). First, you must prepare your FASTA-formatted database sequence file ! target.pep and FASTA-formatted query.pep. #!/usr/bin/env ruby *************** *** 536,547 **** require 'bio' ! # Creates FASTA factory object ("ssearch" instead of "fasta34" can also work) factory = Bio::Fasta.local('fasta34', ARGV.pop) (EDITOR's NOTE: not consistent pop command) - # Reads FASTA-formatted files (TRANSLATOR'S NOTE: something wrong in Japanese text) ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF) ! # Iterates over each entry. the variable "entry" is a Bio::FastaFormat object. ff.each do |entry| # shows definition line (begins with '>') to the standard error output --- 538,550 ---- require 'bio' ! # Creates FASTA factory object ("ssearch" instead of ! 
# "fasta34" can also work) factory = Bio::Fasta.local('fasta34', ARGV.pop) (EDITOR's NOTE: not consistent pop command) ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF) ! # Iterates over each entry. the variable "entry" is a ! # Bio::FastaFormat object: ff.each do |entry| # shows definition line (begins with '>') to the standard error output *************** *** 555,559 **** # If E-value is smaller than 0.0001 if hit.evalue < 0.0001 ! # shows identifier of query and hit, E-value, start and end positions of homologous region (TRANSLATOR'S NOTE: should I change Japanese document?) print "#{hit.query_id} : evalue #{hit.evalue}\t#{hit.target_id} at " p hit.lap_at --- 558,563 ---- # If E-value is smaller than 0.0001 if hit.evalue < 0.0001 ! # shows identifier of query and hit, E-value, start and ! # end positions of homologous region print "#{hit.query_id} : evalue #{hit.evalue}\t#{hit.target_id} at " p hit.lap_at *************** *** 569,573 **** FASTA many times easily. Instead of using Fasta#query method, Bio::Sequence#fasta method can be used. - (TRANSLATOR'S NOTE: Bio::Sequence#fasta are not so frequently used.) seq = ">test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE" --- 573,576 ---- *************** *** 585,589 **** with the Report object. For example, getting information for hits: - report.each do |hit| puts hit.evalue # E-value --- 588,591 ---- *************** *** 594,606 **** puts hit.query_def # definition(comment line) of query sequence puts hit.query_len # length of query sequence ! puts hit.query_seq # query sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence) puts hit.target_id # identifier of hit sequence puts hit.target_def # definition(comment line) of hit sequence puts hit.target_len # length of hit sequence ! puts hit.target_seq # hit sequence (TRANSLATOR'S NOTE: sequence of homologous region of hit sequence) ! puts hit.query_start # start position of homologous region in query sequence ! 
puts hit.query_end # end position of homologous region in query sequence ! puts hit.target_start # start posiotion of homologous region in hit(target) sequence ! puts hit.target_end # end position of homologous region in hit(target) sequence puts hit.lap_at # array of above four numbers end --- 596,612 ---- puts hit.query_def # definition(comment line) of query sequence puts hit.query_len # length of query sequence ! puts hit.query_seq # sequence of homologous region puts hit.target_id # identifier of hit sequence puts hit.target_def # definition(comment line) of hit sequence puts hit.target_len # length of hit sequence ! puts hit.target_seq # hit of homologous region of hit sequence ! puts hit.query_start # start position of homologous ! # region in query sequence ! puts hit.query_end # end position of homologous region ! # in query sequence ! puts hit.target_start # start posiotion of homologous region ! # in hit(target) sequence ! puts hit.target_end # end position of homologous region ! # in hit(target) sequence puts hit.lap_at # array of above four numbers end *************** *** 695,717 **** report.each do |hit| ! puts hit.bit_score # bit score (*) ! puts hit.query_seq # query sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence) ! puts hit.midline # middle line string of alignment of homologous region (*) ! puts hit.target_seq # hit sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence) ! puts hit.evalue # E-value ! puts hit.identity # % identity ! puts hit.overlap # length of overlapping region ! puts hit.query_id # identifier of query sequence ! puts hit.query_def # definition(comment line) of query sequence ! puts hit.query_len # length of query sequence ! puts hit.target_id # identifier of hit sequence ! puts hit.target_def # definition(comment line) of hit sequence ! puts hit.target_len # length of hit sequence ! puts hit.query_start # start position of homologous region in query sequence ! 
puts hit.query_end # end position of homologous region in query sequence ! puts hit.target_start # start position of homologous region in hit(target) sequence ! puts hit.target_end # end position of homologous region in hit(target) sequence ! puts hit.lap_at # array of above four numbers end --- 701,723 ---- report.each do |hit| ! puts hit.bit_score ! puts hit.query_seq ! puts hit.midline ! puts hit.target_seq ! puts hit.evalue ! puts hit.identity ! puts hit.overlap ! puts hit.query_id ! puts hit.query_def ! puts hit.query_len ! puts hit.target_id ! puts hit.target_def ! puts hit.target_len ! puts hit.query_start ! puts hit.query_end ! puts hit.target_start ! puts hit.target_end ! puts hit.lap_at end *************** *** 1171,1175 **** == KEGG API ! Please refer to KEGG_API.rd.ja (TRANSLATOR'S NOTE: English version: (()) ) and * (()) --- 1177,1181 ---- == KEGG API ! Please refer to KEGG_API.rd.ja (English version: (()) ) and * (()) From pjotr at dev.open-bio.org Sun Feb 3 12:17:59 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Sun, 03 Feb 2008 17:17:59 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.16,1.17 Message-ID: <200802031717.m13HHoa6015904@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv15881/doc Modified Files: Tutorial.rd Log Message: More doctests in Tutorial.rd Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.16 retrieving revision 1.17 diff -C2 -d -r1.16 -r1.17 *** Tutorial.rd 2 Feb 2008 14:15:08 -0000 1.16 --- Tutorial.rd 3 Feb 2008 17:17:48 -0000 1.17 *************** *** 13,16 **** --- 13,17 ---- =begin + #doctest Testing bioruby = BioRuby Tutorial *************** *** 64,68 **** following command ! ./bin/bioruby and you should see a prompt --- 65,70 ---- following command ! ./bin/bioruby or ! 
ruby -I lib bin/bioruby and you should see a prompt *************** *** 73,80 **** bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa") ! bioruby> puts seq ! atgcatgcaaaa ! bioruby> puts seq.complement ! ttttgcatgcat == Working with nucleic / amino acid sequences (Bio::Sequence class) --- 75,82 ---- bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa") ! ==> "atgcatgcaaaa" ! ! bioruby> seq.complement ! ==> "ttttgcatgcat" == Working with nucleic / amino acid sequences (Bio::Sequence class) *************** *** 89,122 **** defined in codontable.rb). ! #!/usr/bin/env ruby ! ! require 'bio' ! ! seq = Bio::Sequence::NA.new("atgcatgcaaaa") ! ! puts seq # original sequence ! puts seq.complement # complemental sequence (Bio::Sequence::NA object) ! puts seq.subseq(3,8) # gets subsequence of positions 3 to 8 ! p seq.gc_percent # GC percent (BioRuby 0.6.X: Float, BioRuby 0.7 or later: Integer) ! p seq.composition # nucleic acid compositions (Hash) ! puts seq.translate # translation (Bio::Sequence::AA object) ! puts seq.translate(2) # translation from frame 2 (default is frame 1) ! puts seq.translate(1,11) # using codon table No.11 ! # (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi) ! p seq.translate.codes # shows three-letter codes (Array) ! p seq.translate.names # shows amino acid names (Array) ! p seq.translate.composition # amino acid compositions (Hash) ! p seq.translate.molecular_weight # calculating molecular weight (Float) ! puts seq.complement.translate # translation of complemental strand - # reshuffle sequence with same frequencies: - counts = {'a'=>seq.count('a'),'c'=>seq.count('c'), - 'g'=>seq.count('g'),'t'=>seq.count('t')} - p randomseq = Bio::Sequence::NA.randomize(counts) The p, print and puts methods are standard Ruby ways of outputting to --- 91,136 ---- defined in codontable.rb). + bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa") + ==> "atgcatgcaaaa" ! # complemental sequence (Bio::Sequence::NA object) ! bioruby> seq.complement ! 
==> "ttttgcatgcat" ! bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8 ! ==> "gcatgc" ! bioruby> seq.gc_percent ! ==> 33 ! bioruby> seq.composition ! ==> {"a"=>6, "c"=>2, "g"=>2, "t"=>2} ! bioruby> seq.translate ! ==> "MHAK" ! bioruby> seq.translate(2) # translate from frame 2 ! ==> "CMQ" ! bioruby> seq.translate(1,11) # codon table 11 ! ==> "MHAK" ! bioruby> seq.translate.codes ! ==> ["Met", "His", "Ala", "Lys"] ! bioruby> seq.translate.names ! ==> ["methionine", "histidine", "alanine", "lysine"] ! bioruby> seq.translate.composition ! ==> {"K"=>1, "A"=>1, "M"=>1, "H"=>1} ! bioruby> seq.translate.molecular_weight ! ==> 485.605 ! bioruby> seq.complement.translate ! ==> "FCMH" ! get a random sequence with the same NA count: ! bioruby> counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')} ! ==> {"a"=>6, "c"=>2, "g"=>2, "t"=>2} ! bioruby!> randomseq = Bio::Sequence::NA.randomize(counts) ! ==!> "aaacatgaagtc" ! bioruby!> print counts ! a6c2g2t2 ! bioruby!> p counts ! {"a"=>6, "c"=>2, "g"=>2, "t"=>2} The p, print and puts methods are standard Ruby ways of outputting to *************** *** 140,152 **** has index 0, for example: ! s = 'abc' ! puts s[0].chr ! ! >a ! ! puts s[0..1] ! ! >ab ! So when using String methods, you should subtract 1 from positions --- 154,163 ---- has index 0, for example: ! bioruby> s = 'abc' ! ==> "abc" ! bioruby> s[0].chr ! ==> "a" ! bioruby> s[0..1] ! ==> "ab" So when using String methods, you should subtract 1 from positions *************** *** 160,169 **** through a variable named +s+. ! * Shows average percentage of GC content for 100 bases (stepping ! the default one base at a time) ! seq.window_search(100) do |s| ! puts s.gc_percent ! end Since the class of each subsequence is the same as original sequence --- 171,182 ---- through a variable named +s+. ! * Shows average percentage of GC content for 20 bases (stepping the default one base at a time) ! 
bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa") ! ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa" ! ! bioruby> seq.window_search(20) { |s| print s.gc_percent,',' } ! 30,35,40,40,35,35,35,30,25,30,30,30,35,35,35,35,35,40,45,45,45,45,40,35,40,40,40,40,40,35,35,35,30,30,30, ==> "" ! Since the class of each subsequence is the same as original sequence *************** *** 1165,1168 **** --- 1178,1192 ---- included - with output) + == Unit testing and doctests + + BioRuby comes with an extensive testing framework with over 1300 tests and 2700 + assertions. To run the unit tests: + + cd test + ruby runner.rb + + We have also started with doctest for Ruby. We are porting the examples + in this tutorial to doctest - more info upcoming. + == Further reading From pjotr at dev.open-bio.org Tue Feb 5 07:01:26 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Tue, 05 Feb 2008 12:01:26 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.17,1.18 Message-ID: <200802051201.m15C1JTf032112@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv32092/doc Modified Files: Tutorial.rd Log Message: Minor tweak to Tutorial.rd Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.17 retrieving revision 1.18 diff -C2 -d -r1.17 -r1.18 *** Tutorial.rd 3 Feb 2008 17:17:48 -0000 1.17 --- Tutorial.rd 5 Feb 2008 12:01:16 -0000 1.18 *************** *** 129,135 **** bioruby!> print counts ! a6c2g2t2 bioruby!> p counts ! {"a"=>6, "c"=>2, "g"=>2, "t"=>2} --- 129,135 ---- bioruby!> print counts ! a6c2g2t2 bioruby!> p counts ! {"a"=>6, "c"=>2, "g"=>2, "t"=>2} *************** *** 173,183 **** * Shows average percentage of GC content for 20 bases (stepping the default one base at a time) ! 
bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa") ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa" bioruby> seq.window_search(20) { |s| print s.gc_percent,',' } ! 30,35,40,40,35,35,35,30,25,30,30,30,35,35,35,35,35,40,45,45,45,45,40,35,40,40,40,40,40,35,35,35,30,30,30, ==> "" - Since the class of each subsequence is the same as original sequence (Bio::Sequence::NA or Bio::Sequence::AA or Bio::Sequence), you can --- 173,182 ---- * Shows average percentage of GC content for 20 bases (stepping the default one base at a time) ! bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa") ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa" bioruby> seq.window_search(20) { |s| print s.gc_percent,',' } ! 30,35,40,40,35,35,35,30,25,30,30,30,35,35,35,35,35,40,45,45,45,45,40,35,40,40,40,40,40,35,35,35,30,30,30, ==> "" Since the class of each subsequence is the same as original sequence (Bio::Sequence::NA or Bio::Sequence::AA or Bio::Sequence), you can From pjotr at dev.open-bio.org Tue Feb 5 07:11:20 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Tue, 05 Feb 2008 12:11:20 -0000 Subject: [BioRuby-cvs] bioruby/sample gb2fasta.rb,0.5,0.6 Message-ID: <200802051211.m15CBDam032291@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/sample In directory dev.open-bio.org:/tmp/cvs-serv32271/sample Modified Files: gb2fasta.rb Log Message: Fixed broken require in gb2fasta example Index: gb2fasta.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/sample/gb2fasta.rb,v retrieving revision 0.5 retrieving revision 0.6 diff -C2 -d -r0.5 -r0.6 *** gb2fasta.rb 23 Jul 2002 04:51:24 -0000 0.5 --- gb2fasta.rb 5 Feb 2008 12:11:11 -0000 0.6 *************** *** 19,24 **** # ! require 'bio/io/flatfile' ! require 'bio/db/genbank' include Bio --- 19,23 ---- # ! 
require 'bio' include Bio From pjotr at dev.open-bio.org Wed Feb 6 11:26:05 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Wed, 06 Feb 2008 16:26:05 -0000 Subject: [BioRuby-cvs] bioruby/sample na2aa.rb,NONE,1.1 Message-ID: <200802061625.m16GPuIu005441@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/sample In directory dev.open-bio.org:/tmp/cvs-serv5421 Added Files: na2aa.rb Log Message: Simple example to translate any NA to AA fasta --- NEW FILE: na2aa.rb --- #!/usr/bin/env ruby # # translate.rb - translate any NA input into AA FASTA format # # Copyright (C) 2008 KATAYAMA Toshiaki & Pjotr Prins # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # $Id: na2aa.rb,v 1.1 2008/02/06 16:25:53 pjotr Exp $ # require 'bio' require 'pp' include Bio ARGV.each do | fn | Bio::FlatFile.auto(fn).each do | item | seq = Sequence::NA.new(item.data) aa = seq.translate aa.gsub!(/X/,'-') rec = Bio::FastaFormat.new('> '+item.definition+"\n"+aa) print rec end end From pjotr at dev.open-bio.org Mon Feb 11 02:08:56 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Mon, 11 Feb 2008 07:08:56 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.18,1.19 Message-ID: <200802110708.m1B78mwU007283@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv7263/doc Modified Files: Tutorial.rd Log Message: Expanding on the Tutorial Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.18 retrieving revision 1.19 diff -C2 -d -r1.18 -r1.19 *** Tutorial.rd 5 Feb 2008 12:01:16 -0000 1.18 --- Tutorial.rd 11 Feb 2008 07:08:46 -0000 1.19 *************** *** 1,5 **** # This document is generated with a version of rd2html (part of Hiki) # ! # A possible test run could be from rdtool: # # ruby -I lib ./bin/rd2 ~/cvs/opensource/bioruby/doc/Tutorial.rd --- 1,5 ---- # This document is generated with a version of rd2html (part of Hiki) # ! # A possible test run could be from rdtool (on Debian package rdtool) # # ruby -I lib ./bin/rd2 ~/cvs/opensource/bioruby/doc/Tutorial.rd *************** *** 10,14 **** ss=bioruby.css ~/cvs/opensource/bioruby/doc/Tutorial.rd > ~/bioruby.html # ! # A common problem is tabs in the text file! =begin --- 10,23 ---- ss=bioruby.css ~/cvs/opensource/bioruby/doc/Tutorial.rd > ~/bioruby.html # ! # in Debian: ! # ! # rd2 -r rd/rd2html-lib --with-css="/home/wrk/izip/cvs/opensource/bioruby/lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby.css" Tutorial.rd > index.html ! # ! # A common problem is tabs in the text file! 
TABs are not allowed. ! # ! # To add tests run Toshiaki's bioruby shell and paste in the query plus ! # results. ! # ! # To run the embedded Ruby doctests you can get the doctest.rb from Pjotr. =begin *************** *** 36,41 **** (()). ! For BioRuby you need to install ! Ruby and the BioRuby package on your computer. You can check whether Ruby is installed on your computer and what --- 45,49 ---- (()). ! For BioRuby you need to install Ruby and the BioRuby package on your computer You can check whether Ruby is installed on your computer and what *************** *** 80,83 **** --- 88,95 ---- ==> "ttttgcatgcat" + See the the Bioruby shell section below for more tweaking. If you have trouble running + examples also check the section below on trouble shooting. You can also post a + question to the mailing list. BioRuby developers usually try to help. + == Working with nucleic / amino acid sequences (Bio::Sequence class) *************** *** 171,181 **** through a variable named +s+. ! * Shows average percentage of GC content for 20 bases (stepping the default one base at a time) bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa") ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa" ! bioruby> seq.window_search(20) { |s| print s.gc_percent,',' } ! 30,35,40,40,35,35,35,30,25,30,30,30,35,35,35,35,35,40,45,45,45,45,40,35,40,40,40,40,40,35,35,35,30,30,30, ==> "" Since the class of each subsequence is the same as original sequence --- 183,195 ---- through a variable named +s+. ! * Show average percentage of GC content for 20 bases (stepping the default one base at a time) bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa") ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa" ! bioruby> a=[]; seq.window_search(20) { |s| a.push s.gc_percent } ! bioruby> a ! 
==> [30, 35, 40, 40, 35, 35, 35, 30, 25, 30, 30, 30, 35, 35, 35, 35, 35, 40, 45, 45, 45, 45, 40, 35, 40, 40, 40, 40, 40, 35, 35, 35, 30, 30, 30] ! Since the class of each subsequence is the same as original sequence *************** *** 185,191 **** * Shows translation results for 15 bases shifting a codon at a time ! seq.window_search(15, 3) do |s| ! puts s.translate ! end Finally, the window_search method returns the last leftover --- 199,209 ---- * Shows translation results for 15 bases shifting a codon at a time ! bioruby> a = [] ! bioruby> seq.window_search(15, 3) do |s| ! bioruby> a.push s.translate ! bioruby> end ! bioruby> a ! ==> ["MHAIK", "HAIKL", "AIKLI", "IKLIP", "KLIPI", "LIPIR", "IPIRS", "PIRSS", "IRSSR", "RSSRS", "SSRSS", "SRSSK", "RSSKK", "SSKKK"] ! Finally, the window_search method returns the last leftover *************** *** 193,206 **** * Divide a genome sequence into sections of 10000bp and ! output FASTA formatted sequences. The 1000bp at the start and end of ! each subsequence overlapped. At the 3' end of the sequence the ! leftover subsequence shorter than 10000bp is also added i = 1 remainder = seq.window_search(10000, 9000) do |s| ! puts s.to_fasta("segment #{i}", 60) i += 1 end ! puts remainder.to_fasta("segment #{i}", 60) If you don't want the overlapping window, set window size and stepping --- 211,227 ---- * Divide a genome sequence into sections of 10000bp and ! output FASTA formatted sequences (line width 60 chars). The 1000bp at the ! start and end of each subsequence overlapped. At the 3' end of the sequence ! the leftover is also added: i = 1 + textwidth=60 remainder = seq.window_search(10000, 9000) do |s| ! puts s.to_fasta("segment #{i}", textwidth) i += 1 end ! if remainder ! puts remainder.to_fasta("segment #{i}", textwidth) ! end If you don't want the overlapping window, set window size and stepping *************** *** 211,224 **** * Count the codon usage ! codon_usage = Hash.new(0) ! seq.window_search(3, 3) do |s| ! 
codon_usage[s] += 1 ! end * Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid) ! seq.window_search(10, 10) do |s| ! puts s.molecular_weight ! end In most cases, sequences are read from files or retrieved from databases. --- 232,251 ---- * Count the codon usage ! bioruby> codon_usage = Hash.new(0) ! bioruby> seq.window_search(3, 3) do |s| ! bioruby> codon_usage[s] += 1 ! bioruby> end ! bioruby> codon_usage ! ==> {"cat"=>1, "aaa"=>3, "cca"=>1, "att"=>2, "aga"=>1, "atc"=>1, "cta"=>1, "gca"=>1, "cga"=>1, "tca"=>3, "aag"=>1, "tcc"=>1, "atg"=>1} ! * Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid) ! bioruby> a = [] ! bioruby> seq.window_search(10, 10) do |s| ! bioruby> a.push s.molecular_weight ! bioruby> end ! bioruby> a ! ==> [3096.2062, 3086.1962, 3056.1762, 3023.1262, 3073.2262] In most cases, sequences are read from files or retrieved from databases. *************** *** 246,249 **** --- 273,280 ---- % ruby na2aa.rb my_naseq.txt + or use a pipe! + + % cat my_naseq.txt|ruby na2aa.rb + Outputs *************** *** 254,259 **** % ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt ! In the next section we will retrieve data from databases instead of ! using raw sequence files. == Parsing GenBank data (Bio::GenBank class) --- 285,291 ---- % ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt ! In the next section we will retrieve data from databases instead of using raw ! sequence files. One generic example of the above can be found in ! ./sample/na2aa.rb. == Parsing GenBank data (Bio::GenBank class) *************** *** 460,474 **** Array and BioPerl's Bio::SimpleAlign. A very simple example is: ! require 'bio' ! ! seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ] ! seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) } ! # creates alignment object ! a = Bio::Alignment.new(seqs) ! ! # shows consensus sequence ! p a.consensus # ==> "a?gc?" ! 
# shows IUPAC consensus p a.consensus_iupac # ==> "ahgcr" --- 492,501 ---- Array and BioPerl's Bio::SimpleAlign. A very simple example is: ! bioruby> seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ] ! bioruby> seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) } # creates alignment object ! bioruby> a = Bio::Alignment.new(seqs) ! bioruby> a.consensus ! ==> "xa?gc?" # shows IUPAC consensus p a.consensus_iupac # ==> "ahgcr" *************** *** 1168,1179 **** == The BioRuby example programs ! Some sample programs are stored in samples/ directry. ! Some programs are obsolete. Since samples are not enough, ! practical and interesting samples are welcome. ! ! to be written... ! (EDITOR's NOTE: I would like some examples automatically ! included - with output) == Unit testing and doctests --- 1195,1201 ---- == The BioRuby example programs ! Some sample programs are stored in ./samples/ directory. Run for example: ! ./sample/na2aa.rb test/data/fasta/example1.txt == Unit testing and doctests *************** *** 1195,1198 **** --- 1217,1242 ---- (()). + == BioRuby Shell + + The BioRuby shell implementation you find in ./lib/bio/shell. It is very interesting + as it uses IRB (the Ruby intepreter) which is a powerful environment described in + (()). IRB commands can directly be typed in the shell, e.g. + + bioruby!> IRB.conf[:PROMPT_MODE] + ==!> :PROMPT_C + + optionally you also may want to install the optional Ruby readline support - + with Debian libreadline-ruby. To edit a previous line you may have to press + line down (arrow down) first. + + = Helpful tools + + Apart from rdoc you may also want to use rtags - which allows jumping around + source code by clicking on class and method names. + + cd bioruby/lib + rtags -R --vi + + For a tutorial see (()) = APPENDIX *************** *** 1227,1230 **** --- 1271,1283 ---- carefully that come with each package. 
+ == Trouble shooting + + * Error: in `require': no such file to load -- bio (LoadError) + + Ruby fails to find the BioRuby libraries - add it to the RUBYLIB path, or pass + it to the interpeter. For example: + + ruby -I~/cvs/bioruby/lib yourprogram.rb + == Modifying this page From pjotr at dev.open-bio.org Mon Feb 11 03:03:36 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Mon, 11 Feb 2008 08:03:36 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.19,1.20 Message-ID: <200802110803.m1B83TYu007417@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv7397 Modified Files: Tutorial.rd Log Message: Minor adjustments to Tutorial Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.19 retrieving revision 1.20 diff -C2 -d -r1.19 -r1.20 *** Tutorial.rd 11 Feb 2008 07:08:46 -0000 1.19 --- Tutorial.rd 11 Feb 2008 08:03:27 -0000 1.20 *************** *** 497,519 **** bioruby> a = Bio::Alignment.new(seqs) bioruby> a.consensus ! ==> "xa?gc?" # shows IUPAC consensus ! p a.consensus_iupac # ==> "ahgcr" ! # iterates over each seq a.each { |x| p x } ! # ==> ! # "atgca" ! # "aagca" ! # "acgca" ! # "acgcg" # iterates over each site a.each_site { |x| p x } ! # ==> ! # ["a", "a", "a", "a"] ! # ["t", "a", "c", "c"] ! # ["g", "g", "g", "g"] ! # ["c", "c", "c", "c"] ! # ["a", "a", "a", "g"] # doing alignment by using CLUSTAL W. --- 497,519 ---- bioruby> a = Bio::Alignment.new(seqs) bioruby> a.consensus ! ==> "a?gc?" # shows IUPAC consensus ! a.consensus_iupac ! ==> "ahgcr" # iterates over each seq a.each { |x| p x } ! # ==> ! # "atgca" ! # "aagca" ! # "acgca" ! # "acgcg" # iterates over each site a.each_site { |x| p x } ! # ==> ! # ["a", "a", "a", "a"] ! # ["t", "a", "c", "c"] ! # ["g", "g", "g", "g"] ! # ["c", "c", "c", "c"] ! # ["a", "a", "a", "g"] # doing alignment by using CLUSTAL W. 
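Note on the alignment example in the Tutorial diff above: the per-site iteration and the strict consensus can be reproduced in plain Ruby, without BioRuby installed. This is only a sketch. The helper names `sites` and `strict_consensus` are hypothetical and are not part of the BioRuby API, and it assumes all sequences have equal length and that a site is reported as '?' unless every sequence agrees.

```ruby
# Plain-Ruby sketch (no BioRuby required) of a strict column consensus.
# `sites` and `strict_consensus` are hypothetical helper names, not BioRuby API.

# Returns one array per alignment column (site).
# Array#transpose raises IndexError if the sequences differ in length.
def sites(seqs)
  seqs.map { |s| s.chars }.transpose
end

# Keeps a residue only where every sequence agrees; otherwise emits '?'.
def strict_consensus(seqs)
  sites(seqs).map { |col| col.uniq.size == 1 ? col.first : '?' }.join
end

seqs = %w[atgca aagca acgca acgcg]
sites(seqs).each { |col| p col }   # one column per line, as with each_site
puts strict_consensus(seqs)        # prints a?gc?
```

For the four sequences used in the Tutorial this gives the same "a?gc?" string that the corrected diff shows for a.consensus. It does not attempt the IUPAC variant (consensus_iupac).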
From pjotr at dev.open-bio.org Wed Feb 13 03:04:41 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Wed, 13 Feb 2008 08:04:41 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.20,1.21 Message-ID: <200802130804.m1D84XQC015600@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv15580 Modified Files: Tutorial.rd Log Message: Tutorial Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.20 retrieving revision 1.21 diff -C2 -d -r1.20 -r1.21 *** Tutorial.rd 11 Feb 2008 08:03:27 -0000 1.20 --- Tutorial.rd 13 Feb 2008 08:04:30 -0000 1.21 *************** *** 183,187 **** through a variable named +s+. ! * Show average percentage of GC content for 20 bases (stepping the default one base at a time) bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa") --- 183,187 ---- through a variable named +s+. ! Show average percentage of GC content for 20 bases (stepping the default one base at a time) bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa") *************** *** 197,201 **** use all methods on the subsequence. For example, ! * Shows translation results for 15 bases shifting a codon at a time bioruby> a = [] --- 197,201 ---- use all methods on the subsequence. For example, ! Shows translation results for 15 bases shifting a codon at a time bioruby> a = [] *************** *** 210,217 **** subsequence. This allows for example ! * Divide a genome sequence into sections of 10000bp and ! output FASTA formatted sequences (line width 60 chars). The 1000bp at the ! start and end of each subsequence overlapped. At the 3' end of the sequence ! the leftover is also added: i = 1 --- 210,217 ---- subsequence. This allows for example ! Divide a genome sequence into sections of 10000bp and ! 
output FASTA formatted sequences (line width 60 chars). The 1000bp at the ! start and end of each subsequence overlapped. At the 3' end of the sequence ! the leftover is also added: i = 1 *************** *** 230,234 **** Other examples ! * Count the codon usage bioruby> codon_usage = Hash.new(0) --- 230,234 ---- Other examples ! Count the codon usage bioruby> codon_usage = Hash.new(0) *************** *** 240,244 **** ! * Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid) bioruby> a = [] --- 240,244 ---- ! Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid) bioruby> a = [] *************** *** 399,408 **** end ! * Note: In this example Feature#assoc method makes a Hash from a ! feature object. It is useful because you can get data from the hash ! by using qualifiers as keys. ! (But there is a risk some information is lost when two or more ! qualifiers are the same. Therefore an Array is returned by ! Feature#feature) Bio::Sequence#splicing splices subsequence from nucleic acid sequence --- 399,408 ---- end ! Note: In this example Feature#assoc method makes a Hash from a ! feature object. It is useful because you can get data from the hash ! by using qualifiers as keys. ! (But there is a risk some information is lost when two or more ! qualifiers are the same. Therefore an Array is returned by ! Feature#feature) Bio::Sequence#splicing splices subsequence from nucleic acid sequence *************** *** 418,426 **** bio/location.rb. ! * Splice according to location string used in a GenBank entry naseq.splicing('join(2035..2050,complement(1775..1818),13..345') ! * Generate Bio::Locations object and pass the splicing method locs = Bio::Locations.new('join((8298.8300)..10206,1..855)') --- 418,426 ---- bio/location.rb. ! Splice according to location string used in a GenBank entry naseq.splicing('join(2035..2050,complement(1775..1818),13..345') ! 
Generate Bio::Locations object and pass the splicing method locs = Bio::Locations.new('join((8298.8300)..10206,1..855)') *************** *** 430,434 **** (Bio::Sequence::AA objects). ! * Splicing peptide from a protein (e.g. signal peptide) aaseq.splicing('21..119') --- 430,434 ---- (Bio::Sequence::AA objects). ! Splicing peptide from a protein (e.g. signal peptide) aaseq.splicing('21..119')
From ngoto at dev.open-bio.org Tue Feb 12 05:32:25 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Tue, 12 Feb 2008 05:32:25 +0000 Subject: [BioRuby-cvs] bioruby ChangeLog,1.82,1.83 Message-ID: <200802120532.m1C5WP5M011183@dev.open-bio.org> Update of /home/repository/bioruby/bioruby In directory dev.open-bio.org:/tmp/cvs-serv11163 Modified Files: ChangeLog Log Message: ChangeLog for lib/bio/appl/blast/format0.rb from 1.25 to 1.26. Index: ChangeLog =================================================================== RCS file: /home/repository/bioruby/bioruby/ChangeLog,v retrieving revision 1.82 retrieving revision 1.83 diff -C2 -d -r1.82 -r1.83 *** ChangeLog 2 Feb 2008 03:54:48 -0000 1.82 --- ChangeLog 12 Feb 2008 05:32:23 -0000 1.83 *************** *** 1,2 **** --- 1,11 ---- + 2008-02-12 Naohisa Goto + + * lib/bio/appl/blast/format0.rb + + Fixed bugs: Failed to parse query length for long query + (>= 10000 letters) as comma is inserted for digit separator + by blastall; Failed to parse e-value for some BLASTX results. + Thanks to Shuji Shigenobu who reported the bugs and sent patches.
+ 2008-02-02 Toshiaki Katayama From ngoto at dev.open-bio.org Wed Feb 13 10:28:20 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Wed, 13 Feb 2008 10:28:20 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58,0.58.2.1 Message-ID: <200802131028.m1DASKHe017196@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv17175/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 sequence.rb Log Message: Added a new class method Bio::Sequence.read(str). Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.58 retrieving revision 0.58.2.1 diff -C2 -d -r0.58 -r0.58.2.1 *** sequence.rb 5 Apr 2007 23:35:39 -0000 0.58 --- sequence.rb 13 Feb 2008 10:28:16 -0000 0.58.2.1 *************** *** 334,337 **** --- 334,356 ---- @moltype = AA end + + # Create a new Bio::Sequence object from a formatted string + # (GenBank, EMBL, fasta format, etc.) 
+ # + # s = Bio::Sequence.read(str) + # --- + # *Arguments*: + # * (required) _str_: string + # * (optional) _format_: format specification (class or nil) + # *Returns*:: Bio::Sequence object + def self.read(str, format = nil) + if format then + klass = format + else + klass = Bio::FlatFile::AutoDetect.default.guess(str) + end + obj = klass.new(str) + obj.to_biosequence + end end # Sequence From ngoto at dev.open-bio.org Thu Feb 14 03:13:48 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Thu, 14 Feb 2008 03:13:48 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4,1.4.2.1 Message-ID: <200802140313.m1E3Dm2s019722@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory dev.open-bio.org:/tmp/cvs-serv19681/lib/bio/sequence Modified Files: Tag: BRANCH-biohackathon2008 format.rb Log Message: * lib/bio/sequence.rb * changed to include Format module * lib/bio/sequence/format.rb * fixed bug: incorrect refactoring Index: format.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v retrieving revision 1.4 retrieving revision 1.4.2.1 diff -C2 -d -r1.4 -r1.4.2.1 *** format.rb 5 Apr 2007 23:35:41 -0000 1.4 --- format.rb 14 Feb 2008 03:13:46 -0000 1.4.2.1 *************** *** 18,23 **** module Bio - autoload :Sequence, 'bio/sequence' - class Sequence --- 18,21 ---- *************** *** 127,131 **** def format_qualifiers(qualifiers, indent, width) ! qualifiers.each do |qualifier| q = qualifier.qualifier v = qualifier.value.to_s --- 125,129 ---- def format_qualifiers(qualifiers, indent, width) ! qualifiers.collect do |qualifier| q = qualifier.qualifier v = qualifier.value.to_s *************** *** 134,138 **** lines = wrap('/' + q, width) elsif q == 'translation' ! lines = fold('/' + q + '=' + val, width) else if v[/\D/] --- 132,136 ---- lines = wrap('/' + q, width) elsif q == 'translation' ! 
lines = fold('/' + q + '=' + v, width) else if v[/\D/] *************** *** 141,149 **** v = '"' + v + '"' end ! lines = wrap('/' + q + '=' + val, width) end ! return lines.gsub(/^/, indent) ! end end --- 139,148 ---- v = '"' + v + '"' end ! lines = wrap('/' + q + '=' + v, width) end ! lines.gsub!(/^/, indent) ! lines ! end.join end From ngoto at dev.open-bio.org Thu Feb 14 03:13:48 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Thu, 14 Feb 2008 03:13:48 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.1,0.58.2.2 Message-ID: <200802140313.m1E3DmsN019717@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv19681/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 sequence.rb Log Message: * lib/bio/sequence.rb * changed to include Format module * lib/bio/sequence/format.rb * fixed bug: incorrect refactoring Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.58.2.1 retrieving revision 0.58.2.2 diff -C2 -d -r0.58.2.1 -r0.58.2.2 *** sequence.rb 13 Feb 2008 10:28:16 -0000 0.58.2.1 --- sequence.rb 14 Feb 2008 03:13:46 -0000 0.58.2.2 *************** *** 71,74 **** --- 71,75 ---- autoload :Generic, 'bio/sequence/generic' autoload :Format, 'bio/sequence/format' + include Format # Create a new Bio::Sequence object *************** *** 149,153 **** # *Returns*:: String object def output(style) - extend Bio::Sequence::Format case style when :fasta --- 150,153 ---- From ngoto at dev.open-bio.org Thu Feb 14 03:32:16 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Thu, 14 Feb 2008 03:32:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4.2.1,1.4.2.2 Message-ID: <200802140332.m1E3WGAu019905@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory dev.open-bio.org:/tmp/cvs-serv19885/lib/bio/sequence Modified Files: 
Tag: BRANCH-biohackathon2008 format.rb Log Message: in wrap(), the last "\n" should be added for non-empty string Index: format.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v retrieving revision 1.4.2.1 retrieving revision 1.4.2.2 diff -C2 -d -r1.4.2.1 -r1.4.2.2 *** format.rb 14 Feb 2008 03:13:46 -0000 1.4.2.1 --- format.rb 14 Feb 2008 03:32:14 -0000 1.4.2.2 *************** *** 170,174 **** end result << left if left ! return result.join("\n") end --- 170,176 ---- end result << left if left ! result_string = result.join("\n") ! result_string << "\n" unless result_string.empty? ! return result_string end From ngoto at dev.open-bio.org Thu Feb 14 08:51:47 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Thu, 14 Feb 2008 08:51:47 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/genbank genbank.rb,0.40,0.40.2.1 Message-ID: <200802140851.m1E8plsw023607@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/genbank In directory dev.open-bio.org:/tmp/cvs-serv23587/lib/bio/db/genbank Modified Files: Tag: BRANCH-biohackathon2008 genbank.rb Log Message: added new method Bio::GenBank#to_biosequence. 
Index: genbank.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/genbank/genbank.rb,v retrieving revision 0.40 retrieving revision 0.40.2.1 diff -C2 -d -r0.40 -r0.40.2.1 *** genbank.rb 5 Apr 2007 23:35:40 -0000 0.40 --- genbank.rb 14 Feb 2008 08:51:45 -0000 0.40.2.1 *************** *** 126,129 **** --- 126,157 ---- end + # converts Bio::GenBank to Bio::Sequence + # --- + # *Arguments*: + # *Returns*:: Bio::Sequence object + def to_biosequence + sequence = Bio::Sequence.new(seq) + sequence.entry_id = self.entry_id + + sequence.primary_accession = self.accession + sequence.secondary_accessions = self.accessions - [ self.accession ] + + sequence.molecule_type = self.natype + sequence.division = self.division + sequence.topology = self.circular + + sequence.sequence_version = self.version + seq.date_created = nil #???? + sequence.date_modified = self.date + + sequence.keywords = self.keywords + sequence.species = self.organism + sequence.classification = self.taxonomy + sequence.organnella = nil # not used + sequence.comments = self.comment + sequence.references = self.references + return sequence + end + end # GenBank end # Bio From ngoto at dev.open-bio.org Fri Feb 15 02:18:24 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Fri, 15 Feb 2008 02:18:24 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4.2.2,1.4.2.3 Message-ID: <200802150218.m1F2IOnH025723@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory dev.open-bio.org:/tmp/cvs-serv25703/lib/bio/sequence Modified Files: Tag: BRANCH-biohackathon2008 format.rb Log Message: special character in regexp should be escaped Index: format.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v retrieving revision 1.4.2.2 retrieving revision 1.4.2.3 diff -C2 -d -r1.4.2.2 -r1.4.2.3 *** format.rb 14 Feb 
2008 03:32:14 -0000 1.4.2.2 --- format.rb 15 Feb 2008 02:18:21 -0000 1.4.2.3 *************** *** 157,161 **** line = nil width.downto(1) do |i| ! if left[i..i] == ' ' or /[,;]/ =~ left[(i-1)..(i-1)] then line = left[0..(i-1)].sub(/ +\z/, '') left = left[i..-1].sub(/\A +/, '') --- 157,161 ---- line = nil width.downto(1) do |i| ! if left[i..i] == ' ' or /[\,\;]/ =~ left[(i-1)..(i-1)] then line = left[0..(i-1)].sub(/ +\z/, '') left = left[i..-1].sub(/\A +/, '') From ngoto at dev.open-bio.org Fri Feb 15 03:23:25 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Fri, 15 Feb 2008 03:23:25 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.2,0.58.2.3 Message-ID: <200802150323.m1F3NP8b025922@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv25902/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 sequence.rb Log Message: In Bio::Sequence#method_missing, __send__ should be used instead of send. When method is not found, error message is modified if it is caused by method_missing. Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.58.2.2 retrieving revision 0.58.2.3 diff -C2 -d -r0.58.2.2 -r0.58.2.3 *** sequence.rb 14 Feb 2008 03:13:46 -0000 0.58.2.2 --- sequence.rb 15 Feb 2008 03:23:23 -0000 0.58.2.3 *************** *** 97,101 **** # http://www.rubycentral.com/book/ref_c_object.html#Object.method_missing def method_missing(sym, *args, &block) #:nodoc: ! @seq.send(sym, *args, &block) end --- 97,119 ---- # http://www.rubycentral.com/book/ref_c_object.html#Object.method_missing def method_missing(sym, *args, &block) #:nodoc: ! begin ! @seq.__send__(sym, *args, &block) ! rescue NoMethodError => evar ! lineno = __LINE__ - 2 ! file = __FILE__ ! bt_here = [ "#{file}:#{lineno}:in \`__send__\'", ! "#{file}:#{lineno}:in \`method_missing\'" ! ] ! 
if bt_here == evar.backtrace[0, 2] then ! bt = evar.backtrace[2..-1] ! evar = NoMethodError.new("undefined method \`#{sym.to_s}\' for #{self.inspect}") ! evar.set_backtrace(bt) ! end ! #p lineno ! #p file ! #p bt_here ! #p evar.backtrace ! raise(evar) ! end end From ngoto at dev.open-bio.org Fri Feb 15 03:33:53 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Fri, 15 Feb 2008 03:33:53 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.3,0.58.2.4 Message-ID: <200802150333.m1F3Xr5w025971@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv25951/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 sequence.rb Log Message: changed to use original exception class instead of NoMethodError Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.58.2.3 retrieving revision 0.58.2.4 diff -C2 -d -r0.58.2.3 -r0.58.2.4 *** sequence.rb 15 Feb 2008 03:23:23 -0000 0.58.2.3 --- sequence.rb 15 Feb 2008 03:33:51 -0000 0.58.2.4 *************** *** 107,111 **** if bt_here == evar.backtrace[0, 2] then bt = evar.backtrace[2..-1] ! evar = NoMethodError.new("undefined method \`#{sym.to_s}\' for #{self.inspect}") evar.set_backtrace(bt) end --- 107,111 ---- if bt_here == evar.backtrace[0, 2] then bt = evar.backtrace[2..-1] ! 
evar = evar.class.new("undefined method \`#{sym.to_s}\' for #{self.inspect}") evar.set_backtrace(bt) end From aerts at dev.open-bio.org Fri Feb 15 04:49:39 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Fri, 15 Feb 2008 04:49:39 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/embl embl.rb,1.29,1.29.2.1 Message-ID: <200802150449.m1F4ndLY026633@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/embl In directory dev.open-bio.org:/tmp/cvs-serv26608/db/embl Modified Files: Tag: BRANCH-biohackathon2008 embl.rb Log Message: Added functionality to convert a Bio::EMBL object into a full-blown Bio::Sequence object that contains features, references and other additional information. Index: embl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/embl.rb,v retrieving revision 1.29 retrieving revision 1.29.2.1 diff -C2 -d -r1.29 -r1.29.2.1 *** embl.rb 5 Apr 2007 23:35:40 -0000 1.29 --- embl.rb 15 Feb 2008 04:49:37 -0000 1.29.2.1 *************** *** 3,7 **** # # ! # Copyright:: Copyright (C) 2001-2007 Mitsuteru C. Nakao # License:: The Ruby License # --- 3,9 ---- # # ! # Copyright:: Copyright (C) 2001-2007 ! # Mitsuteru C. Nakao ! # Jan Aerts # License:: The Ruby License # *************** *** 121,124 **** --- 123,130 ---- alias molecule_type molecule + def topology + id_line('TOPOLOGY') + end + # returns DIVISION in the ID line. # * Bio::EMBL#division -> String *************** *** 222,227 **** # # Bio::EMBLDB#ref ! ! ## # DR Line; defabases cross-regerence (>=0) --- 228,233 ---- # # Bio::EMBLDB#ref ! ! ## # DR Line; defabases cross-regerence (>=0) *************** *** 356,366 **** # bb Line; (blanks) sequence data (>=1) def seq ! Sequence::NA.new( fetch('').gsub(/ /,'').gsub(/\d+/,'') ) end alias naseq seq alias ntseq seq ! # // Line; termination line (end; 1/entry) ### private methods --- 362,392 ---- # bb Line; (blanks) sequence data (>=1) def seq ! 
Bio::Sequence::NA.new( fetch('').gsub(/ /,'').gsub(/\d+/,'') ) end alias naseq seq alias ntseq seq ! # // Line; termination line (end; 1/entry) + def to_biosequence + bio_seq = Bio::Sequence.new(self.seq) + bio_seq.entry_id = self.entry_id + bio_seq.primary_accession = self.accessions[0] + bio_seq.secondary_accessions = self.accessions[1,-1] + bio_seq.molecule_type = self.molecule_type + bio_seq.definition = self.description + bio_seq.topology = self.topology + bio_seq.date_created = self.dt['created'] + bio_seq.date_modified = self.dt['updated'] + bio_seq.division = self.division + bio_seq.sequence_version = self.version + bio_seq.keywords = self.keywords + bio_seq.species = self.os(0)[0]['os'] + ' ' + self.os(0)[0]['name'] + bio_seq.classification = self.oc + bio_seq.references = self.references + bio_seq.features = self.ft + + return bio_seq + end ### private methods *************** *** 401,402 **** --- 427,443 ---- end # module Bio + + if __FILE__ == $0 + require '../../../bio' + require 'yaml' + + prefix = 'FT ' + indent = prefix + ' ' * 16 + fwidth = 80 - indent.length + + parser = Bio::FlatFile.auto('/home/aertsj/LocalDocuments/hackathon/aj224122.embl') + parser.each do |entry| + # entry.ref + puts entry.to_biosequence.output(:embl) + end + end \ No newline at end of file From aerts at dev.open-bio.org Fri Feb 15 04:49:39 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Fri, 15 Feb 2008 04:49:39 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.4,0.58.2.5 Message-ID: <200802150449.m1F4ndul026630@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv26608 Modified Files: Tag: BRANCH-biohackathon2008 sequence.rb Log Message: Added functionality to convert a Bio::EMBL object into a full-blown Bio::Sequence object that contains features, references and other additional information. 
Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.58.2.4 retrieving revision 0.58.2.5 diff -C2 -d -r0.58.2.4 -r0.58.2.5 *** sequence.rb 15 Feb 2008 03:33:51 -0000 0.58.2.4 --- sequence.rb 15 Feb 2008 04:49:37 -0000 0.58.2.5 *************** *** 13,16 **** --- 13,17 ---- # + require 'erb' require 'bio/sequence/compat' *************** *** 73,76 **** --- 74,79 ---- include Format + attr_accessor :sequence_version, :topology, :molecule_type, :data_class, :division, :primary_accession, :secondary_accessions, :date_created, :date_modified, :species, :classification + # Create a new Bio::Sequence object # *************** *** 165,181 **** # --- # *Arguments*: ! # * (required) _style_: :fasta, :genbank, *or* :embl # *Returns*:: String object ! def output(style) ! case style ! when :fasta ! format_fasta ! when :gff ! format_gff ! when :genbank ! format_genbank ! when :embl ! format_embl ! end end --- 168,176 ---- # --- # *Arguments*: ! # * (required) _format_: :fasta, :genbank, *or* :embl # *Returns*:: String object ! def output(format = :fasta) ! record_template = ERB.new(File.read(File.dirname(__FILE__) + "/db/#{format.to_s}/format.erb")) ! record_template.result(binding) end *************** *** 372,375 **** --- 367,375 ---- end + + def accessions + return [@primary_accession, @secondary_accessions].flatten + end + end # Sequence *************** *** 380,510 **** if __FILE__ == $0 ! puts "== Test Bio::Sequence::NA.new" ! p Bio::Sequence::NA.new('') ! p na = Bio::Sequence::NA.new('atgcatgcATGCATGCAAAA') ! p rna = Bio::Sequence::NA.new('augcaugcaugcaugcaaaa') ! ! puts "\n== Test Bio::Sequence::AA.new" ! p Bio::Sequence::AA.new('') ! p aa = Bio::Sequence::AA.new('ACDEFGHIKLMNPQRSTVWYU') ! ! puts "\n== Test Bio::Sequence#to_s" ! p na.to_s ! p aa.to_s ! ! puts "\n== Test Bio::Sequence#subseq(2,6)" ! p na ! p na.subseq(2,6) ! ! 
puts "\n== Test Bio::Sequence#[2,6]" ! p na ! p na[2,6] ! ! puts "\n== Test Bio::Sequence#to_fasta('hoge', 8)" ! puts na.to_fasta('hoge', 8) ! ! puts "\n== Test Bio::Sequence#window_search(15)" ! p na ! na.window_search(15) {|x| p x} ! ! puts "\n== Test Bio::Sequence#total({'a'=>0.1,'t'=>0.2,'g'=>0.3,'c'=>0.4})" ! p na.total({'a'=>0.1,'t'=>0.2,'g'=>0.3,'c'=>0.4}) ! ! puts "\n== Test Bio::Sequence#composition" ! p na ! p na.composition ! p rna ! p rna.composition ! ! puts "\n== Test Bio::Sequence::NA#splicing('complement(join(1..5,16..20))')" ! p na ! p na.splicing("complement(join(1..5,16..20))") ! p rna ! p rna.splicing("complement(join(1..5,16..20))") ! ! puts "\n== Test Bio::Sequence::NA#complement" ! p na.complement ! p rna.complement ! p Bio::Sequence::NA.new('tacgyrkmhdbvswn').complement ! p Bio::Sequence::NA.new('uacgyrkmhdbvswn').complement ! ! puts "\n== Test Bio::Sequence::NA#translate" ! p na ! p na.translate ! p rna ! p rna.translate ! ! puts "\n== Test Bio::Sequence::NA#gc_percent" ! p na.gc_percent ! p rna.gc_percent ! ! puts "\n== Test Bio::Sequence::NA#illegal_bases" ! p na.illegal_bases ! p Bio::Sequence::NA.new('tacgyrkmhdbvswn').illegal_bases ! p Bio::Sequence::NA.new('abcdefghijklmnopqrstuvwxyz-!%#$@').illegal_bases ! ! puts "\n== Test Bio::Sequence::NA#molecular_weight" ! p na ! p na.molecular_weight ! p rna ! p rna.molecular_weight ! ! puts "\n== Test Bio::Sequence::NA#to_re" ! p Bio::Sequence::NA.new('atgcrymkdhvbswn') ! p Bio::Sequence::NA.new('atgcrymkdhvbswn').to_re ! p Bio::Sequence::NA.new('augcrymkdhvbswn') ! p Bio::Sequence::NA.new('augcrymkdhvbswn').to_re ! ! puts "\n== Test Bio::Sequence::NA#names" ! p na.names ! ! puts "\n== Test Bio::Sequence::NA#pikachu" ! p na.pikachu ! ! puts "\n== Test Bio::Sequence::NA#randomize" ! print "Orig : "; p na ! print "Rand : "; p na.randomize ! print "Rand : "; p na.randomize ! print "Rand : "; p na.randomize.randomize ! print "Block : "; na.randomize do |x| print x end; puts ! ! 
print "Orig : "; p rna ! print "Rand : "; p rna.randomize ! print "Rand : "; p rna.randomize ! print "Rand : "; p rna.randomize.randomize ! print "Block : "; rna.randomize do |x| print x end; puts ! ! puts "\n== Test Bio::Sequence::NA.randomize(counts)" ! print "Count : "; p counts = {'a'=>10,'c'=>20,'g'=>30,'t'=>40} ! print "Rand : "; p Bio::Sequence::NA.randomize(counts) ! print "Count : "; p counts = {'a'=>10,'c'=>20,'g'=>30,'u'=>40} ! print "Rand : "; p Bio::Sequence::NA.randomize(counts) ! print "Block : "; Bio::Sequence::NA.randomize(counts) {|x| print x}; puts ! ! puts "\n== Test Bio::Sequence::AA#codes" ! p aa ! p aa.codes ! ! puts "\n== Test Bio::Sequence::AA#names" ! p aa ! p aa.names ! ! puts "\n== Test Bio::Sequence::AA#molecular_weight" ! p aa.subseq(1,20) ! p aa.subseq(1,20).molecular_weight ! ! puts "\n== Test Bio::Sequence::AA#randomize" ! aaseq = 'MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDA' ! s = Bio::Sequence::AA.new(aaseq) ! print "Orig : "; p s ! print "Rand : "; p s.randomize ! print "Rand : "; p s.randomize ! print "Rand : "; p s.randomize.randomize ! print "Block : "; s.randomize {|x| print x}; puts ! puts "\n== Test Bio::Sequence::AA.randomize(counts)" ! print "Count : "; p counts = s.composition ! print "Rand : "; puts Bio::Sequence::AA.randomize(counts) ! print "Block : "; Bio::Sequence::AA.randomize(counts) {|x| print x}; puts end --- 380,404 ---- if __FILE__ == $0 ! require 'bio' ! 
seq = Bio::Sequence.new('aattaaaacgccacgcaaggcgattctaggaaatcaaaacgacacgaaatgtggggtgggtgtttgggtaggaaagacagttgtcaacatcagggatttggattgaatcaaaaaaaaagtccttagatttcataaaagctaatcacgcctcaaaactggggcctatctcttcttttttgtcgcttcctgtcggtccttctctatttcttctccaacccctcatttttgaatatttacataacaaaccgttttactttctttggtcaaaattagacccaaaattctatattagtttaagatatgtggtctgtaatttattgttgtattgatataaaaattagttataagcgattatatttttatgctcaagtaactggtgttagttaactatattccaccacgataacctgattacataaaatatgattttaatcattttagtaaaccatatcgcacgttggatgattaattttaacggtttaataacacgtgattaaattatttttagaatgattatttacaaacggaaaagctatatgtgacacaataactcgtgcagtattgttagtttgaaaagtgtatttggtttcttatatttggcctcgattttcagtttatgtgctttttacaaagttttattttcgttatctgtttaacgcgacatttgttgtatggctttaccgatttgagaataaaatcatattacctttatgtagccatgtgtggtgtaatatataataatggtccttctacgaaaaaagcagatcacaattgaaataaagggtgaaatttggtgtcccttttcttcgtcgaaataacagaactaaataaaagaaagtgttatagtatattacgtccgaagaataatccatattcctgaaatacagtcaacatattatatatttagtactttatataaagttaggaattaaatcatatgttttatcgaccatattaagt! cacaactttatcataaattaatctgtaattagaattccaagttcgccaccgaatttcgtaacctaatctacatataatagataaaatatatatatgtagagtaattatgatatctatgtatgtagtcatggtatatgaattttgaaattggcaaggtaacattgacggatcgtaacccaacaaataatattaattacaaaatgggtgggcgggaatagtatacaactcataattccactcactttttgtattattaggatatgaaataagagtaatcaacatgcataataaagatgtataatttcttcatcttaaaaaacataactacatggtttaatacacaattttaccttttatcaaaaaagtatttcacaattcactcgcaaattacgaaatgatggctagtgcttcaactccaaatttcgaatattttaaatcacgatgtgtagaaccttttatttactggatactaatcactagtttattgagccaaccaattagttaaatagaacaatcaatattatagccagatattttttcctttaaaaatatttaaaagaggggccagaaaagaaccagagagggaggccatgagacattattatcactagtcaaaaacaacaaaccctccttttgctttttcatataaattattatattttattttgcaggtttcttctcttcttcttcttcttcttcttcttcttcctcttggctgctttctttcatcatccataaagtgaaagctaacgcatagagagagccatatcgtcccaaaaaaagcaaaagtccaaaaaaaaacaactccaaaacattctctcttagctctttactctttagtttctctctctctctctgcctttctctttgttgaagttcatggatgctacgaagtggactcaggtacgtaaaaagatatctctctgctatatctgtttgtttgtagcttctccccgactctcacgctctctctctctctctctctctc! 
tttgtgtatctctctactcacataaatatatacatgtgtgtgtatgcatgtttatatgtatgtatgaaac cagtagtggttatacagatagtctatatagagatatcaatatgatgtgttttaatttagactttttatatatccgtttgaaacttccgaagttctcgaatggagttaaggaagttttgttctctacaagttcaatttttcttgtcattaattataaaactctgataactaatggataaaaaaggtatgctttgttagttaccttttgttcttggtgctcaggtcttaccatttttttcctaaattttaattagtctcctttctttaattaattttatgttaacgcactgacgatttaacgttaacaaaaaaacctagattctttttcttttcaatagagcataattattacttcaatttcatttatctcacactaaaccctaatcttggcgaaattccttttatatatataaatttaattaatttttccacaatcttggcggaattcaggactcggttttgcttgttattgttctctcttttaatttgacatggttagggaatacttaaagtatgtcttaattttatagggttttcaagaaatgataaacgtaaagccaatggagcaaatgatttctagcaccaacaacaacacaccgcaacaacaaccaacattcatcgccaccaacacaaggccaaacgccaccgcatccaatggtggctccggaggaaataccaacaacacggctacgatggaaactagaaaggcgaggccacaagagaaagtaaattgtccaagatgcaactcaacaaacacaaagttctgttattacaacaactacagtctcacgcaaccaagatacttctgcaaaggttgtcgaaggtattggaccgaaggtggctctcttcgtaacgtcccagtcggaggtagctcaagaaagaacaagagatcctctacacctttagcttcaccttctaatcccaaacttccagatctaaacccaccgattcttttctcaagccaaatccctaataagtcaaataaagatc! tcaacttgctatctttcccggtcatgcaagatcatcatcatcatggtatgtctcatttttttcatatgcccaagatagagaacaacaatacttcatcctcaatctatgcttcatcatctcctgtctcagctcttgagcttctaagatccaatggagtctcttcaagaggcatgaacacgttcttgcctggtcaaatgatggattcaaactcagtcctgtactcatctttagggtttccaacaatgcctgattacaaacagagtaataacaacctttcattctccattgatcatcatcaagggattggacataacaccatcaacagtaaccaaagagctcaagataacaatgatgacatgaatggagcaagtagggttttgttccctttttcagacatgaaagagctttcaagcacaacccaagagaagagtcatggtaataatacatattggaatgggatgttcagtaatacaggaggatcttcatggtgaaaaaaggttaaaaagagctcatgaactatcagctttcttctctttttctgtttttttctcctattttattatagtttttactttgatgatcttttgttttttctcacatggggaactttacttaaagttgtcagaacttagtttacagattgtctttttattccttctttctggttttccttttttcctttttttatcagtctttttaaaatatgtatttcataattgggtttgatcattcatatttattagtatcaaaatagagtctatgttcatgagggagtgttaaggggtgtgagggtagaagaataagtgaatacgggggcccg') ! seq.entry_id = 'AJ224122' ! seq.sequence_version = 3 ! seq.topology = 'linear' ! 
seq.molecule_type = 'genomic DNA' ! seq.data_class = 'STD' ! seq.division = 'PLN' ! seq.primary_accession = 'AJ224122' ! seq.secondary_accessions = [] ! seq.date_created = '27-FEB-1998 (Rel. 54, Created)' ! seq.date_modified = '14-NOV-2006 (Rel. 89, Last updated, Version 6)' ! seq.definition = 'Arabidopsis thaliana DAG1 gene' ! seq.keywords = ['BBFa gene', 'transcription factor'] ! seq.species = 'Arabidopsis thaliana (thale cress)' ! seq.classification = ['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', ! 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons', 'core eudicotyledons', 'rosids', ! 'eurosids II', 'Brassicales', 'Brassicaceae', 'Arabidopsis'] ! # puts seq.output(:embl) ! puts seq.output(:fasta) end From ngoto at dev.open-bio.org Fri Feb 15 05:29:52 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Fri, 15 Feb 2008 05:29:52 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.5,0.58.2.6 Message-ID: <200802150529.m1F5Tqn1026874@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv26854/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 sequence.rb Log Message: bugfix in Bio::Sequence.read: mistaken method name Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.58.2.5 retrieving revision 0.58.2.6 diff -C2 -d -r0.58.2.5 -r0.58.2.6 *** sequence.rb 15 Feb 2008 04:49:37 -0000 0.58.2.5 --- sequence.rb 15 Feb 2008 05:29:50 -0000 0.58.2.6 *************** *** 361,365 **** klass = format else ! klass = Bio::FlatFile::AutoDetect.default.guess(str) end obj = klass.new(str) --- 361,365 ---- klass = format else ! 
klass = Bio::FlatFile::AutoDetect.default.autodetect(str) end obj = klass.new(str) From aerts at dev.open-bio.org Mon Feb 18 15:43:29 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Mon, 18 Feb 2008 15:43:29 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/fasta - New directory Message-ID: <200802181543.m1IFhTLc011233@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/fasta In directory dev.open-bio.org:/tmp/cvs-serv11213/fasta Log Message: Directory /home/repository/bioruby/bioruby/lib/bio/db/fasta added to the repository --> Using per-directory sticky tag `BRANCH-biohackathon2008' From aerts at dev.open-bio.org Mon Feb 18 15:44:41 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Mon, 18 Feb 2008 15:44:41 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.24,1.24.2.1 Message-ID: <200802181544.m1IFifJc011281@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv11261 Modified Files: Tag: BRANCH-biohackathon2008 reference.rb Log Message: Added export method to EMBL format. Index: reference.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/reference.rb,v retrieving revision 1.24 retrieving revision 1.24.2.1 diff -C2 -d -r1.24 -r1.24.2.1 *** reference.rb 5 Apr 2007 23:35:39 -0000 1.24 --- reference.rb 18 Feb 2008 15:44:39 -0000 1.24.2.1 *************** *** 2,8 **** # = bio/reference.rb - Journal reference classes # ! # Copyright:: Copyright (C) 2001, 2006 # Toshiaki Katayama , ! # Ryan Raaum # License:: The Ruby License # --- 2,9 ---- # = bio/reference.rb - Journal reference classes # ! # Copyright:: Copyright (C) 2001, 2006, 2008 # Toshiaki Katayama , ! # Ryan Raaum , ! # Jan Aerts # License:: The Ruby License # *************** *** 79,82 **** --- 80,89 ---- # Affiliations in an Array. 
attr_reader :affiliations + + # Sequence number in EMBL/GenBank records + attr_reader :embl_gb_record_number + + # Position in a sequence that this reference refers to + attr_reader :sequence_position # Create a new Bio::Reference object from a Hash of values. *************** *** 130,133 **** --- 137,144 ---- @url = hash['url'] @mesh = hash['mesh'] + @embl_gb_record_number = hash['embl_gb_record_number'] || nil + @sequence_position = hash['sequence_position'] || [] + @comments = hash['comments'] || [] + @xrefs = hash['xrefs'] || [] @affiliations = hash['affiliations'] @authors = [] if @authors.empty? *************** *** 171,174 **** --- 182,187 ---- def format(style = nil, option = nil) case style + when 'embl' + return embl when 'endnote' return endnote *************** *** 246,249 **** --- 259,298 ---- end + # Returns reference formatted in the EMBL style. + # + # # ref is a Bio::Reference object + # puts ref.embl + # + # RP 1-1859 + # RX PUBMED; 1907511. + # RA Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.; + # RT "Nucleotide and derived amino acid sequence of the cyanogenic + # RT beta-glucosidase (linamarase) from white clover (Trifolium repens L.)"; + # RL Plant Mol. Biol. 17(2):209-219(1991). + def embl + lines = Array.new + if ! @embl_gb_record_number.nil? + lines << "RN [#{@embl_gb_record_number}]" + end + if @comments != [] + @comments.each do |c| + lines << "RC #{c}" + end + end + if @sequence_position != '' + lines << "RP #{@sequence_position}" + end + if ! @xrefs.nil? + @xrefs.each do |x| + lines << "RX #{x}" + end + end + lines << @authors.join(', ').wrap(80, 'RA ') + ';' unless @authors.nil? + lines << (@title == '' ? 'RT ;' : ('"' + @title + '"').wrap(80, 'RT ') + ';') + lines << @journal.wrap(80, 'RL ') unless @journal == '' + lines << "XX" + return lines.join("\n") + end + # Returns reference formatted in the bibitem style # *************** *** 542,546 **** # class References ! 
# Array of Bio::Reference objects attr_accessor :references --- 591,596 ---- # class References ! include Enumerable ! # Array of Bio::Reference objects attr_accessor :references From k at dev.open-bio.org Tue Feb 19 03:36:54 2008 From: k at dev.open-bio.org (Katayama Toshiaki) Date: Tue, 19 Feb 2008 03:36:54 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io ncbirest.rb, NONE, 1.1 pubmed.rb, 1.23, 1.24 Message-ID: <200802190336.m1J3as4O012327@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory dev.open-bio.org:/tmp/cvs-serv12321 Modified Files: pubmed.rb Added Files: ncbirest.rb Log Message: * NCBI E-Utilities (REST) functionality is separated to ncbirest.rb and pubmed.rb is changed to utilize the Bio::NCBI::REST class for esearch and efetch. You can now search and retrieve any database in any format that NCBI supports by E-Utilities through the Bio::NCBI::REST interface (currently, only esearch and efetch methods are implemented). Index: pubmed.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/pubmed.rb,v retrieving revision 1.23 retrieving revision 1.24 diff -C2 -d -r1.23 -r1.24 *** pubmed.rb 12 Dec 2007 13:53:26 -0000 1.23 --- pubmed.rb 19 Feb 2008 03:36:52 -0000 1.24 *************** *** 2,6 **** # = bio/io/pubmed.rb - NCBI Entrez/PubMed client module # ! # Copyright:: Copyright (C) 2001, 2007 Toshiaki Katayama # Copyright:: Copyright (C) 2006 Jan Aerts # License:: The Ruby License --- 2,6 ---- # = bio/io/pubmed.rb - NCBI Entrez/PubMed client module # ! # Copyright:: Copyright (C) 2001, 2007, 2008 Toshiaki Katayama # Copyright:: Copyright (C) 2006 Jan Aerts # License:: The Ruby License *************** *** 9,12 **** --- 9,13 ---- # + require 'bio/io/ncbirest' require 'bio/command' require 'cgi' unless defined?(CGI) *************** *** 69,95 **** # medline = Bio::MEDLINE.new(manuscript) # ! class PubMed ! ! 
# Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time ! # weekdays for any series of more than 100 requests. ! # -> Not implemented yet in BioRuby ! ! # Make no more than one request every 3 seconds. ! NCBI_INTERVAL = 3 ! @@last_access = nil ! ! private ! ! def ncbi_access_wait(wait = NCBI_INTERVAL) ! if @@last_access ! duration = Time.now - @@last_access ! if wait > duration ! sleep wait - duration ! end ! end ! @@last_access = Time.now ! end ! ! public # Search the PubMed database by given keywords using E-Utils and returns --- 70,74 ---- # medline = Bio::MEDLINE.new(manuscript) # ! class PubMed < Bio::NCBI::REST # Search the PubMed database by given keywords using E-Utils and returns *************** *** 100,136 **** # --- # *Arguments*: ! # * _id_: query string (required) ! # * _field_ ! # * _reldate_ ! # * _mindate_ ! # * _maxdate_ ! # * _datetype_ ! # * _retstart_ ! # * _retmax_ (default 100) ! # * _retmode_ ! # * _rettype_ # *Returns*:: array of PubMed IDs or a number of results def esearch(str, hash = {}) ! return nil if str.empty? ! ! serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" ! opts = { ! "retmax" => 100, ! "tool" => "bioruby", ! "db" => "pubmed", ! "term" => str ! } opts.update(hash) ! ! ncbi_access_wait ! ! response, = Bio::Command.post_form(serv, opts) ! result = response.body ! if opts['rettype'] == 'count' ! result = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i ! else ! result = result.scan(/<Id>(.*?)<\/Id>/m).flatten ! end ! return result end --- 79,98 ---- # --- # *Arguments*: ! # * _str_: query string (required) ! # * _hash_: hash of E-Utils options ! # * _retmode_: "xml", "html", ... ! # * _rettype_: "medline", ... ! # * _retmax_: integer (default 100) ! # * _retstart_: integer ! # * _field_ ! # * _reldate_ ! # * _mindate_ ! # * _maxdate_ ! # * _datetype_ # *Returns*:: array of PubMed IDs or a number of results def esearch(str, hash = {}) ! opts = { "db" => "pubmed" } opts.update(hash) ! 
super(str, opts) end *************** *** 142,168 **** # *Arguments*: # * _ids_: list of PubMed IDs (required) # *Returns*:: Array of MEDLINE formatted String def efetch(ids, hash = {}) ! return nil if ids.to_s.empty? ! ids = ids.join(",") if ids === Array ! ! serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" ! opts = { ! "tool" => "bioruby", ! "db" => "pubmed", ! "retmode" => "text", ! "rettype" => "medline", ! "id" => ids, ! } opts.update(hash) ! ! ncbi_access_wait ! ! response, = Bio::Command.post_form(serv, opts) ! result = response.body ! if opts["retmode"] == "text" ! result = result.split(/\n\n+/) ! end ! return result end --- 104,122 ---- # *Arguments*: # * _ids_: list of PubMed IDs (required) + # * _hash_: hash of E-Utils options + # * _retmode_: "xml", "html", ... + # * _rettype_: "medline", ... + # * _retmax_: integer (default 100) + # * _retstart_: integer + # * _field_ + # * _reldate_ + # * _mindate_ + # * _maxdate_ + # * _datetype_ # *Returns*:: Array of MEDLINE formatted String def efetch(ids, hash = {}) ! opts = { "db" => "pubmed", "rettype" => "medline" } opts.update(hash) ! 
super(ids, opts) end --- NEW FILE: ncbirest.rb --- # # = bio/io/ncbrest.rb - NCBI Entrez client module # # Copyright:: Copyright (C) 2008 Toshiaki Katayama # License:: The Ruby License # # $Id: ncbirest.rb,v 1.1 2008/02/19 03:36:52 k Exp $ # require 'bio/command' module Bio # == Description # # The Bio::NCBI::REST class provides REST client for the NCBI E-Utilities # # * Entrez utilities index: # http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html # * How to link: # http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinks.chapter.linkshelp # # == Usage # # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nuccore", "rettype"=>"gb"}) # Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml", "retmax"=>5}) # Bio::NCBI::REST.efetch("185041", {"db"=>"nuccore", "rettype"=>"gb"}) # Bio::NCBI::REST.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml"}) # class NCBI class REST # Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time # weekdays for any series of more than 100 requests. # -> Not implemented yet in BioRuby # Make no more than one request every 3 seconds. NCBI_INTERVAL = 3 @@last_access = nil private def ncbi_access_wait(wait = NCBI_INTERVAL) if @@last_access duration = Time.now - @@last_access if wait > duration sleep wait - duration end end @@last_access = Time.now end public # Search the NCBI database by given keywords using E-Utils and returns # an array of entry IDs. # # For information on the possible arguments, see # # * http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html # # --- # *Arguments*: # * _str_: query string (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} # * _db_: "nuccore", "pubmed", ... # * _retmode_: "xml", "html", ... # * _rettype_: "gb", "medline", "count", ... 
# * _retmax_: integer (default 100) # * _retstart_: integer # * _field_ # * _reldate_ # * _mindate_ # * _maxdate_ # * _datetype_ # *Returns*:: array of entry IDs or a number of results def esearch(str, hash = {}) return nil if str.empty? serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" opts = { "retmax" => 100, "tool" => "bioruby", "term" => str } opts.update(hash) ncbi_access_wait response, = Bio::Command.post_form(serv, opts) result = response.body if opts['rettype'] == 'count' result = result.scan(/<Count>(.*?)<\/Count>/m).flatten.first.to_i else result = result.scan(/<Id>(.*?)<\/Id>/m).flatten end return result end # Retrieve a database entry by given ID and using E-Utils (efetch) and # returns an array of entry string. Multiple IDs can be supplied. # --- # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} # * _db_: "nuccore", "pubmed", ... # * _retmode_: "xml", "html", ... # * _rettype_: "gb", "medline", "count",... # * _retmax_: integer (default 100) # * _retstart_: integer # * _field_ # * _reldate_ # * _mindate_ # * _maxdate_ # * _datetype_ # *Returns*:: Array of entry String def efetch(ids, hash = {}) return nil if ids.to_s.empty? 
ids = ids.join(",") if ids === Array serv = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" opts = { "tool" => "bioruby", "retmode" => "text", "id" => ids, } opts.update(hash) ncbi_access_wait response, = Bio::Command.post_form(serv, opts) result = response.body if opts["retmode"] == "text" result = result.split(/\n\n+/) end return result end def self.esearch(*args) self.new.esearch(*args) end def self.efetch(*args) self.new.efetch(*args) end end # REST end # NCBI end # Bio if __FILE__ == $0 gbopts = {"db"=>"nuccore", "rettype"=>"gb"} pmopts = {"db"=>"pubmed", "rettype"=>"medline"} count = {"rettype" => "count"} xml = {"retmode"=>"xml"} max = {"retmax"=>5} puts "=== class methods ===" puts "--- Search NCBI by E-Utils ---" puts Time.now puts "# count of 'tardigrada' in nuccore" puts Bio::NCBI::REST.esearch("tardigrada", gbopts.merge(count)) puts Time.now puts "# max 5 'tardigrada' entries in nuccore" puts Bio::NCBI::REST.esearch("tardigrada", gbopts.merge(max)) puts Time.now puts "# count of 'yeast kinase' in nuccore" puts Bio::NCBI::REST.esearch("yeast kinase", gbopts.merge(count)) puts Time.now puts "# max 5 'yeast kinase' entries in nuccore (XML)" puts Bio::NCBI::REST.esearch("yeast kinase", gbopts.merge(xml).merge(max)) puts Time.now puts "# count of 'genome&analysis|bioinformatics' in pubmed" puts Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(count)) puts Time.now puts "# max 5 'genome&analysis|bioinformatics' entries in pubmed (XML)" puts Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(xml).merge(max)) puts Time.now Bio::NCBI::REST.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(max)).each do |x| puts "# each of 5 'genome&analysis|bioinformatics' entries in pubmed" puts x end puts "--- Retrieve NCBI entry by E-Utils ---" puts Time.now puts "# '185041' entry in nuccore" puts Bio::NCBI::REST.efetch("185041", gbopts) puts Time.now puts "# 'J00231' entry in nuccore (XML)" 
puts Bio::NCBI::REST.efetch("J00231", gbopts.merge(xml)) puts Time.now puts "# 16381885 entry in pubmed" puts Bio::NCBI::REST.efetch(16381885, pmopts) puts Time.now puts "# '16381885' entry in pubmed" puts Bio::NCBI::REST.efetch("16381885", pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed" puts Bio::NCBI::REST.efetch([10592173, 14693808], pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed (XML)" puts Bio::NCBI::REST.efetch([10592173, 14693808], pmopts.merge(xml)) puts "=== instance methods ===" ncbi = Bio::NCBI::REST.new puts "--- Search NCBI by E-Utils ---" puts Time.now puts "# count of 'genome&analysis|bioinformatics' in pubmed" puts ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(count)) puts Time.now puts "# max 5 'genome&analysis|bioinformatics' entries in pubmed" puts ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts.merge(max)) puts Time.now ncbi.esearch("(genome AND analysis) OR bioinformatics", pmopts).each do |x| puts "# each 'genome&analysis|bioinformatics' entries in pubmed" puts x end puts "--- Retrieve NCBI entry by E-Utils ---" puts Time.now puts "# 16381885 entry in pubmed" puts ncbi.efetch(16381885, pmopts) puts Time.now puts "# [10592173,14693808] entries in pubmed" puts ncbi.efetch([10592173, 14693808], pmopts) end From k at dev.open-bio.org Tue Feb 19 04:42:16 2008 From: k at dev.open-bio.org (Katayama Toshiaki) Date: Tue, 19 Feb 2008 04:42:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io hinv.rb,1.1,1.2 Message-ID: <200802190442.m1J4gGQZ012425@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory dev.open-bio.org:/tmp/cvs-serv12421 Modified Files: hinv.rb Log Message: * hit2acc fixed Index: hinv.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/hinv.rb,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** hinv.rb 9 Jan 2008 17:18:18 -0000 1.1 
--- hinv.rb 19 Feb 2008 04:42:14 -0000 1.2 *************** *** 2,6 **** # = bio/io/hinv.rb - H-invDB web service (REST) client module # ! # Copyright:: Copyright (C) 2007 Toshiaki Katayama # License:: The Ruby License # --- 2,6 ---- # = bio/io/hinv.rb - H-invDB web service (REST) client module # ! # Copyright:: Copyright (C) 2008 Toshiaki Katayama # License:: The Ruby License # *************** *** 137,141 **** def initialize ! @url = BASE_URI + "hit2acc.php?hit=" end --- 137,141 ---- def initialize ! @url = BASE_URI + "hit2acc.php" end From k at dev.open-bio.org Tue Feb 19 04:49:37 2008 From: k at dev.open-bio.org (Katayama Toshiaki) Date: Tue, 19 Feb 2008 04:49:37 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io ncbirest.rb,1.1,1.2 Message-ID: <200802190449.m1J4nb9x012447@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory dev.open-bio.org:/tmp/cvs-serv12443 Modified Files: ncbirest.rb Log Message: * doc update Index: ncbirest.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/ncbirest.rb,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** ncbirest.rb 19 Feb 2008 03:36:52 -0000 1.1 --- ncbirest.rb 19 Feb 2008 04:49:35 -0000 1.2 *************** *** 1,4 **** # ! # = bio/io/ncbrest.rb - NCBI Entrez client module # # Copyright:: Copyright (C) 2008 Toshiaki Katayama --- 1,4 ---- # ! # = bio/io/ncbirest.rb - NCBI Entrez client module # # Copyright:: Copyright (C) 2008 Toshiaki Katayama *************** *** 16,26 **** # The Bio::NCBI::REST class provides REST client for the NCBI E-Utilities # ! # * Entrez utilities index: ! # http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html ! # * How to link: ! 
# http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinks.chapter.linkshelp # # == Usage # # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nuccore", "rettype"=>"gb"}) # Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml", "retmax"=>5}) --- 16,26 ---- # The Bio::NCBI::REST class provides REST client for the NCBI E-Utilities # ! # Entrez utilities index: ! # ! # * http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html # # == Usage # + # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nuccore", "rettype"=>"count"}) # Bio::NCBI::REST.esearch("tardigrada", {"db"=>"nuccore", "rettype"=>"gb"}) # Bio::NCBI::REST.esearch("yeast kinase", {"db"=>"nuccore", "rettype"=>"gb", "retmode"=>"xml", "retmax"=>5}) *************** *** 64,69 **** # * _str_: query string (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} ! # * _db_: "nuccore", "pubmed", ... ! # * _retmode_: "xml", "html", ... # * _rettype_: "gb", "medline", "count", ... # * _retmax_: integer (default 100) --- 64,69 ---- # * _str_: query string (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} ! # * _db_: "nuccore", "nucleotide", "protein", "pubmed", ... ! # * _retmode_: "text", "xml", "html", ... # * _rettype_: "gb", "medline", "count", ... # * _retmax_: integer (default 100) *************** *** 100,109 **** # Retrieve a database entry by given ID and using E-Utils (efetch) and # returns an array of entry string. Multiple IDs can be supplied. # --- # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} ! # * _db_: "nuccore", "pubmed", ... ! # * _retmode_: "xml", "html", ... # * _rettype_: "gb", "medline", "count",... # * _retmax_: integer (default 100) --- 100,114 ---- # Retrieve a database entry by given ID and using E-Utils (efetch) and # returns an array of entry string. Multiple IDs can be supplied. 
+ # + # For information on the possible arguments, see + # + # * http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html + # # --- # *Arguments*: # * _ids_: list of NCBI entry IDs (required) # * _hash_: hash of E-Utils option {"db" => "nuccore", "rettype" => "gb"} ! # * _db_: "nuccore", "nucleotide", "protein", "pubmed", ... ! # * _retmode_: "text", "xml", "html", ... # * _rettype_: "gb", "medline", "count",... # * _retmax_: integer (default 100) From aerts at dev.open-bio.org Wed Feb 20 09:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby ChangeLog,1.83,1.83.2.1 Message-ID: <200802200956.m1K9uOcm015785@dev.open-bio.org> Update of /home/repository/bioruby/bioruby In directory dev.open-bio.org:/tmp/cvs-serv15755 Modified Files: Tag: BRANCH-biohackathon2008 ChangeLog Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. 
Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl Index: ChangeLog =================================================================== RCS file: /home/repository/bioruby/bioruby/ChangeLog,v retrieving revision 1.83 retrieving revision 1.83.2.1 diff -C2 -d -r1.83 -r1.83.2.1 *** ChangeLog 12 Feb 2008 05:32:23 -0000 1.83 --- ChangeLog 20 Feb 2008 09:56:21 -0000 1.83.2.1 *************** *** 1,2 **** --- 1,27 ---- + 2008-02-20 Jan Aerts + * lib/bio/db/fasta.rb + * lib/bio/db/fasta/format.erb + * test/unit/bio/db/test_fasta.rb + + Renamed #to_seq to #to_biosequence to reflect that same method in + embl.rb, genbank.rb and others. + + 2008-02-20 Jan Aerts + * lib/bio.rb + * lib/bio/db/embl/common.rb + * lib/bio/db/embl/embl.rb + * lib/bio/db/embl/format.erb + * lib/bio/sequence/common.rb + * lib/bio/sequence/format.rb + * test/unit/bio/db/embl/test_embl_to_bioseq.rb + + Fixed some bugs in importing EMBL files and added functionality to + export a Bio::Sequence to EMBL format. + + 2008-02-18 Jan Aerts + * lib/bio/reference.rb + + Added export method to EMBL format. 
+ 2008-02-12 Naohisa Goto From aerts at dev.open-bio.org Wed Feb 20 09:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/embl format.erb, NONE, 1.1.2.1 common.rb, 1.12, 1.12.2.1 embl.rb, 1.29.2.1, 1.29.2.2 Message-ID: <200802200956.m1K9uO6r015800@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/embl In directory dev.open-bio.org:/tmp/cvs-serv15755/lib/bio/db/embl Modified Files: Tag: BRANCH-biohackathon2008 common.rb embl.rb Added Files: Tag: BRANCH-biohackathon2008 format.erb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl --- NEW FILE: format.erb --- ID <%= entry_id %>; SV <%= sequence_version %>; <%= topology %>; <%= molecule_type %>; <%= data_class %>; <%= division %>; <%= seq.length %> BP. XX AC <%= accessions.reject{|a| a.nil?}.join('; ') + ';' %> XX DT <%= date_created %> DT <%= date_modified %> XX DE <%= definition %> XX KW <%= keywords.join('; ') %>. XX OS <%= species %> <%= classification.join('; ').wrap(80, 'OC ') %>. 
XX <%= references.collect{|ref| ref.format('embl')}.join("\n") %> XX FH Key Location/Qualifiers FH <% prefix = 'FT ' indent = prefix + ' ' * 16 fwidth = 80 - indent.length %><%= format_features(prefix, indent, fwidth) %>XX SQ Sequence <%= seq.length %> BP; <%= seq.composition.collect{|k,v| "#{v} #{k.upcase}"}.join('; ') + '; ' + (seq.gsub(/[ACTGactg]/, '').length.to_s ) + ' other;' %> <%= seq.format_embl %> // Index: embl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/embl.rb,v retrieving revision 1.29.2.1 retrieving revision 1.29.2.2 diff -C2 -d -r1.29.2.1 -r1.29.2.2 *** embl.rb 15 Feb 2008 04:49:37 -0000 1.29.2.1 --- embl.rb 20 Feb 2008 09:56:22 -0000 1.29.2.2 *************** *** 123,126 **** --- 123,130 ---- alias molecule_type molecule + def data_class + id_line('DATA_CLASS') + end + def topology id_line('TOPOLOGY') *************** *** 254,258 **** unless @data['FT'] @data['FT'] = Array.new - ary = Array.new in_quote = false @orig['FT'].each_line do |line| --- 258,261 ---- *************** *** 262,268 **** body = line[20,60].chomp # feature value (position, /qualifier=) if line =~ /^FT {3}(\S+)/ ! ary.push([ $1, body ]) # [ feature, position, /q="data", ... ] elsif body =~ /^ \// and not in_quote ! ary.last.push(body) # /q="data..., /q=data, /q if body =~ /=" / and body !~ /"$/ --- 265,271 ---- body = line[20,60].chomp # feature value (position, /qualifier=) if line =~ /^FT {3}(\S+)/ ! @data['FT'].push([ $1, body ]) # [ feature, position, /q="data", ... ] elsif body =~ /^ \// and not in_quote ! @data['FT'].last.push(body) # /q="data..., /q=data, /q if body =~ /=" / and body !~ /"$/ *************** *** 271,275 **** else ! ary.last.last << body # ...data..., ...data..." if body =~ /"$/ --- 274,278 ---- else ! @data['FT'].last.last << body # ...data..., ...data..." if body =~ /"$/ *************** *** 279,287 **** end ! ary.map! 
do |subary| parse_qualifiers(subary) end - @data['FT'] = Features.new(ary) end if block_given? --- 282,289 ---- end ! @data['FT'].map! do |subary| parse_qualifiers(subary) end end if block_given? *************** *** 373,378 **** bio_seq.entry_id = self.entry_id bio_seq.primary_accession = self.accessions[0] ! bio_seq.secondary_accessions = self.accessions[1,-1] bio_seq.molecule_type = self.molecule_type bio_seq.definition = self.description bio_seq.topology = self.topology --- 375,381 ---- bio_seq.entry_id = self.entry_id bio_seq.primary_accession = self.accessions[0] ! bio_seq.secondary_accessions = self.accessions[1,-1] || [] bio_seq.molecule_type = self.molecule_type + bio_seq.data_class = self.data_class bio_seq.definition = self.description bio_seq.topology = self.topology *************** *** 382,386 **** bio_seq.sequence_version = self.version bio_seq.keywords = self.keywords ! bio_seq.species = self.os(0)[0]['os'] + ' ' + self.os(0)[0]['name'] bio_seq.classification = self.oc bio_seq.references = self.references --- 385,389 ---- bio_seq.sequence_version = self.version bio_seq.keywords = self.keywords ! bio_seq.species = self.fetch('OS') bio_seq.classification = self.oc bio_seq.references = self.references *************** *** 435,439 **** indent = prefix + ' ' * 16 fwidth = 80 - indent.length ! parser = Bio::FlatFile.auto('/home/aertsj/LocalDocuments/hackathon/aj224122.embl') parser.each do |entry| --- 438,443 ---- indent = prefix + ' ' * 16 fwidth = 80 - indent.length ! ! 
# parser = Bio::FlatFile.auto('/home/aertsj/LocalDocuments/bioruby_biohackathon/bioruby/test/data/embl/AB090716.embl') parser = Bio::FlatFile.auto('/home/aertsj/LocalDocuments/hackathon/aj224122.embl') parser.each do |entry| Index: common.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/common.rb,v retrieving revision 1.12 retrieving revision 1.12.2.1 diff -C2 -d -r1.12 -r1.12.2.1 *** common.rb 5 Apr 2007 23:35:40 -0000 1.12 --- common.rb 20 Feb 2008 09:56:22 -0000 1.12.2.1 *************** *** 241,265 **** def ref unless @data['R'] ! ary = Array.new ! get('R').split(/\nRN /).each do |str| ! raw = {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '', ! 'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''} ! str = 'RN ' + str unless /^RN / =~ str ! str.split("\n").each do |line| ! if /^(R[NPXARLCTG]) (.+)/ =~ line ! raw[$1] += $2 + ' ' ! else ! raise "Invalid format in R lines, \n[#{line}]\n" end end ! raw.each_value {|v| ! v.strip! ! v.sub!(/^"/,'') ! v.sub!(/;$/,'') ! v.sub!(/"$/,'') ! } ! ary.push(raw) end - @data['R'] = ary end @data['R'] --- 241,305 ---- def ref unless @data['R'] ! @data['R'] = Array.new ! # Get the different references as 'blurbs' (the lines together) ! reference_blurbs = get('R').split(/\nRN /) ! reference_blurbs.each_index do |i| ! reference_blurbs[i] = 'RN ' + reference_blurbs[i] unless reference_blurbs[i] =~ /^RN / ! end ! ! # For each reference, we'll first create a hash that looks like below. ! # Suppose the input is: ! # RA name1, name2, name3 ! # RA name4 ! # RT some part of the title that ! # RT did not fit on one line ! # Then the hash looks like: ! # h = { ! # 'RA' => ["name1, name2, name3", "name4"], ! # 'RT' => ["some part of the title that", "did not fit on one line"] ! # } ! reference_blurbs.each do |rb| ! line_based_data = Hash.new ! rb.split(/\n/).each do |line| ! key, value = line.scan(/^(R[A-Z]) "?(\[?.*[A-Za-z0-9]\]?)/)[0] ! if line_based_data[key].nil? ! 
line_based_data[key] = Array.new end + line_based_data[key].push(value) end ! ! # Now we have to sanitize the hash: the authors should be kept in an ! # array, the title should be 1 string, ... So the hash should look like: ! # h = { ! # 'RA' => ["name1", "name2", "name3", "name4"], ! # 'RT' => 'some part of the title that did not fit on one line' ! # } ! line_based_data.keys.each do |key| ! if ['RC', 'RP', 'RT', 'RL'].include?(key) ! line_based_data[key] = line_based_data[key].join(' ') ! elsif ['RA', 'RX'].include?(key) ! sanitized_data = Array.new ! line_based_data[key].each do |v| ! sanitized_data.push(v.split(/\s*,\s*/)) ! end ! line_based_data[key] = sanitized_data.flatten ! elsif key == 'RN' ! line_based_data[key] = line_based_data[key][0].sub(/^\[/,'').sub(/\]$/,'').to_i ! end ! end ! ! # And put it in @data. @data in the end looks like this: ! # data = [ ! # { ! # 'RA' => ["name1", "name2", "name3", "name4"], ! # 'RT' => 'some part of the title that did not fit on one line' ! # }, ! # { ! # 'RA' => ["name1", "name2", "name3", "name4"], ! # 'RT' => 'some part of the title that did not fit on one line' ! # } ! # ] ! @data['R'].push(line_based_data) end end @data['R'] *************** *** 270,306 **** def references unless @data['references'] ! ary = self.ref.map {|ent| ! hash = Hash.new('') ! ent.each {|key, value| case key when 'RA' ! hash['authors'] = value.split(/, /) when 'RT' hash['title'] = value when 'RL' ! if value =~ /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/ ! hash['journal'] = $1 ! hash['volume'] = $2 ! hash['issue'] = $3 ! hash['pages'] = $4 ! hash['year'] = $5 ! else ! hash['journal'] = value ! end when 'RX' # PUBMED, MEDLINE ! value.split('.').each {|item| tag, xref = item.split(/; /).map {|i| i.strip } hash[ tag.downcase ] = xref } end ! } ! Reference.new(hash) ! } ! @data['references'] = References.new(ary) end @data['references'] end - # returns contents in the DR line. 
# * Bio::EMBLDB::Common#dr -> [ * ] --- 310,345 ---- def references unless @data['references'] ! @data['references'] = Array.new ! self.ref.each do |ref| ! hash = Hash.new ! ref.each do |key, value| case key + when 'RN' + hash['embl_gb_record_number'] = value + when 'RC' + hash['comments'] = value + when 'RX' + hash['xrefs'] = value + when 'RP' + hash['sequence_position'] = value when 'RA' ! hash['authors'] = value when 'RT' hash['title'] = value when 'RL' ! hash['journal'] = value when 'RX' # PUBMED, MEDLINE ! value.each {|item| tag, xref = item.split(/; /).map {|i| i.strip } hash[ tag.downcase ] = xref } end ! end ! @data['references'].push(Reference.new(hash)) ! end end @data['references'] end # returns contents in the DR line. # * Bio::EMBLDB::Common#dr -> [ * ] From aerts at dev.open-bio.org Wed Feb 20 09:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.58.2.6,0.58.2.7 Message-ID: <200802200956.m1K9uO8C015795@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv15755/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 sequence.rb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. 
Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.58.2.6 retrieving revision 0.58.2.7 diff -C2 -d -r0.58.2.6 -r0.58.2.7 *** sequence.rb 15 Feb 2008 05:29:50 -0000 0.58.2.6 --- sequence.rb 20 Feb 2008 09:56:22 -0000 0.58.2.7 *************** *** 371,375 **** return [@primary_accession, @secondary_accessions].flatten end ! end # Sequence --- 371,375 ---- return [@primary_accession, @secondary_accessions].flatten end ! end # Sequence From aerts at dev.open-bio.org Wed Feb 20 09:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.89,1.89.2.1 Message-ID: <200802200956.m1K9uOdN015790@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory dev.open-bio.org:/tmp/cvs-serv15755/lib Modified Files: Tag: BRANCH-biohackathon2008 bio.rb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. 
Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.89 retrieving revision 1.89.2.1 diff -C2 -d -r1.89 -r1.89.2.1 *** bio.rb 9 Jan 2008 17:18:17 -0000 1.89 --- bio.rb 20 Feb 2008 09:56:22 -0000 1.89.2.1 *************** *** 278,279 **** --- 278,310 ---- end + class String + def fold(width = 80) + self.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") + end + + def wrap(width = 80, prefix = '') + actual_width = width - prefix.length + result = [] + left = self.dup + while left and left.length > actual_width + line = nil + actual_width.downto(1) do |i| + if left[i..i] == ' ' or /[,;]/ =~ left[(i-1)..(i-1)] then + line = left[0..(i-1)].sub(/ +\z/, '') + left = left[i..-1].sub(/\A +/, '') + break + end + end + if line.nil? then + line = left[0..(actual_width-1)] + left = left[actual_width..-1] + end + result << line + end + result << left if left + result_string = result.join("\n#{prefix}") + result_string = prefix + result_string unless result_string.empty? + # result_string << "\n" unless result_string.empty? 
+ return result_string + end + end \ No newline at end of file From aerts at dev.open-bio.org Wed Feb 20 09:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence common.rb, 1.6, 1.6.2.1 format.rb, 1.4.2.3, 1.4.2.4 Message-ID: <200802200956.m1K9uOhl015806@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory dev.open-bio.org:/tmp/cvs-serv15755/lib/bio/sequence Modified Files: Tag: BRANCH-biohackathon2008 common.rb format.rb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl Index: format.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v retrieving revision 1.4.2.3 retrieving revision 1.4.2.4 diff -C2 -d -r1.4.2.3 -r1.4.2.4 *** format.rb 15 Feb 2008 02:18:21 -0000 1.4.2.3 --- format.rb 20 Feb 2008 09:56:22 -0000 1.4.2.4 *************** *** 31,106 **** # puts s.output(:embl) module Format - - # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any - # case, it would be difficult to successfully call this method outside - # its expected context). - # - # Output the FASTA format string of the sequence. - # - # UNFORTUNATLY, the current implementation of Bio::Sequence is incapable of - # using either the header or width arguments. So something needs to be - # changed... 
- # - # Currently, this method is used in Bio::Sequence#output like so, - # - # s = Bio::Sequence.new('atgc') - # puts s.output(:fasta) #=> "> \natgc\n" - # --- - # *Arguments*: - # * (optional) _header_: String (default nil) - # * (optional) _width_: Fixnum (default nil) - # *Returns*:: String object - def format_fasta(header = nil, width = nil) - header ||= "#{@entry_id} #{@definition}" - - ">#{header}\n" + - if width - @seq.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") - else - @seq.to_s + "\n" - end - end - - # Not yet implemented :) - # Remove the nodoc command after implementation! - # --- - # *Returns*:: String object - def format_gff #:nodoc: - raise NotImplementedError - end - - # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any - # case, it would be difficult to successfully call this method outside - # its expected context). - # - # Output the Genbank format string of the sequence. - # Used in Bio::Sequence#output. - # --- - # *Returns*:: String object - def format_genbank - prefix = ' ' * 5 - indent = prefix + ' ' * 16 - fwidth = 79 - indent.length - - format_features(prefix, indent, fwidth) - end - - # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any - # case, it would be difficult to successfully call this method outside - # its expected context). - # - # Output the EMBL format string of the sequence. - # Used in Bio::Sequence#output. - # --- - # *Returns*:: String object - def format_embl - prefix = 'FT ' - indent = prefix + ' ' * 16 - fwidth = 80 - indent.length - - format_features(prefix, indent, fwidth) - end - - private --- 31,34 ---- *************** *** 114,123 **** head = '' ! wrap(position, width).each_line do |line| result << head << line head = indent end ! result << format_qualifiers(feature.qualifiers, width) end return result --- 42,51 ---- head = '' ! (position).wrap(width).each_line do |line| result << head << line head = indent end ! 
result << format_qualifiers(feature.qualifiers, indent, width) end return result *************** *** 130,136 **** if v == true ! lines = wrap('/' + q, width) elsif q == 'translation' ! lines = fold('/' + q + '=' + v, width) else if v[/\D/] --- 58,64 ---- if v == true ! lines =('/' + q).wrap(width) elsif q == 'translation' ! lines = ('/' + q + '="' + v + '"').fold(width) else if v[/\D/] *************** *** 139,143 **** v = '"' + v + '"' end ! lines = wrap('/' + q + '=' + v, width) end --- 67,71 ---- v = '"' + v + '"' end ! lines = ('/' + q + '=' + v).wrap(width) end *************** *** 147,177 **** end - def fold(str, width) - str.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") - end - - def wrap(str, width) - result = [] - left = str.dup - while left and left.length > width - line = nil - width.downto(1) do |i| - if left[i..i] == ' ' or /[\,\;]/ =~ left[(i-1)..(i-1)] then - line = left[0..(i-1)].sub(/ +\z/, '') - left = left[i..-1].sub(/\A +/, '') - break - end - end - if line.nil? then - line = left[0..(width-1)] - left = left[width..-1] - end - result << line - end - result << left if left - result_string = result.join("\n") - result_string << "\n" unless result_string.empty? - return result_string - end end # Format --- 75,78 ---- Index: common.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/common.rb,v retrieving revision 1.6 retrieving revision 1.6.2.1 diff -C2 -d -r1.6 -r1.6.2.1 *** common.rb 27 Dec 2007 17:36:02 -0000 1.6 --- common.rb 20 Feb 2008 09:56:22 -0000 1.6.2.1 *************** *** 38,42 **** # puts dna.randomize module Common ! # Return sequence as # String[http://corelib.rubyonrails.org/classes/String.html]. --- 38,42 ---- # puts dna.randomize module Common ! # Return sequence as # String[http://corelib.rubyonrails.org/classes/String.html]. 
*************** *** 66,69 **** --- 66,86 ---- self.class.new(self) end + + def format_embl + output_lines = Array.new + counter = 0 + remainder = self.window_search(60,60) do |subseq| + counter += 60 + subseq.gsub!(/(.{10})/, '\1 ') + output_lines.push(' '*5 + subseq + counter.to_s.rjust(9)) + end + counter += remainder.length + remainder = (remainder.to_s + ' '*(60-remainder.length)) + remainder.gsub!(/(.{10})/, '\1 ') + output_lines.push(' '*5 + remainder + counter.to_s.rjust(9)) + return output_lines.join("\n") + end + + # Normalize the current sequence, removing all whitespace and From aerts at dev.open-bio.org Wed Feb 20 09:56:24 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 09:56:24 +0000 Subject: [BioRuby-cvs] bioruby/test/unit/bio/db/embl test_embl_to_bioseq.rb, NONE, 1.1.2.1 test_embl.rb, 1.5, 1.5.2.1 test_embl_rel89.rb, 1.2, 1.2.2.1 Message-ID: <200802200956.m1K9uOKd015812@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/test/unit/bio/db/embl In directory dev.open-bio.org:/tmp/cvs-serv15755/test/unit/bio/db/embl Modified Files: Tag: BRANCH-biohackathon2008 test_embl.rb test_embl_rel89.rb Added Files: Tag: BRANCH-biohackathon2008 test_embl_to_bioseq.rb Log Message: * Rewrote some of the code for converting EMBL files into Bio::Sequence. * Added functionality to export Bio::Sequence to EMBL format. 
Changes: * renamed Sequence::Format#wrap and #fold to String#wrap and #fold (stored in bio.rb) * lib/bio/db/common.rb: - rewrote def ref and def references - added to_biosequence - def references now returns an Array instead of a Bio::References object (tests changed accordingly) * lib/bio/db/embl/embl.rb - def ft now returns Array instead of Bio::Features object (tests changed accordingly) * lib/bio/db/embl/format.erb * lib/bio/sequence/common.rb - added def format_embl --- NEW FILE: test_embl_to_bioseq.rb --- # # test/unit/bio/db/embl/test_embl.rb - Unit test for Bio::EMBL # # Copyright:: Copyright (C) 2005, 2008 # Mitsuteru Nakao # Jan Aerts # License:: The Ruby License # # $Id: test_embl_to_bioseq.rb,v 1.1.2.1 2008/02/20 09:56:22 aerts Exp $ # require 'pathname' libpath = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 5, 'lib')).cleanpath.to_s $:.unshift(libpath) unless $:.include?(libpath) require 'test/unit' require 'bio' require 'bio/db/embl/embl' module Bio class TestEMBLToBioSequence < Test::Unit::TestCase def setup bioruby_root = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 5)).cleanpath.to_s input = File.open(File.join(bioruby_root, 'test', 'data', 'embl', 'AB090716.embl.rel89')).read embl_object = Bio::EMBL.new(input) embl_object.instance_eval { @data['OS'] = "Haplochromis sp. 'muzu rukwa'" } @bio_seq = embl_object.to_biosequence end def test_entry_id assert_equal('AB090716', @bio_seq.entry_id) end def test_primary_accession assert_equal('AB090716', @bio_seq.primary_accession) end def test_secondary_accessions assert_equal([], @bio_seq.secondary_accessions) end def test_molecule_type assert_equal('genomic DNA', @bio_seq.molecule_type) end def test_definition assert_equal("Haplochromis sp. 'muzu, rukwa' LWS gene for long wavelength-sensitive opsin, partial cds, specimen_voucher:specimen No. HT-9361.", @bio_seq.definition) end def test_topology assert_equal('linear', @bio_seq.topology) end def test_dates assert_equal('25-OCT-2002 (Rel. 
73, Created)', @bio_seq.date_created) assert_equal('14-NOV-2006 (Rel. 89, Last updated, Version 3)', @bio_seq.date_modified) end def test_division assert_equal('VRT', @bio_seq.division) end def test_sequence_version assert_equal(1, @bio_seq.sequence_version) end def test_keywords assert_equal([], @bio_seq.keywords) end def test_species assert_equal("Haplochromis sp. 'muzu, rukwa'", @bio_seq.species) end def test_classification assert_equal(['Eukaryota','Metazoa','Chordata','Craniata','Vertebrata','Euteleostomi','Actinopterygii','Neopterygii','Teleostei','Euteleostei','Neoteleostei','Acanthomorpha','Acanthopterygii','Percomorpha','Perciformes','Labroidei','Cichlidae','African cichlids','Pseudocrenilabrinae','Haplochromini','Haplochromis'], @bio_seq.classification) end def test_references assert_equal(2, @bio_seq.references.length) assert_equal(Bio::Reference, @bio_seq.references[0].class) end def test_features assert_equal(3, @bio_seq.features.length) assert_equal(Bio::Feature, @bio_seq.features[0].class) end end # To really test the Bio::EMBL to Bio::Sequence conversion, we need to test if # that Bio::Sequence can be made into a valid Bio::EMBL again. class TestEMBLToBioSequenceRoundTrip < Test::Unit::TestCase def setup bioruby_root = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 5)).cleanpath.to_s input = File.open(File.join(bioruby_root, 'test', 'data', 'embl', 'AB090716.embl.rel89')).read embl_object_1 = Bio::EMBL.new(input) embl_object_1.instance_eval { @data['OS'] = "Haplochromis sp. 
'muzu rukwa'" } @bio_seq_1 = embl_object_1.to_biosequence embl_object_2 = Bio::EMBL.new(@bio_seq_1.output(:embl)) @bio_seq_2 = embl_object_2.to_biosequence end def test_entry_id assert_equal('AB090716', @bio_seq_2.entry_id) end def test_primary_accession assert_equal('AB090716', @bio_seq_2.primary_accession) end def test_secondary_accessions assert_equal([], @bio_seq_2.secondary_accessions) end def test_molecule_type assert_equal('genomic DNA', @bio_seq_2.molecule_type) end def test_definition assert_equal("Haplochromis sp. 'muzu, rukwa' LWS gene for long wavelength-sensitive opsin, partial cds, specimen_voucher:specimen No. HT-9361.", @bio_seq_2.definition) end def test_topology assert_equal('linear', @bio_seq_2.topology) end def test_dates assert_equal('25-OCT-2002 (Rel. 73, Created)', @bio_seq_2.date_created) assert_equal('14-NOV-2006 (Rel. 89, Last updated, Version 3)', @bio_seq_2.date_modified) end def test_division assert_equal('VRT', @bio_seq_2.division) end def test_sequence_version assert_equal(1, @bio_seq_2.sequence_version) end def test_keywords assert_equal([], @bio_seq_2.keywords) end def test_species assert_equal("Haplochromis sp. 
'muzu, rukwa'", @bio_seq_2.species) end def test_classification assert_equal(['Eukaryota','Metazoa','Chordata','Craniata','Vertebrata','Euteleostomi','Actinopterygii','Neopterygii','Teleostei','Euteleostei','Neoteleostei','Acanthomorpha','Acanthopterygii','Percomorpha','Perciformes','Labroidei','Cichlidae','African cichlids','Pseudocrenilabrinae','Haplochromini','Haplochromis'], @bio_seq_2.classification) end def test_references assert_equal(2, @bio_seq_2.references.length) assert_equal(Bio::Reference, @bio_seq_2.references[0].class) end def test_features a assert_equal(3, @bio_seq_2.features.length) assert_equal(Bio::Feature, @bio_seq_2.features[0].class) end end end Index: test_embl_rel89.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/test/unit/bio/db/embl/test_embl_rel89.rb,v retrieving revision 1.2 retrieving revision 1.2.2.1 diff -C2 -d -r1.2 -r1.2.2.1 *** test_embl_rel89.rb 5 Apr 2007 23:35:43 -0000 1.2 --- test_embl_rel89.rb 20 Feb 2008 09:56:22 -0000 1.2.2.1 *************** *** 156,160 **** # Bio::EMBLDB::COMMON#references def test_references ! assert_equal(Bio::References, @obj.references.class) end --- 156,160 ---- # Bio::EMBLDB::COMMON#references def test_references ! assert_equal(Array, @obj.references.class) end *************** *** 169,173 **** def test_ft ! assert_equal(Bio::Features, @obj.ft.class) end --- 169,173 ---- def test_ft ! assert_equal(Array, @obj.ft.class) end *************** *** 179,183 **** def test_ft_accessor ! assert_equal('CDS', @obj.ft.features[1].feature) end --- 179,183 ---- def test_ft_accessor ! 
assert_equal('CDS', @obj.ft[1].feature) end Index: test_embl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/test/unit/bio/db/embl/test_embl.rb,v retrieving revision 1.5 retrieving revision 1.5.2.1 diff -C2 -d -r1.5 -r1.5.2.1 *** test_embl.rb 5 Apr 2007 23:35:43 -0000 1.5 --- test_embl.rb 20 Feb 2008 09:56:22 -0000 1.5.2.1 *************** *** 151,155 **** # Bio::EMBLDB::COMMON#references def test_references ! assert_equal(Bio::References, @obj.references.class) end --- 151,155 ---- # Bio::EMBLDB::COMMON#references def test_references ! assert_equal(Array, @obj.references.class) end *************** *** 164,168 **** def test_ft ! assert_equal(Bio::Features, @obj.ft.class) end --- 164,168 ---- def test_ft ! assert_equal(Array, @obj.ft.class) end *************** *** 174,178 **** def test_ft_accessor ! assert_equal('CDS', @obj.ft.features[1].feature) end --- 174,178 ---- def test_ft_accessor ! assert_equal('CDS', @obj.ft[1].feature) end From aerts at dev.open-bio.org Wed Feb 20 13:54:21 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 13:54:21 +0000 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.89.2.1,1.89.2.2 Message-ID: <200802201354.m1KDsL5F016175@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory dev.open-bio.org:/tmp/cvs-serv16153 Modified Files: Tag: BRANCH-biohackathon2008 bio.rb Log Message: Fixed bug in formatting features when exporting to EMBL. Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.89.2.1 retrieving revision 1.89.2.2 diff -C2 -d -r1.89.2.1 -r1.89.2.2 *** bio.rb 20 Feb 2008 09:56:22 -0000 1.89.2.1 --- bio.rb 20 Feb 2008 13:54:19 -0000 1.89.2.2 *************** *** 280,284 **** class String def fold(width = 80) ! self.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") end --- 280,284 ---- class String def fold(width = 80) ! 
self.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n").sub(/\n$/, '') end *************** *** 308,310 **** return result_string end ! end \ No newline at end of file --- 308,310 ---- return result_string end ! end From aerts at dev.open-bio.org Wed Feb 20 13:54:21 2008 From: aerts at dev.open-bio.org (Jan Aerts) Date: Wed, 20 Feb 2008 13:54:21 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4.2.4,1.4.2.5 Message-ID: <200802201354.m1KDsLWx016180@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory dev.open-bio.org:/tmp/cvs-serv16153/bio/sequence Modified Files: Tag: BRANCH-biohackathon2008 format.rb Log Message: Fixed bug in formatting features when exporting to EMBL. Index: format.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v retrieving revision 1.4.2.4 retrieving revision 1.4.2.5 diff -C2 -d -r1.4.2.4 -r1.4.2.5 *** format.rb 20 Feb 2008 09:56:22 -0000 1.4.2.4 --- format.rb 20 Feb 2008 13:54:19 -0000 1.4.2.5 *************** *** 47,51 **** --- 47,53 ---- end + result << "\n" result << format_qualifiers(feature.qualifiers, indent, width) + result << "\n" end return result *************** *** 62,66 **** lines = ('/' + q + '="' + v + '"').fold(width) else ! if v[/\D/] #v.delete!("\x00-\x1f\x7f-\xff") v.gsub!(/"/, '""') --- 64,68 ---- lines = ('/' + q + '="' + v + '"').fold(width) else ! if ( v[/\D/] or q == 'chromosome' ) #v.delete!("\x00-\x1f\x7f-\xff") v.gsub!(/"/, '""') *************** *** 72,76 **** lines.gsub!(/^/, indent) lines ! end.join end --- 74,78 ---- lines.gsub!(/^/, indent) lines ! 
end.join("\n") end From ngoto at dev.open-bio.org Wed Feb 20 17:04:49 2008 From: ngoto at dev.open-bio.org (Naohisa Goto) Date: Wed, 20 Feb 2008 17:04:49 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.24.2.1,1.24.2.2 Message-ID: <200802201704.m1KH4nCF017912@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory dev.open-bio.org:/tmp/cvs-serv17810/lib/bio Modified Files: Tag: BRANCH-biohackathon2008 reference.rb Log Message: Bio::References#new is added not to create Bio::References instances anymore. New transitional module Bio::References::BackwardCompatibilityForBioReferences is added to help keeping backward compatibility. (The only reason why not to erase Bio::References class is to load Marshal.dump data.) Index: reference.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/reference.rb,v retrieving revision 1.24.2.1 retrieving revision 1.24.2.2 diff -C2 -d -r1.24.2.1 -r1.24.2.2 *** reference.rb 18 Feb 2008 15:44:39 -0000 1.24.2.1 --- reference.rb 20 Feb 2008 17:04:47 -0000 1.24.2.2 *************** *** 580,587 **** --- 580,593 ---- # = DESCRIPTION # + # This class is OBSOLETED, and will soon be removed. + # Instead of this class, an array is to be used. + # + # # A container class for Bio::Reference objects. # # = USAGE # + # This class should NOT be used. + # # refs = Bio::References.new # refs.append(Bio::Reference.new(hash)) *************** *** 591,596 **** # class References ! include Enumerable ! # Array of Bio::Reference objects attr_accessor :references --- 597,638 ---- # class References ! ! # module to keep backward compatibility with obsoleted Bio::References ! module BackwardCompatibilityForBioReferences #:nodoc: ! ! # Backward compatibility with Bio::References#references. ! # Now, references are stored in an array, and ! # you should change your code not to use this method. ! def references ! warn 'Bio::References is obsoleted. 
Now, references are stored in an array.'
!         self
!       end
!
!       # Backward compatibility with Bio::References#append.
!       # Now, references are stored in an array, and
!       # you should change your code not to use this method.
!       def append(reference)
!         warn 'Bio::References is obsoleted. Now, references are stored in an array.'
!         self.push(reference) if reference.is_a? Reference
!         self
!       end
!     end #module BackwardCompatibilityForBioReferences
!
!     # This method should not be used.
!     # Only for backward compatibility of existing code.
!     #
!     # Since Bio::References is obsoleted,
!     # Bio::References.new not returns Bio::References object,
!     # but modifies given _ary_ and returns the _ary_.
!     #
!     # *Arguments*:
!     # * (optional) __: Array of Bio::Reference objects
!     # *Returns*:: the given array
!     def self.new(ary = [])
!       warn 'Bio::References is obsoleted. Some methods are added to given array to keep backward compatibility.'
!       ary.extend(BackwardCompatibilityForBioReferences)
!       ary
!     end
!
      # Array of Bio::Reference objects
      attr_accessor :references

From ngoto at dev.open-bio.org Fri Feb 22 14:26:18 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Fri, 22 Feb 2008 14:26:18 +0000
Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.89.2.2,1.89.2.3
Message-ID: <200802221426.m1MEQI5W030582@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib
In directory dev.open-bio.org:/tmp/cvs-serv30562

Modified Files:
      Tag: BRANCH-biohackathon2008
        bio.rb
Log Message:
reverted to 1.89

Index: bio.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v
retrieving revision 1.89.2.2
retrieving revision 1.89.2.3
diff -C2 -d -r1.89.2.2 -r1.89.2.3
*** bio.rb 20 Feb 2008 13:54:19 -0000 1.89.2.2
--- bio.rb 22 Feb 2008 14:26:16 -0000 1.89.2.3
***************
*** 278,310 ****
  end
  
- class String
-   def fold(width = 80)
-     self.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n").sub(/\n$/, '')
-   end
-
-   def wrap(width = 80, prefix = '')
-
      actual_width = width - prefix.length
-     result = []
-     left = self.dup
-     while left and left.length > actual_width
-       line = nil
-       actual_width.downto(1) do |i|
-         if left[i..i] == ' ' or /[,;]/ =~ left[(i-1)..(i-1)] then
-           line = left[0..(i-1)].sub(/ +\z/, '')
-           left = left[i..-1].sub(/\A +/, '')
-           break
-         end
-       end
-       if line.nil? then
-         line = left[0..(actual_width-1)]
-         left = left[actual_width..-1]
-       end
-       result << line
-     end
-     result << left if left
-     result_string = result.join("\n#{prefix}")
-     result_string = prefix + result_string unless result_string.empty?
-     # result_string << "\n" unless result_string.empty?
-     return result_string
-   end
- end
--- 278,279 ----

From ngoto at dev.open-bio.org Fri Feb 22 14:30:46 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Fri, 22 Feb 2008 14:30:46 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/sequence format.rb,1.4.2.5,1.4.2.6
Message-ID: <200802221430.m1MEUknT030652@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/sequence
In directory dev.open-bio.org:/tmp/cvs-serv30611

Modified Files:
      Tag: BRANCH-biohackathon2008
        format.rb
Log Message:
* fold() and wrap() are reverted
* Bug fix in format_features() and format_qualifiers()
* The content of 'translate' qualifier is now wrapped by double quote

Index: format.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence/format.rb,v
retrieving revision 1.4.2.5
retrieving revision 1.4.2.6
diff -C2 -d -r1.4.2.5 -r1.4.2.6
*** format.rb 20 Feb 2008 13:54:19 -0000 1.4.2.5
--- format.rb 22 Feb 2008 14:30:44 -0000 1.4.2.6
***************
*** 31,34 ****
--- 31,109 ----
  #   puts s.output(:embl)
  module Format
+
+   # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any
+   # case, it would be difficult to successfully call this method outside
+   # its expected context).
+   #
+   # Output the FASTA format string of the sequence.
+   #
+   # UNFORTUNATLY, the current implementation of Bio::Sequence is incapable of
+   # using either the header or width arguments. So something needs to be
+   # changed...
+   #
+   # Currently, this method is used in Bio::Sequence#output like so,
+   #
+   #   s = Bio::Sequence.new('atgc')
+   #   puts s.output(:fasta) #=> "> \natgc\n"
+   # ---
+   # *Arguments*:
+   # * (optional) _header_: String (default nil)
+   # * (optional) _width_: Fixnum (default nil)
+   # *Returns*:: String object
+   def format_fasta(header = nil, width = nil)
+     header ||= "#{@entry_id} #{@definition}"
+
+     ">#{header}\n" +
+     if width
+       @seq.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n")
+     else
+       @seq.to_s + "\n"
+     end
+   end
+
+   #---
+
+   # Not yet implemented :)
+   # Remove the nodoc command after implementation!
+   # ---
+   # *Returns*:: String object
+   #def format_gff #:nodoc:
+   #  raise NotImplementedError
+   #end
+
+   # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any
+   # case, it would be difficult to successfully call this method outside
+   # its expected context).
+   #
+   # Output the Genbank format string of the sequence.
+   # Used in Bio::Sequence#output.
+   # ---
+   # *Returns*:: String object
+   #def format_genbank
+   #  prefix = ' ' * 5
+   #  indent = prefix + ' ' * 16
+   #  fwidth = 79 - indent.length
+   #
+   #  format_features(prefix, indent, fwidth)
+   #end
+
+   # INTERNAL USE ONLY, YOU SHOULD NOT CALL THIS METHOD. (And in any
+   # case, it would be difficult to successfully call this method outside
+   # its expected context).
+   #
+   # Output the EMBL format string of the sequence.
+   # Used in Bio::Sequence#output.
+   # ---
+   # *Returns*:: String object
+   #def format_embl
+   #  prefix = 'FT '
+   #  indent = prefix + ' ' * 16
+   #  fwidth = 80 - indent.length
+   #
+   #  format_features(prefix, indent, fwidth)
+   #end
+
+   #+++
+
  
    private
***************
*** 42,53 ****
  
        head = ''
!
        (position).wrap(width).each_line do |line|
          result << head << line
          head = indent
        end
  
-       result << "\n"
        result << format_qualifiers(feature.qualifiers, indent, width)
-       result << "\n"
      end
      return result
--- 117,126 ----
  
        head = ''
!       wrap(position, width).each_line do |line|
          result << head << line
          head = indent
        end
  
        result << format_qualifiers(feature.qualifiers, indent, width)
      end
      return result
***************
*** 60,80 ****
  
        if v == true
!         lines =('/' + q).wrap(width)
        elsif q == 'translation'
!         lines = ('/' + q + '="' + v + '"').fold(width)
        else
!         if ( v[/\D/] or q == 'chromosome' )
            #v.delete!("\x00-\x1f\x7f-\xff")
            v.gsub!(/"/, '""')
            v = '"' + v + '"'
          end
!         lines = ('/' + q + '=' + v).wrap(width)
        end
  
        lines.gsub!(/^/, indent)
        lines
!     end.join("\n")
    end
  
  end # Format
  
--- 133,180 ----
  
        if v == true
!         lines = wrap('/' + q, width)
        elsif q == 'translation'
!         lines = fold("/#{q}=\"#{v}\"", width)
        else
!         if v[/\D/] or q == 'chromosome'
            #v.delete!("\x00-\x1f\x7f-\xff")
            v.gsub!(/"/, '""')
            v = '"' + v + '"'
          end
!         lines = wrap('/' + q + '=' + v, width)
        end
  
        lines.gsub!(/^/, indent)
        lines
!     end.join
!   end
!
!   def fold(str, width)
!     str.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n")
    end
  
+   def wrap(str, width)
+     result = []
+     left = str.dup
+     while left and left.length > width
+       line = nil
+       width.downto(1) do |i|
+         if left[i..i] == ' ' or /[\,\;]/ =~ left[(i-1)..(i-1)] then
+           line = left[0..(i-1)].sub(/ +\z/, '')
+           left = left[i..-1].sub(/\A +/, '')
+           break
+         end
+       end
+       if line.nil? then
+         line = left[0..(width-1)]
+         left = left[width..-1]
+       end
+       result << line
+     end
+     result << left if left
+     result_string = result.join("\n")
+     result_string << "\n" unless result_string.empty?
+     return result_string
+   end
  end # Format
  

From ngoto at dev.open-bio.org Thu Feb 28 05:51:05 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Thu, 28 Feb 2008 05:51:05 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.24.2.2,1.24.2.3
Message-ID: <200802280551.m1S5p5eX020471@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv20451

Modified Files:
      Tag: BRANCH-biohackathon2008
        reference.rb
Log Message:
@sequence_position should be nil if no information available

Index: reference.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/reference.rb,v
retrieving revision 1.24.2.2
retrieving revision 1.24.2.3
diff -C2 -d -r1.24.2.2 -r1.24.2.3
*** reference.rb 20 Feb 2008 17:04:47 -0000 1.24.2.2
--- reference.rb 28 Feb 2008 05:51:03 -0000 1.24.2.3
***************
*** 138,142 ****
        @mesh = hash['mesh']
        @embl_gb_record_number = hash['embl_gb_record_number'] || nil
!       @sequence_position = hash['sequence_position'] || []
        @comments = hash['comments'] || []
        @xrefs = hash['xrefs'] || []
--- 138,142 ----
        @mesh = hash['mesh']
        @embl_gb_record_number = hash['embl_gb_record_number'] || nil
!       @sequence_position = hash['sequence_position'] || nil
        @comments = hash['comments'] || []
        @xrefs = hash['xrefs'] || []
***************
*** 280,284 ****
          end
        end
!       if @sequence_position != ''
          lines << "RP #{@sequence_position}"
        end
--- 280,284 ----
          end
        end
!       if ! @sequence_position.nil?
          lines << "RP #{@sequence_position}"
        end

From ngoto at dev.open-bio.org Thu Feb 28 05:54:53 2008
From: ngoto at dev.open-bio.org (Naohisa Goto)
Date: Thu, 28 Feb 2008 05:54:53 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/db/genbank common.rb,1.11,1.11.2.1
Message-ID: <200802280554.m1S5sr5Z020520@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/genbank
In directory dev.open-bio.org:/tmp/cvs-serv20500/db/genbank

Modified Files:
      Tag: BRANCH-biohackathon2008
        common.rb
Log Message:
changed to parse sequence position and reference number in REFERENCES

Index: common.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/db/genbank/common.rb,v
retrieving revision 1.11
retrieving revision 1.11.2.1
diff -C2 -d -r1.11 -r1.11.2.1
*** common.rb 5 Apr 2007 23:35:40 -0000 1.11
--- common.rb 28 Feb 2008 05:54:51 -0000 1.11.2.1
***************
*** 141,144 ****
--- 141,149 ----
          subtag2array(ref).each do |field|
            case tag_get(field)
+           when /^\s*REFERENCE\s+(\d+)(\s+\(bases\s+(\d+)\s+to\s+(\d+)\))?/
+             hash['embl_gb_record_number'] = $1.to_i
+             if $2 then
+               hash['sequence_position'] = "#{$3}-#{$4}"
+             end
            when /AUTHORS/
              authors = truncate(tag_cut(field))

From pjotr at dev.open-bio.org Sat Feb 2 13:03:36 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Sat, 02 Feb 2008 13:03:36 -0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.13,1.14
Message-ID: <200802021303.m12D3PNX031194@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory dev.open-bio.org:/tmp/cvs-serv31174

Modified Files:
        Tutorial.rd
Log Message:
Tabs in the Tutorial broke the rd parser - the Wiki will be fixed now.

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** Tutorial.rd 9 Jul 2007 12:28:07 -0000 1.13
--- Tutorial.rd 2 Feb 2008 13:03:23 -0000 1.14
***************
*** 1,2 ****
--- 1,10 ----
+ # This document is generated with a version of rd2html (part of Hiki)
+ #
+ # A possible test run could be from rdtool:
+ #
+ #   ruby -I lib ./bin/rd2 ~/izip/cvs/opensource/bioruby/doc/Tutorial.rd
+ #
+ # A common problem is tabs in the text file!
+
  =begin
***************
*** 5,13 ****
  $Id$
! Translated into English: Naohisa Goto
! Editor: PjotrPrins

! Copyright (C) 2001-2003 KATAYAMA Toshiaki , 2005-2007 Pjotr Prins, Naohisa Goto and others
  IMPORTANT NOTICE: This page is maintained in the BioRuby CVS
--- 13,21 ----
  $Id$
! Translated into English: Naohisa Goto
! Editor: PjotrPrins

! Copyright (C) 2001-2003 KATAYAMA Toshiaki , 2005-2008 Pjotr Prins, Naohisa Goto and others IMPORTANT NOTICE: This page is maintained in the BioRuby CVS *************** *** 32,36 **** version it has with the ! % ruby -v command. Showing something like: --- 40,44 ---- version it has with the ! % ruby -v command. Showing something like: *************** *** 55,59 **** bioruby> puts seq atgcatgcaaaa ! bioruby> puts seq.complement ttttgcatgcat --- 63,67 ---- bioruby> puts seq atgcatgcaaaa ! bioruby> puts seq.complement ttttgcatgcat *************** *** 94,98 **** puts seq.complement.translate # translation of complemental strand ! counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')} p randomseq = Bio::Sequence::NA.randomize(counts) # reshuffle sequence with same freq. --- 102,106 ---- puts seq.complement.translate # translation of complemental strand ! counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')} p randomseq = Bio::Sequence::NA.randomize(counts) # reshuffle sequence with same freq. *************** *** 159,163 **** * Divide a genome sequence into sections of 10000bp and output FASTA formatted sequences. The 1000bp at the start and end of ! each subsequence overlapped. At the 3' end of the sequence the leftover subsequence shorter than 10000bp is also added --- 167,171 ---- * Divide a genome sequence into sections of 10000bp and output FASTA formatted sequences. The 1000bp at the start and end of ! each subsequence overlapped. At the 3' end of the sequence the leftover subsequence shorter than 10000bp is also added *************** *** 252,258 **** #!/usr/bin/env ruby ! require 'bio' ! ff = Bio::FlatFile.new(Bio::GenBank, ARGF) ff.each_entry do |gb| --- 260,266 ---- #!/usr/bin/env ruby ! require 'bio' ! ff = Bio::FlatFile.new(Bio::GenBank, ARGF) ff.each_entry do |gb| *************** *** 470,475 **** rebase = Bio::RestrictionEnzyme.rebase ! rebase.each do |enzyme_name, info| ! 
    p enzyme_name
  end
--- 478,483 ----
  rebase = Bio::RestrictionEnzyme.rebase
! rebase.each do |enzyme_name, info|
!   p enzyme_name
  end
***************
*** 483,488 ****
  end
  end
! res.each do |frag|
!   em = EnzymeMatch.new
    em.p_left = frag.p_left
--- 491,496 ----
  end
  end
! res.each do |frag|
!   em = EnzymeMatch.new
    em.p_left = frag.p_left
***************
*** 494,498 ****
    em.enzyme = ar_enz
    em.sequence = ar_seq
!   p em
  end
--- 502,506 ----
    em.enzyme = ar_enz
    em.sequence = ar_seq
!   p em
  end
***************
*** 1160,1168 ****
  == Comparing BioProjects
  
! For a quick functional comparison of BioRuby, BioPerl, BioPython and Bioconductor (R) see (())
  
  == Using BioRuby with R
  
! Using Ruby with R Pjotr wrote a section on SciRuby. See (())
  
  == Using BioPerl or BioPython from Ruby
--- 1168,1176 ----
  == Comparing BioProjects
  
! For a quick functional comparison of BioRuby, BioPerl, BioPython and Bioconductor (R) see (())
  
  == Using BioRuby with R
  
! Using Ruby with R Pjotr wrote a section on SciRuby. See (())
  
  == Using BioPerl or BioPython from Ruby
***************
*** 1182,1184 ****
  =end
-
--- 1190,1191 ----

From pjotr at dev.open-bio.org Sat Feb 2 14:02:03 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Sat, 02 Feb 2008 14:02:03 -0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.14,1.15
Message-ID: <200802021401.m12E1uuN031293@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory dev.open-bio.org:/tmp/cvs-serv31273

Modified Files:
        Tutorial.rd
Log Message:
Updating tutorial

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** Tutorial.rd 2 Feb 2008 13:03:23 -0000 1.14
--- Tutorial.rd 2 Feb 2008 14:01:54 -0000 1.15
***************
*** 3,7 ****
  # A possible test run could be from rdtool:
  #
!
# ruby -I lib ./bin/rd2 ~/izip/cvs/opensource/bioruby/doc/Tutorial.rd # # A common problem is tabs in the text file! --- 3,12 ---- # A possible test run could be from rdtool: # ! # ruby -I lib ./bin/rd2 ~/cvs/opensource/bioruby/doc/Tutorial.rd ! # ! # or with style sheet: ! # ! # ruby -I lib ./bin/rd2 -r rd/rd2html-lib.rb --with-c ! ss=bioruby.css ~/cvs/opensource/bioruby/doc/Tutorial.rd > ~/bioruby.html # # A common problem is tabs in the text file! *************** *** 9,39 **** =begin ! See the document in the CVS repository ./doc/(()) - for a potentially more up-to-date edition. This one was updated: ! ! $Id$ ! Translated into English: Naohisa Goto ! Editor: PjotrPrins

! Copyright (C) 2001-2003 KATAYAMA Toshiaki , 2005-2008 Pjotr Prins, Naohisa Goto and others ! IMPORTANT NOTICE: This page is maintained in the BioRuby CVS ! repository. Please edit the file there otherwise changes may get ! lost. See (()) for CVS and mailing list ! access. ! = BioRuby Tutorial == Introduction ! This is a tutorial for using Bioruby. For BioRuby you need to install ! Ruby and the BioRuby package on your computer. For each following the ! instruction on the respective websites. (EDITOR's NOTE: include URL's) ! ! (EDITOR's NOTE: describe rdoc use for individual classes) ! For further information on the Ruby language see the section 'Further ! reading' at the end. You can check whether Ruby is installed on your computer and what --- 14,40 ---- =begin ! = BioRuby Tutorial ! Editor: PjotrPrins

! * Copyright (C) 2001-2003 KATAYAMA Toshiaki ! * Copyright (C) 2005-2008 Pjotr Prins, Naohisa Goto and others ! The latest version resides in the CVS repository ./doc/(()). This one was updated: ! $Id$ ! in preparation for the (()) == Introduction ! This is a tutorial for using Bioruby. A basic knowledge of Ruby is required. ! If you want to know more about the programming langauge Ruby we recommend the ! excellent book (()) ! by Dave Thomas and Andy Hunt - some of it is online ! (()). ! For BioRuby you need to install ! Ruby and the BioRuby package on your computer. You can check whether Ruby is installed on your computer and what *************** *** 46,49 **** --- 47,61 ---- ruby 1.8.5 (2006-08-25) [powerpc-linux] + If you see no such thing you'll have to install Ruby using your installation + manager. For more information see the + (()) website. + + Once Ruby is works download and install Bioruby using the links on the + (()) website. + + A lot of BioRuby's documentation exists in the source code and unit tests. To + really dive in you will need the latest source code tree. The embedded rdoc + documentation can be viewed online at + (()). But first lets start! == Trying Bioruby *************** *** 52,56 **** following command ! $BIORUBY/bin/bioruby and you should see a prompt --- 64,68 ---- following command ! ./bin/bioruby and you should see a prompt *************** *** 93,97 **** puts seq.translate # translation (Bio::Sequence::AA object) puts seq.translate(2) # translation from frame 2 (default is frame 1) ! puts seq.translate(1,11) # using codon table No.11 (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi) p seq.translate.codes # shows three-letter codes (Array) --- 105,110 ---- puts seq.translate # translation (Bio::Sequence::AA object) puts seq.translate(2) # translation from frame 2 (default is frame 1) ! puts seq.translate(1,11) # using codon table No.11 ! 
# (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi) p seq.translate.codes # shows three-letter codes (Array) *************** *** 114,120 **** % ri File.open ! Nucleic acid sequence is an object of +Bio::Sequence::NA+ class, and ! amino acid sequence is an object of +Bio::Sequence::AA+ class. Shared ! methods are in the parent +Bio::Sequence+ class. As Bio::Sequence class inherits Ruby's String class, you can use --- 127,133 ---- % ri File.open ! Nucleic acid sequence is an object of Bio::Sequence::NA class, and ! amino acid sequence is an object of Bio::Sequence::AA class. Shared ! methods are in the parent Bio::Sequence class. As Bio::Sequence class inherits Ruby's String class, you can use *************** *** 297,303 **** end - (TRANSLATOR'S NOTE: Bio::DB.open have not been used so well.) - (EDITOR's NOTE: Test code) - Next, we are going to parse the GenBank 'features', which is normally very complicated: --- 310,313 ---- *************** *** 382,387 **** Databases in BioRuby are essentially accessed like that of GenBank ! with classes like Bio::GenBank, Bio::KEGG::GENES, ! (EDITOR's NOTE: include complete list) In many cases the Bio::DatabaseClass acts as a factory pattern --- 392,397 ---- Databases in BioRuby are essentially accessed like that of GenBank ! with classes like Bio::GenBank, Bio::KEGG::GENES. A full list can be found in ! the ./lib/bio/db directory of the BioRuby source tree. In many cases the Bio::DatabaseClass acts as a factory pattern *************** *** 1151,1160 **** == Further reading ! See the BioRuby in anger Wiki and the class documentation for more ! information on BioRuby. - The best book to get for understanding and getting productive with the - Ruby language is 'Programming Ruby' by Dave Thomas and Andy - Hunt. Strongly recommended! = APPENDIX --- 1161,1169 ---- == Further reading ! See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the ! source code and unit tests. 
To really dive in you will need the latest source ! code tree. The embedded rdoc documentation can be viewed online at ! (()). = APPENDIX *************** *** 1189,1191 **** --- 1198,1207 ---- carefully that come with each package. + == Modifying this page + + IMPORTANT NOTICE: This page is maintained in the BioRuby CVS + repository. Please edit the file there otherwise changes may get + lost. See (()) for CVS and mailing list + access. + =end From pjotr at dev.open-bio.org Sat Feb 2 14:15:19 2008 From: pjotr at dev.open-bio.org (Pjotr Prins) Date: Sat, 02 Feb 2008 14:15:19 -0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.15,1.16 Message-ID: <200802021415.m12EFAqB031346@dev.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory dev.open-bio.org:/tmp/cvs-serv31326 Modified Files: Tutorial.rd Log Message: Modified tutorial Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** Tutorial.rd 2 Feb 2008 14:01:54 -0000 1.15 --- Tutorial.rd 2 Feb 2008 14:15:08 -0000 1.16 *************** *** 115,120 **** puts seq.complement.translate # translation of complemental strand ! counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')} ! p randomseq = Bio::Sequence::NA.randomize(counts) # reshuffle sequence with same freq. The p, print and puts methods are standard Ruby ways of outputting to --- 115,122 ---- puts seq.complement.translate # translation of complemental strand ! # reshuffle sequence with same frequencies: ! counts = {'a'=>seq.count('a'),'c'=>seq.count('c'), ! 'g'=>seq.count('g'),'t'=>seq.count('t')} ! p randomseq = Bio::Sequence::NA.randomize(counts) The p, print and puts methods are standard Ruby ways of outputting to *************** *** 265,269 **** print ">#{gb.accession} " # Accession puts gb.definition # Definition ! 
puts gb.naseq # Nucleic acid sequence (Bio::Sequence::NA object) end --- 267,272 ---- print ">#{gb.accession} " # Accession puts gb.definition # Definition ! puts gb.naseq # Nucleic acid sequence ! # (Bio::Sequence::NA object) end *************** *** 387,391 **** aaseq.splicing('21..119') - (EDITOR's NOTE: why use STRINGs here?) === More databases --- 390,393 ---- *************** *** 494,498 **** and cut a sequence with an enzyme follow up with: ! res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0}, {:view_ranges => true}) if res.kind_of? Symbol #error err = Err.find_by_code(res.to_s) --- 496,501 ---- and cut a sequence with an enzyme follow up with: ! res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0}, ! {:view_ranges => true}) if res.kind_of? Symbol #error err = Err.find_by_code(res.to_s) *************** *** 529,534 **** fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/). First, you must prepare your FASTA-formatted database sequence file ! target.pep and FASTA-formatted query.pep. (TRANSLATOR'S NOTE: I think ! we should provide sample data to readers.) #!/usr/bin/env ruby --- 532,536 ---- fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/). First, you must prepare your FASTA-formatted database sequence file ! target.pep and FASTA-formatted query.pep. #!/usr/bin/env ruby *************** *** 536,547 **** require 'bio' ! # Creates FASTA factory object ("ssearch" instead of "fasta34" can also work) factory = Bio::Fasta.local('fasta34', ARGV.pop) (EDITOR's NOTE: not consistent pop command) - # Reads FASTA-formatted files (TRANSLATOR'S NOTE: something wrong in Japanese text) ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF) ! # Iterates over each entry. the variable "entry" is a Bio::FastaFormat object. ff.each do |entry| # shows definition line (begins with '>') to the standard error output --- 538,550 ---- require 'bio' ! # Creates FASTA factory object ("ssearch" instead of ! 
# "fasta34" can also work) factory = Bio::Fasta.local('fasta34', ARGV.pop) (EDITOR's NOTE: not consistent pop command) ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF) ! # Iterates over each entry. the variable "entry" is a ! # Bio::FastaFormat object: ff.each do |entry| # shows definition line (begins with '>') to the standard error output *************** *** 555,559 **** # If E-value is smaller than 0.0001 if hit.evalue < 0.0001 ! # shows identifier of query and hit, E-value, start and end positions of homologous region (TRANSLATOR'S NOTE: should I change Japanese document?) print "#{hit.query_id} : evalue #{hit.evalue}\t#{hit.target_id} at " p hit.lap_at --- 558,563 ---- # If E-value is smaller than 0.0001 if hit.evalue < 0.0001 ! # shows identifier of query and hit, E-value, start and ! # end positions of homologous region print "#{hit.query_id} : evalue #{hit.evalue}\t#{hit.target_id} at " p hit.lap_at *************** *** 569,573 **** FASTA many times easily. Instead of using Fasta#query method, Bio::Sequence#fasta method can be used. - (TRANSLATOR'S NOTE: Bio::Sequence#fasta are not so frequently used.) seq = ">test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE" --- 573,576 ---- *************** *** 585,589 **** with the Report object. For example, getting information for hits: - report.each do |hit| puts hit.evalue # E-value --- 588,591 ---- *************** *** 594,606 **** puts hit.query_def # definition(comment line) of query sequence puts hit.query_len # length of query sequence ! puts hit.query_seq # query sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence) puts hit.target_id # identifier of hit sequence puts hit.target_def # definition(comment line) of hit sequence puts hit.target_len # length of hit sequence ! puts hit.target_seq # hit sequence (TRANSLATOR'S NOTE: sequence of homologous region of hit sequence) ! puts hit.query_start # start position of homologous region in query sequence ! 
puts hit.query_end # end position of homologous region in query sequence ! puts hit.target_start # start posiotion of homologous region in hit(target) sequence ! puts hit.target_end # end position of homologous region in hit(target) sequence puts hit.lap_at # array of above four numbers end --- 596,612 ---- puts hit.query_def # definition(comment line) of query sequence puts hit.query_len # length of query sequence ! puts hit.query_seq # sequence of homologous region puts hit.target_id # identifier of hit sequence puts hit.target_def # definition(comment line) of hit sequence puts hit.target_len # length of hit sequence ! puts hit.target_seq # hit of homologous region of hit sequence ! puts hit.query_start # start position of homologous ! # region in query sequence ! puts hit.query_end # end position of homologous region ! # in query sequence ! puts hit.target_start # start posiotion of homologous region ! # in hit(target) sequence ! puts hit.target_end # end position of homologous region ! # in hit(target) sequence puts hit.lap_at # array of above four numbers end *************** *** 695,717 **** report.each do |hit| ! puts hit.bit_score # bit score (*) ! puts hit.query_seq # query sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence) ! puts hit.midline # middle line string of alignment of homologous region (*) ! puts hit.target_seq # hit sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence) ! puts hit.evalue # E-value ! puts hit.identity # % identity ! puts hit.overlap # length of overlapping region ! puts hit.query_id # identifier of query sequence ! puts hit.query_def # definition(comment line) of query sequence ! puts hit.query_len # length of query sequence ! puts hit.target_id # identifier of hit sequence ! puts hit.target_def # definition(comment line) of hit sequence ! puts hit.target_len # length of hit sequence ! puts hit.query_start # start position of homologous region in query sequence ! 
    puts hit.query_end # end position of homologous region in query sequence
!     puts hit.target_start # start position of homologous region in hit(target) sequence
!     puts hit.target_end # end position of homologous region in hit(target) sequence
!     puts hit.lap_at # array of above four numbers
    end
--- 701,723 ----
    report.each do |hit|
!     puts hit.bit_score
!     puts hit.query_seq
!     puts hit.midline
!     puts hit.target_seq
!     puts hit.evalue
!     puts hit.identity
!     puts hit.overlap
!     puts hit.query_id
!     puts hit.query_def
!     puts hit.query_len
!     puts hit.target_id
!     puts hit.target_def
!     puts hit.target_len
!     puts hit.query_start
!     puts hit.query_end
!     puts hit.target_start
!     puts hit.target_end
!     puts hit.lap_at
    end
***************
*** 1171,1175 ****
  == KEGG API

! Please refer to KEGG_API.rd.ja (TRANSLATOR'S NOTE: English version: (()) ) and

  * (())
--- 1177,1181 ----
  == KEGG API

! Please refer to KEGG_API.rd.ja (English version: (()) ) and

  * (())

From pjotr at dev.open-bio.org Sun Feb 3 17:17:59 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Sun, 03 Feb 2008 17:17:59 -0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.16,1.17
Message-ID: <200802031717.m13HHoa6015904@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory dev.open-bio.org:/tmp/cvs-serv15881/doc

Modified Files:
	Tutorial.rd
Log Message:
More doctests in Tutorial.rd

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.16
retrieving revision 1.17
diff -C2 -d -r1.16 -r1.17
*** Tutorial.rd 2 Feb 2008 14:15:08 -0000 1.16
--- Tutorial.rd 3 Feb 2008 17:17:48 -0000 1.17
***************
*** 13,16 ****
--- 13,17 ----

  =begin
+ #doctest Testing bioruby

  = BioRuby Tutorial
***************
*** 64,68 ****
  following command

!   ./bin/bioruby

  and you should see a prompt
--- 65,70 ----
  following command

!   ./bin/bioruby or
!   ruby -I lib bin/bioruby

  and you should see a prompt
***************
*** 73,80 ****

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
!   bioruby> puts seq
!   atgcatgcaaaa
!   bioruby> puts seq.complement
!   ttttgcatgcat

  == Working with nucleic / amino acid sequences (Bio::Sequence class)
--- 75,82 ----

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
!   ==> "atgcatgcaaaa"
!
!   bioruby> seq.complement
!   ==> "ttttgcatgcat"

  == Working with nucleic / amino acid sequences (Bio::Sequence class)
***************
*** 89,122 ****
  defined in codontable.rb).
!   #!/usr/bin/env ruby
!
!   require 'bio'
!
!   seq = Bio::Sequence::NA.new("atgcatgcaaaa")
!
!   puts seq # original sequence
!   puts seq.complement # complemental sequence (Bio::Sequence::NA object)
!   puts seq.subseq(3,8) # gets subsequence of positions 3 to 8
!   p seq.gc_percent # GC percent (BioRuby 0.6.X: Float, BioRuby 0.7 or later: Integer)
!   p seq.composition # nucleic acid compositions (Hash)
!   puts seq.translate # translation (Bio::Sequence::AA object)
!   puts seq.translate(2) # translation from frame 2 (default is frame 1)
!   puts seq.translate(1,11) # using codon table No.11
!     # (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
!   p seq.translate.codes # shows three-letter codes (Array)
!   p seq.translate.names # shows amino acid names (Array)
!   p seq.translate.composition # amino acid compositions (Hash)
!   p seq.translate.molecular_weight # calculating molecular weight (Float)
!   puts seq.complement.translate # translation of complemental strand
-   # reshuffle sequence with same frequencies:
-   counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),
-             'g'=>seq.count('g'),'t'=>seq.count('t')}
-   p randomseq = Bio::Sequence::NA.randomize(counts)
  The p, print and puts methods are standard Ruby ways of outputting to
--- 91,136 ----
  defined in codontable.rb).
+   bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
+   ==> "atgcatgcaaaa"
!   # complemental sequence (Bio::Sequence::NA object)
!   bioruby> seq.complement
!   ==> "ttttgcatgcat"
!   bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8
!   ==> "gcatgc"
!   bioruby> seq.gc_percent
!   ==> 33
!   bioruby> seq.composition
!   ==> {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
!   bioruby> seq.translate
!   ==> "MHAK"
!   bioruby> seq.translate(2) # translate from frame 2
!   ==> "CMQ"
!   bioruby> seq.translate(1,11) # codon table 11
!   ==> "MHAK"
!   bioruby> seq.translate.codes
!   ==> ["Met", "His", "Ala", "Lys"]
!   bioruby> seq.translate.names
!   ==> ["methionine", "histidine", "alanine", "lysine"]
!   bioruby> seq.translate.composition
!   ==> {"K"=>1, "A"=>1, "M"=>1, "H"=>1}
!   bioruby> seq.translate.molecular_weight
!   ==> 485.605
!   bioruby> seq.complement.translate
!   ==> "FCMH"
! get a random sequence with the same NA count:
!   bioruby> counts = {'a'=>seq.count('a'),'c'=>seq.count('c'),'g'=>seq.count('g'),'t'=>seq.count('t')}
!   ==> {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
!   bioruby!> randomseq = Bio::Sequence::NA.randomize(counts)
!   ==!> "aaacatgaagtc"
!   bioruby!> print counts
!   a6c2g2t2
!   bioruby!> p counts
!   {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
  The p, print and puts methods are standard Ruby ways of outputting to
***************
*** 140,152 ****
  has index 0, for example:
!   s = 'abc'
!   puts s[0].chr
!
!   >a
!
!   puts s[0..1]
!
!   >ab
!
  So when using String methods, you should subtract 1 from positions
--- 154,163 ----
  has index 0, for example:
!   bioruby> s = 'abc'
!   ==> "abc"
!   bioruby> s[0].chr
!   ==> "a"
!   bioruby> s[0..1]
!   ==> "ab"
  So when using String methods, you should subtract 1 from positions
***************
*** 160,169 ****
  through a variable named +s+.
! * Shows average percentage of GC content for 100 bases (stepping
!   the default one base at a time)
!   seq.window_search(100) do |s|
!     puts s.gc_percent
!   end
  Since the class of each subsequence is the same as original sequence
--- 171,182 ----
  through a variable named +s+.
! * Shows average percentage of GC content for 20 bases (stepping the default one base at a time)
!   bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
!   ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"
!
!   bioruby> seq.window_search(20) { |s| print s.gc_percent,',' }
!   30,35,40,40,35,35,35,30,25,30,30,30,35,35,35,35,35,40,45,45,45,45,40,35,40,40,40,40,40,35,35,35,30,30,30, ==> ""
!
  Since the class of each subsequence is the same as original sequence
***************
*** 1165,1168 ****
--- 1178,1192 ----
  included - with output)

+ == Unit testing and doctests
+
+ BioRuby comes with an extensive testing framework with over 1300 tests and 2700
+ assertions. To run the unit tests:
+
+   cd test
+   ruby runner.rb
+
+ We have also started with doctest for Ruby. We are porting the examples
+ in this tutorial to doctest - more info upcoming.
+
  == Further reading

From pjotr at dev.open-bio.org Tue Feb 5 12:01:26 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Tue, 05 Feb 2008 12:01:26 -0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.17,1.18
Message-ID: <200802051201.m15C1JTf032112@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory dev.open-bio.org:/tmp/cvs-serv32092/doc

Modified Files:
	Tutorial.rd
Log Message:
Minor tweak to Tutorial.rd

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** Tutorial.rd 3 Feb 2008 17:17:48 -0000 1.17
--- Tutorial.rd 5 Feb 2008 12:01:16 -0000 1.18
***************
*** 129,135 ****
    bioruby!> print counts
!   a6c2g2t2
    bioruby!> p counts
!   {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
--- 129,135 ----
    bioruby!> print counts
!   a6c2g2t2
    bioruby!> p counts
!   {"a"=>6, "c"=>2, "g"=>2, "t"=>2}
***************
*** 173,183 ****
  * Shows average percentage of GC content for 20 bases (stepping the default one base at a time)

!   bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
    ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"

    bioruby> seq.window_search(20) { |s| print s.gc_percent,',' }
!   30,35,40,40,35,35,35,30,25,30,30,30,35,35,35,35,35,40,45,45,45,45,40,35,40,40,40,40,40,35,35,35,30,30,30, ==> ""

-
  Since the class of each subsequence is the same as original sequence
  (Bio::Sequence::NA or Bio::Sequence::AA or Bio::Sequence), you can
--- 173,182 ----
  * Shows average percentage of GC content for 20 bases (stepping the default one base at a time)

!   bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
    ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"

    bioruby> seq.window_search(20) { |s| print s.gc_percent,',' }
!   30,35,40,40,35,35,35,30,25,30,30,30,35,35,35,35,35,40,45,45,45,45,40,35,40,40,40,40,40,35,35,35,30,30,30, ==> ""

  Since the class of each subsequence is the same as original sequence
  (Bio::Sequence::NA or Bio::Sequence::AA or Bio::Sequence), you can

From pjotr at dev.open-bio.org Tue Feb 5 12:11:20 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Tue, 05 Feb 2008 12:11:20 -0000
Subject: [BioRuby-cvs] bioruby/sample gb2fasta.rb,0.5,0.6
Message-ID: <200802051211.m15CBDam032291@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/sample
In directory dev.open-bio.org:/tmp/cvs-serv32271/sample

Modified Files:
	gb2fasta.rb
Log Message:
Fixed broken require in gb2fasta example

Index: gb2fasta.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/sample/gb2fasta.rb,v
retrieving revision 0.5
retrieving revision 0.6
diff -C2 -d -r0.5 -r0.6
*** gb2fasta.rb 23 Jul 2002 04:51:24 -0000 0.5
--- gb2fasta.rb 5 Feb 2008 12:11:11 -0000 0.6
***************
*** 19,24 ****
  #

! require 'bio/io/flatfile'
! require 'bio/db/genbank'

  include Bio
--- 19,23 ----
  #

! require 'bio'

  include Bio

From pjotr at dev.open-bio.org Wed Feb 6 16:26:05 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Wed, 06 Feb 2008 16:26:05 -0000
Subject: [BioRuby-cvs] bioruby/sample na2aa.rb,NONE,1.1
Message-ID: <200802061625.m16GPuIu005441@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/sample
In directory dev.open-bio.org:/tmp/cvs-serv5421

Added Files:
	na2aa.rb
Log Message:
Simple example to translate any NA to AA fasta

--- NEW FILE: na2aa.rb ---
#!/usr/bin/env ruby
#
# translate.rb - translate any NA input into AA FASTA format
#
# Copyright (C) 2008 KATAYAMA Toshiaki & Pjotr Prins
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# $Id: na2aa.rb,v 1.1 2008/02/06 16:25:53 pjotr Exp $
#

require 'bio'
require 'pp'

include Bio

ARGV.each do | fn |
  Bio::FlatFile.auto(fn).each do | item |
    seq = Sequence::NA.new(item.data)
    aa = seq.translate
    aa.gsub!(/X/,'-')
    rec = Bio::FastaFormat.new('> '+item.definition+"\n"+aa)
    print rec
  end
end

From pjotr at dev.open-bio.org Mon Feb 11 07:08:56 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Mon, 11 Feb 2008 07:08:56 -0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.18,1.19
Message-ID: <200802110708.m1B78mwU007283@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory dev.open-bio.org:/tmp/cvs-serv7263/doc

Modified Files:
	Tutorial.rd
Log Message:
Expanding on the Tutorial

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** Tutorial.rd 5 Feb 2008 12:01:16 -0000 1.18
--- Tutorial.rd 11 Feb 2008 07:08:46 -0000 1.19
***************
*** 1,5 ****
  # This document is generated with a version of rd2html (part of Hiki)
  #
! # A possible test run could be from rdtool:
  #
  # ruby -I lib ./bin/rd2 ~/cvs/opensource/bioruby/doc/Tutorial.rd
--- 1,5 ----
  # This document is generated with a version of rd2html (part of Hiki)
  #
! # A possible test run could be from rdtool (on Debian package rdtool)
  #
  # ruby -I lib ./bin/rd2 ~/cvs/opensource/bioruby/doc/Tutorial.rd
***************
*** 10,14 ****
  ss=bioruby.css ~/cvs/opensource/bioruby/doc/Tutorial.rd > ~/bioruby.html
  #
! # A common problem is tabs in the text file!

  =begin
--- 10,23 ----
  ss=bioruby.css ~/cvs/opensource/bioruby/doc/Tutorial.rd > ~/bioruby.html
  #
! # in Debian:
! #
! # rd2 -r rd/rd2html-lib --with-css="/home/wrk/izip/cvs/opensource/bioruby/lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby.css" Tutorial.rd > index.html
! #
! # A common problem is tabs in the text file! TABs are not allowed.
! #
! # To add tests run Toshiaki's bioruby shell and paste in the query plus
! # results.
! #
! # To run the embedded Ruby doctests you can get the doctest.rb from Pjotr.

  =begin
***************
*** 36,41 ****
  (()).

! For BioRuby you need to install
! Ruby and the BioRuby package on your computer.

  You can check whether Ruby is installed on your computer and what
--- 45,49 ----
  (()).

! For BioRuby you need to install Ruby and the BioRuby package on your computer

  You can check whether Ruby is installed on your computer and what
***************
*** 80,83 ****
--- 88,95 ----
    ==> "ttttgcatgcat"

+ See the the Bioruby shell section below for more tweaking. If you have trouble running
+ examples also check the section below on trouble shooting. You can also post a
+ question to the mailing list. BioRuby developers usually try to help.
+
  == Working with nucleic / amino acid sequences (Bio::Sequence class)

***************
*** 171,181 ****
  through a variable named +s+.

! * Shows average percentage of GC content for 20 bases (stepping the default one base at a time)

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
    ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"

!   bioruby> seq.window_search(20) { |s| print s.gc_percent,',' }
!   30,35,40,40,35,35,35,30,25,30,30,30,35,35,35,35,35,40,45,45,45,45,40,35,40,40,40,40,40,35,35,35,30,30,30, ==> ""

  Since the class of each subsequence is the same as original sequence
--- 183,195 ----
  through a variable named +s+.

! * Show average percentage of GC content for 20 bases (stepping the default one base at a time)

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
    ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"

!   bioruby> a=[]; seq.window_search(20) { |s| a.push s.gc_percent }
!   bioruby> a
!   ==> [30, 35, 40, 40, 35, 35, 35, 30, 25, 30, 30, 30, 35, 35, 35, 35, 35, 40, 45, 45, 45, 45, 40, 35, 40, 40, 40, 40, 40, 35, 35, 35, 30, 30, 30]
!

  Since the class of each subsequence is the same as original sequence
***************
*** 185,191 ****
  * Shows translation results for 15 bases shifting a codon at a time

!   seq.window_search(15, 3) do |s|
!     puts s.translate
!   end

  Finally, the window_search method returns the last leftover
--- 199,209 ----
  * Shows translation results for 15 bases shifting a codon at a time

!   bioruby> a = []
!   bioruby> seq.window_search(15, 3) do |s|
!   bioruby>   a.push s.translate
!   bioruby> end
!   bioruby> a
!   ==> ["MHAIK", "HAIKL", "AIKLI", "IKLIP", "KLIPI", "LIPIR", "IPIRS", "PIRSS", "IRSSR", "RSSRS", "SSRSS", "SRSSK", "RSSKK", "SSKKK"]
!

  Finally, the window_search method returns the last leftover
***************
*** 193,206 ****

  * Divide a genome sequence into sections of 10000bp and
!   output FASTA formatted sequences. The 1000bp at the start and end of
!   each subsequence overlapped. At the 3' end of the sequence the
!   leftover subsequence shorter than 10000bp is also added

    i = 1
    remainder = seq.window_search(10000, 9000) do |s|
!     puts s.to_fasta("segment #{i}", 60)
      i += 1
    end
!   puts remainder.to_fasta("segment #{i}", 60)

  If you don't want the overlapping window, set window size and stepping
--- 211,227 ----

  * Divide a genome sequence into sections of 10000bp and
!   output FASTA formatted sequences (line width 60 chars). The 1000bp at the
!   start and end of each subsequence overlapped. At the 3' end of the sequence
!   the leftover is also added:

    i = 1
+   textwidth=60
    remainder = seq.window_search(10000, 9000) do |s|
!     puts s.to_fasta("segment #{i}", textwidth)
      i += 1
    end
!   if remainder
!     puts remainder.to_fasta("segment #{i}", textwidth)
!   end

  If you don't want the overlapping window, set window size and stepping
***************
*** 211,224 ****
  * Count the codon usage

!   codon_usage = Hash.new(0)
!   seq.window_search(3, 3) do |s|
!     codon_usage[s] += 1
!   end

  * Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid)

!   seq.window_search(10, 10) do |s|
!     puts s.molecular_weight
!   end

  In most cases, sequences are read from files or retrieved from databases.
--- 232,251 ----
  * Count the codon usage

!   bioruby> codon_usage = Hash.new(0)
!   bioruby> seq.window_search(3, 3) do |s|
!   bioruby>   codon_usage[s] += 1
!   bioruby> end
!   bioruby> codon_usage
!   ==> {"cat"=>1, "aaa"=>3, "cca"=>1, "att"=>2, "aga"=>1, "atc"=>1, "cta"=>1, "gca"=>1, "cga"=>1, "tca"=>3, "aag"=>1, "tcc"=>1, "atg"=>1}
!

  * Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid)

!   bioruby> a = []
!   bioruby> seq.window_search(10, 10) do |s|
!   bioruby>   a.push s.molecular_weight
!   bioruby> end
!   bioruby> a
!   ==> [3096.2062, 3086.1962, 3056.1762, 3023.1262, 3073.2262]

  In most cases, sequences are read from files or retrieved from databases.
***************
*** 246,249 ****
--- 273,280 ----
    % ruby na2aa.rb my_naseq.txt

+ or use a pipe!
+
+   % cat my_naseq.txt|ruby na2aa.rb
+
  Outputs

***************
*** 254,259 ****
    % ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt

! In the next section we will retrieve data from databases instead of
! using raw sequence files.

  == Parsing GenBank data (Bio::GenBank class)
--- 285,291 ----
    % ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt

! In the next section we will retrieve data from databases instead of using raw
! sequence files. One generic example of the above can be found in
! ./sample/na2aa.rb.

  == Parsing GenBank data (Bio::GenBank class)
***************
*** 460,474 ****
  Array and BioPerl's Bio::SimpleAlign. A very simple example is:

!   require 'bio'
!
!   seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
!   seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
!
    # creates alignment object
!   a = Bio::Alignment.new(seqs)
!
!   # shows consensus sequence
!   p a.consensus # ==> "a?gc?"
!
    # shows IUPAC consensus
    p a.consensus_iupac # ==> "ahgcr"
--- 492,501 ----
  Array and BioPerl's Bio::SimpleAlign. A very simple example is:

!   bioruby> seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
!   bioruby> seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
    # creates alignment object
!   bioruby> a = Bio::Alignment.new(seqs)
!   bioruby> a.consensus
!   ==> "xa?gc?"
    # shows IUPAC consensus
    p a.consensus_iupac # ==> "ahgcr"
***************
*** 1168,1179 ****
  == The BioRuby example programs

! Some sample programs are stored in samples/ directry.
! Some programs are obsolete. Since samples are not enough,
! practical and interesting samples are welcome.
!
! to be written...
! (EDITOR's NOTE: I would like some examples automatically
! included - with output)

  == Unit testing and doctests
--- 1195,1201 ----
  == The BioRuby example programs

! Some sample programs are stored in ./samples/ directory. Run for example:
!   ./sample/na2aa.rb test/data/fasta/example1.txt

  == Unit testing and doctests
***************
*** 1195,1198 ****
--- 1217,1242 ----
  (()).

+ == BioRuby Shell
+
+ The BioRuby shell implementation you find in ./lib/bio/shell. It is very interesting
+ as it uses IRB (the Ruby intepreter) which is a powerful environment described in
+ (()). IRB commands can directly be typed in the shell, e.g.
+
+   bioruby!> IRB.conf[:PROMPT_MODE]
+   ==!> :PROMPT_C
+
+ optionally you also may want to install the optional Ruby readline support -
+ with Debian libreadline-ruby. To edit a previous line you may have to press
+ line down (arrow down) first.
+
+ = Helpful tools
+
+ Apart from rdoc you may also want to use rtags - which allows jumping around
+ source code by clicking on class and method names.
+
+   cd bioruby/lib
+   rtags -R --vi
+
+ For a tutorial see (())

  = APPENDIX
***************
*** 1227,1230 ****
--- 1271,1283 ----
  carefully that come with each package.

+ == Trouble shooting
+
+ * Error: in `require': no such file to load -- bio (LoadError)
+
+ Ruby fails to find the BioRuby libraries - add it to the RUBYLIB path, or pass
+ it to the interpeter. For example:
+
+   ruby -I~/cvs/bioruby/lib yourprogram.rb
+
  == Modifying this page

From pjotr at dev.open-bio.org Mon Feb 11 08:03:36 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Mon, 11 Feb 2008 08:03:36 -0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.19,1.20
Message-ID: <200802110803.m1B83TYu007417@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory dev.open-bio.org:/tmp/cvs-serv7397

Modified Files:
	Tutorial.rd
Log Message:
Minor adjustments to Tutorial

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.19
retrieving revision 1.20
diff -C2 -d -r1.19 -r1.20
*** Tutorial.rd 11 Feb 2008 07:08:46 -0000 1.19
--- Tutorial.rd 11 Feb 2008 08:03:27 -0000 1.20
***************
*** 497,519 ****
    bioruby> a = Bio::Alignment.new(seqs)
    bioruby> a.consensus
!   ==> "xa?gc?"
    # shows IUPAC consensus
!   p a.consensus_iupac # ==> "ahgcr"
!
    # iterates over each seq
    a.each { |x| p x }
!   # ==>
!   # "atgca"
!   # "aagca"
!   # "acgca"
!   # "acgcg"
    # iterates over each site
    a.each_site { |x| p x }
!   # ==>
!   # ["a", "a", "a", "a"]
!   # ["t", "a", "c", "c"]
!   # ["g", "g", "g", "g"]
!   # ["c", "c", "c", "c"]
!   # ["a", "a", "a", "g"]
    # doing alignment by using CLUSTAL W.
--- 497,519 ----
    bioruby> a = Bio::Alignment.new(seqs)
    bioruby> a.consensus
!   ==> "a?gc?"
    # shows IUPAC consensus
!   a.consensus_iupac
!   ==> "ahgcr"
    # iterates over each seq
    a.each { |x| p x }
!   # ==>
!   # "atgca"
!   # "aagca"
!   # "acgca"
!   # "acgcg"
    # iterates over each site
    a.each_site { |x| p x }
!   # ==>
!   # ["a", "a", "a", "a"]
!   # ["t", "a", "c", "c"]
!   # ["g", "g", "g", "g"]
!   # ["c", "c", "c", "c"]
!   # ["a", "a", "a", "g"]
    # doing alignment by using CLUSTAL W.

From pjotr at dev.open-bio.org Wed Feb 13 08:04:41 2008
From: pjotr at dev.open-bio.org (Pjotr Prins)
Date: Wed, 13 Feb 2008 08:04:41 -0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.20,1.21
Message-ID: <200802130804.m1D84XQC015600@dev.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory dev.open-bio.org:/tmp/cvs-serv15580

Modified Files:
	Tutorial.rd
Log Message:
Tutorial

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.20
retrieving revision 1.21
diff -C2 -d -r1.20 -r1.21
*** Tutorial.rd 11 Feb 2008 08:03:27 -0000 1.20
--- Tutorial.rd 13 Feb 2008 08:04:30 -0000 1.21
***************
*** 183,187 ****
  through a variable named +s+.

! * Show average percentage of GC content for 20 bases (stepping the default one base at a time)

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
--- 183,187 ----
  through a variable named +s+.

! Show average percentage of GC content for 20 bases (stepping the default one base at a time)

    bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
***************
*** 197,201 ****
  use all methods on the subsequence. For example,

! * Shows translation results for 15 bases shifting a codon at a time

    bioruby> a = []
--- 197,201 ----
  use all methods on the subsequence. For example,

! Shows translation results for 15 bases shifting a codon at a time

    bioruby> a = []
***************
*** 210,217 ****
  subsequence. This allows for example

! * Divide a genome sequence into sections of 10000bp and
!   output FASTA formatted sequences (line width 60 chars). The 1000bp at the
!   start and end of each subsequence overlapped. At the 3' end of the sequence
!   the leftover is also added:

    i = 1
--- 210,217 ----
  subsequence. This allows for example

! Divide a genome sequence into sections of 10000bp and
! output FASTA formatted sequences (line width 60 chars). The 1000bp at the
! start and end of each subsequence overlapped. At the 3' end of the sequence
! the leftover is also added:

    i = 1
***************
*** 230,234 ****
  Other examples

! * Count the codon usage

    bioruby> codon_usage = Hash.new(0)
--- 230,234 ----
  Other examples

! Count the codon usage

    bioruby> codon_usage = Hash.new(0)
***************
*** 240,244 ****


! * Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid)

    bioruby> a = []
--- 240,244 ----


! Calculate molecular weight for each 10-aa peptide (or 10-nt nucleic acid)

    bioruby> a = []
***************
*** 399,408 ****
    end

! * Note: In this example Feature#assoc method makes a Hash from a
!   feature object. It is useful because you can get data from the hash
!   by using qualifiers as keys.
!   (But there is a risk some information is lost when two or more
!   qualifiers are the same. Therefore an Array is returned by
!   Feature#feature)

  Bio::Sequence#splicing splices subsequence from nucleic acid sequence
--- 399,408 ----
    end

! Note: In this example Feature#assoc method makes a Hash from a
! feature object. It is useful because you can get data from the hash
! by using qualifiers as keys.
! (But there is a risk some information is lost when two or more
! qualifiers are the same. Therefore an Array is returned by
! Feature#feature)

  Bio::Sequence#splicing splices subsequence from nucleic acid sequence
***************
*** 418,426 ****
  bio/location.rb.

! * Splice according to location string used in a GenBank entry

    naseq.splicing('join(2035..2050,complement(1775..1818),13..345')

! * Generate Bio::Locations object and pass the splicing method

    locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
--- 418,426 ----
  bio/location.rb.

! Splice according to location string used in a GenBank entry

    naseq.splicing('join(2035..2050,complement(1775..1818),13..345')

! Generate Bio::Locations object and pass the splicing method

    locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
***************
*** 430,434 ****
  (Bio::Sequence::AA objects).

! * Splicing peptide from a protein (e.g. signal peptide)

    aaseq.splicing('21..119')
--- 430,434 ----
  (Bio::Sequence::AA objects).

! Splicing peptide from a protein (e.g. signal peptide)

    aaseq.splicing('21..119')