From ngoto at pub.open-bio.org Wed Jan 4 08:01:11 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Wed Jan 4 07:52:13 2006
Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb pdb.rb, 1.6, 1.7 residue.rb, 1.4, 1.5
Message-ID: <200601041301.k04D1BVL011971@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv11954

Modified Files:
    pdb.rb residue.rb
Log Message:
* created new class to store HETATM: Bio::PDB::HeteroCompound < Bio::PDB::Residue
* added Bio::PDB::Residue.get_residue_id_from_atom(atom).
* Bio::PDB::Residue#id is now an alias of Bio::PDB::Residue#residue_id.

Index: residue.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/residue.rb,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** residue.rb 18 Dec 2005 17:34:47 -0000 1.4
--- residue.rb 4 Jan 2006 13:01:09 -0000 1.5
***************
*** 33,42 ****
include Enumerable include Comparable ! ! attr_reader :resName, :resSeq, :iCode, :id, :chain, :hetatm ! attr_writer :resName, :chain, :hetatm ! def initialize(resName = nil, resSeq = nil, iCode = nil, ! chain = nil, hetatm = false) @resName = resName
--- 33,45 ----
include Enumerable include Comparable ! ! # Creates residue id from an ATOM (or HETATM) object. ! def self.get_residue_id_from_atom(atom) ! "#{atom.resSeq}#{atom.iCode.strip}".strip ! end ! ! # Creates a new Residue object. def initialize(resName = nil, resSeq = nil, iCode = nil, ! chain = nil) @resName = resName
***************
*** 44,90 ****
@iCode = iCode - @hetatm = hetatm - - #Residue id is required because resSeq doesn't uniquely identify - #a residue. ID is constructed from resSeq and iCode and is appended - #to 'LIGAND' if the residue is a HETATM - if (!@resSeq and !@iCode) - @id = nil - else - @id = "#{@resSeq}#{@iCode.strip}" - if @hetatm - @id = 'LIGAND' + @id - end - end - @chain = chain ! ! @atoms = Array.new ! end ! 
#Keyed access to atoms based on element e.g. ["CA"] def [](key) atom = @atoms.find{ |atom| key == atom.element } end ! ! #Need to define these to make sure id is correctly updated def resSeq=(resSeq) @resSeq = resSeq.to_i ! @id = "#{@resSeq}#{@iCode.strip}" ! if @hetatm ! @id = 'LIGAND' + @id ! end end ! def iCode=(iCode) @iCode = iCode ! @id = "#{@resSeq}#{@iCode.strip}" ! if @hetatm ! @id = 'LIGAND' + @id ! end end ! #Adds an atom to this residue def addAtom(atom) raise "Expecting ATOM or HETATM" unless atom.is_a? Bio::PDB::Record::ATOM --- 47,108 ---- @iCode = iCode @chain = chain ! @atoms = [] ! ! update_residue_id end ! ! # atoms in this residue. (Array) ! attr_reader :atoms ! ! # the chain to which this residue belongs ! attr_accessor :chain ! ! # resName (residue name) ! attr_accessor :resName ! ! # residue id (String or nil) ! attr_reader :residue_id ! ! # Now, Residue#id is an alias of residue_id. ! alias id residue_id ! #Keyed access to atoms based on element e.g. ["CA"] def [](key) atom = @atoms.find{ |atom| key == atom.element } end ! ! # Updates residue id. This is a private method. ! # Need to call this method to make sure id is correctly updated. ! def update_residue_id ! if !@resSeq and !@iCode ! @residue_id = nil ! else ! @residue_id = "#{@resSeq}#{@iCode.to_s.strip}".strip ! end ! end ! private :update_residue_id ! ! # resSeq ! attr_reader :resSeq ! ! # resSeq=() def resSeq=(resSeq) @resSeq = resSeq.to_i ! update_residue_id ! @resSeq end ! ! # iCode ! attr_reader :iCode ! ! # iCode=() def iCode=(iCode) @iCode = iCode ! update_residue_id ! @iCode end ! # Adds an atom to this residue def addAtom(atom) raise "Expecting ATOM or HETATM" unless atom.is_a? Bio::PDB::Record::ATOM *************** *** 93,97 **** end ! #Iterator over the atoms def each @atoms.each{ |atom| yield atom } --- 111,115 ---- end ! # Iterator over the atoms def each @atoms.each{ |atom| yield atom } *************** *** 100,104 **** alias each_atom each ! 
#Sorts based on resSeq and iCode if need be def <=>(other) if @resSeq != other.resSeq --- 118,122 ---- alias each_atom each ! # Sorts based on resSeq and iCode if need be def <=>(other) if @resSeq != other.resSeq *************** *** 109,113 **** end ! #Stringifies each atom def to_s string = "" --- 127,131 ---- end ! # Stringifies each atom def to_s string = "" *************** *** 115,122 **** return string end - - end ! end ! end --- 133,172 ---- return string end ! # If the residue is HETATM, returns true. ! # Otherwise, returns false. ! def hetatm ! false ! end ! end #class Residue ! class HeteroCompound < Residue ! ! # Creates residue id from an ATOM (or HETATM) object. ! # ! # We add 'LIGAND' to the id if it's a HETATM. ! # I think this is neccessary because some PDB files reuse ! # numbers for HETATMS. ! def self.get_residue_id_from_atom(atom) ! 'LIGAND' + super ! end ! ! # Residue id is required because resSeq doesn't uniquely identify ! # a residue. ID is constructed from resSeq and iCode and is appended ! # to 'LIGAND' if the residue is a HETATM ! def update_residue_id ! super ! @residue_id = 'LIGAND' + @residue_id if @residue_id ! end ! private :update_residue_id ! ! # If the residue is HETATM, returns true. ! # Otherwise, returns false. ! def hetatm ! true ! end ! end #class HeteroCompound ! ! end #class PDB ! ! end #module Bio Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** pdb.rb 18 Dec 2005 17:37:14 -0000 1.6 --- pdb.rb 4 Jan 2006 13:01:09 -0000 1.7 *************** *** 1212,1218 **** #Empty current model ! cModel = Bio::PDB::Model.new ! cChain = Bio::PDB::Chain.new ! cResidue = Bio::PDB::Residue.new #Goes through each line and replace that line with a PDB::Record --- 1212,1219 ---- #Empty current model ! cModel = Model.new ! cChain = Chain.new ! cResidue = Residue.new ! 
#cCompound = HeteroCompound.new #Goes through each line and replace that line with a PDB::Record
***************
*** 1239,1243 ****
case key when 'ATOM' ! residueID = "#{f.resSeq}#{f.iCode.strip}".strip #p f
--- 1240,1244 ----
case key when 'ATOM' ! residueID = Residue.get_residue_id_from_atom(f) #p f
***************
*** 1270,1275 ****
#I can fix this if really needed if f.resName == 'HOH' ! solvent = Residue.new(f.resName, f.resSeq, f.iCode, ! cModel.solvent, true) #p solvent f.residue = solvent
--- 1271,1276 ----
#I can fix this if really needed if f.resName == 'HOH' ! solvent = HeteroCompound.new(f.resName, f.resSeq, f.iCode, ! cModel.solvent) #p solvent f.residue = solvent
***************
*** 1279,1287 ****
else ! #Make residue we add 'LIGAND' to the id if it's a HETATM ! #I think this is neccessary because some PDB files reuse ! #numbers for HETATMS ! residueID = "#{f.resSeq}#{f.iCode.strip}".strip ! residueID = "LIGAND" + residueID #p f #p residueID
--- 1280,1284 ----
else ! residueID = HeteroCompound.get_residue_id_from_atom(f) #p f #p residueID
***************
*** 1300,1305 ****
residue = cResidue elsif newChain or !(residue = chain[residueID]) ! newResidue = Residue.new(f.resName, f.resSeq, f.iCode, ! chain, true) chain.addLigand(newResidue) cResidue = newResidue
--- 1297,1302 ----
residue = cResidue elsif newChain or !(residue = chain[residueID]) ! newResidue = HeteroCompound.new(f.resName, f.resSeq, f.iCode, ! chain) chain.addLigand(newResidue) cResidue = newResidue

From ngoto at pub.open-bio.org Wed Jan 4 09:01:16 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Wed Jan 4 08:52:16 2006
Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb pdb.rb, 1.7, 1.8 residue.rb, 1.5, 1.6
Message-ID: <200601041401.k04E1GVL012128@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv12116

Modified Files:
    pdb.rb residue.rb
Log Message:
HeteroCompound is renamed to Heterogen.
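
For orientation, the Residue/Heterogen relationship as of residue.rb revision 1.6 can be sketched roughly as below. This is a standalone illustration, not the BioRuby source: ToyResidue, ToyHeterogen and ToyAtom are hypothetical stand-ins, and only the id-building logic mirrors the diff above. A later commit in this archive drops the 'LIGAND' prefix again.

```ruby
# Hypothetical stand-in for an ATOM/HETATM record; only the two fields
# used to build a residue id are modelled.
ToyAtom = Struct.new(:resSeq, :iCode)

# Simplified stand-in for Bio::PDB::Residue (revision 1.6 behaviour).
class ToyResidue
  # Builds a residue id from an atom record: resSeq followed by iCode.
  def self.get_residue_id_from_atom(atom)
    "#{atom.resSeq}#{atom.iCode.strip}".strip
  end

  def initialize(res_seq = nil, i_code = nil)
    @res_seq = res_seq
    @i_code = i_code
    update_residue_id
  end

  attr_reader :residue_id
  alias id residue_id

  # A plain residue is never a HETATM.
  def hetatm
    false
  end

  private

  # Recomputes the id; nil when neither resSeq nor iCode is known.
  def update_residue_id
    if !@res_seq && !@i_code
      @residue_id = nil
    else
      @residue_id = "#{@res_seq}#{@i_code.to_s.strip}".strip
    end
  end
end

# Simplified stand-in for Bio::PDB::Heterogen (revision 1.6 behaviour):
# same id as a residue, prefixed with 'LIGAND'.
class ToyHeterogen < ToyResidue
  def self.get_residue_id_from_atom(atom)
    'LIGAND' + super
  end

  def hetatm
    true
  end

  private

  def update_residue_id
    super
    @residue_id = 'LIGAND' + @residue_id if @residue_id
  end
end

atom = ToyAtom.new(42, 'A')
residue_key = ToyResidue.get_residue_id_from_atom(atom)
ligand_key = ToyHeterogen.get_residue_id_from_atom(atom)
```

The prefix keeps a heterogen and a residue that share a sequence number from colliding when both are looked up through the same keyed accessor, which is the reason given in the original code comment.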
Index: residue.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/residue.rb,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** residue.rb 4 Jan 2006 13:01:09 -0000 1.5 --- residue.rb 4 Jan 2006 14:01:14 -0000 1.6 *************** *** 141,145 **** end #class Residue ! class HeteroCompound < Residue # Creates residue id from an ATOM (or HETATM) object. --- 141,145 ---- end #class Residue ! class Heterogen < Residue # Creates residue id from an ATOM (or HETATM) object. *************** *** 166,170 **** true end ! end #class HeteroCompound end #class PDB --- 166,170 ---- true end ! end #class Heterogen end #class PDB Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** pdb.rb 4 Jan 2006 13:01:09 -0000 1.7 --- pdb.rb 4 Jan 2006 14:01:14 -0000 1.8 *************** *** 1215,1219 **** cChain = Chain.new cResidue = Residue.new ! #cCompound = HeteroCompound.new #Goes through each line and replace that line with a PDB::Record --- 1215,1219 ---- cChain = Chain.new cResidue = Residue.new ! #cCompound = Heterogen.new #Goes through each line and replace that line with a PDB::Record *************** *** 1271,1276 **** #I can fix this if really needed if f.resName == 'HOH' ! solvent = HeteroCompound.new(f.resName, f.resSeq, f.iCode, ! cModel.solvent) #p solvent f.residue = solvent --- 1271,1276 ---- #I can fix this if really needed if f.resName == 'HOH' ! solvent = Heterogen.new(f.resName, f.resSeq, f.iCode, ! cModel.solvent) #p solvent f.residue = solvent *************** *** 1280,1284 **** else ! residueID = HeteroCompound.get_residue_id_from_atom(f) #p f #p residueID --- 1280,1284 ---- else ! 
residueID = Heterogen.get_residue_id_from_atom(f) #p f #p residueID
***************
*** 1297,1301 ****
residue = cResidue elsif newChain or !(residue = chain[residueID]) ! newResidue = HeteroCompound.new(f.resName, f.resSeq, f.iCode, chain) chain.addLigand(newResidue)
--- 1297,1301 ----
residue = cResidue elsif newChain or !(residue = chain[residueID]) ! newResidue = Heterogen.new(f.resName, f.resSeq, f.iCode, chain) chain.addLigand(newResidue)

From ngoto at pub.open-bio.org Wed Jan 4 10:41:52 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Wed Jan 4 10:33:31 2006
Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb chain.rb, 1.2, 1.3 model.rb, 1.2, 1.3 pdb.rb, 1.8, 1.9 residue.rb, 1.6, 1.7 utils.rb, 1.2, 1.3
Message-ID: <200601041541.k04FfqVL012388@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv12356

Modified Files:
    chain.rb model.rb pdb.rb residue.rb utils.rb
Log Message:
* There are many changes. Some changes may not be listed below.
* general
  * Now, heterogens are treated separately.
  * The "LIGAND" prefix is no longer added to the id of heterogens.
  * Waters (resName == "HOH") are now treated as normal heterogens.
    However, Model#solvents is still available.
* utils.rb
  * Added new modules HetatmFinder and HeterogenFinder.
  * You can use #each_hetatm and #find_hetatm for Bio::PDB,
    Bio::PDB::Model, Bio::PDB::Chain, and Bio::PDB::Heterogen.
  * You can use #each_heterogen and #find_heterogen for Bio::PDB,
    Bio::PDB::Model, and Bio::PDB::Chain.
* pdb.rb (Bio::PDB)
  * Added PDB#models.
* model.rb (Bio::PDB::Model)
  * added Model#chains and Model#solvents.
  * fixed typo? in <=>.
* chain.rb (Bio::PDB::Chain)
  * Now, Chain#id is an alias of Chain#chain_id.
  * added Chain#residues and Chain#heterogens.
* residue.rb (Bio::PDB::Residue and Bio::PDB::Heterogen)
  * added Residue#atoms and Heterogen#hetatms.
  * added Heterogen#each_hetatm.
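
The finder mixins described in this log message follow a delegation pattern that can be sketched in isolation. The two modules below follow the shape of the code added to utils.rb in this commit. The ToyHeterogen, ToyChain and ToyModel classes are hypothetical stand-ins (atoms are plain strings), so this is an illustration of the pattern under those assumptions, not a use of the real Bio::PDB classes.

```ruby
# Each level of the hierarchy only has to supply one iterator;
# the mixins derive the finders and the deeper traversals from it.
module HeterogenFinder
  # Returns the heterogens for which the block is true.
  def find_heterogen
    array = []
    each_heterogen { |heterogen| array.push(heterogen) if yield(heterogen) }
    array
  end

  # Default traversal: delegate to every chain.
  def each_heterogen(&block)
    each_chain { |chain| chain.each_heterogen(&block) }
  end
end

module HetatmFinder
  # Returns the hetatms for which the block is true.
  def find_hetatm
    array = []
    each_hetatm { |hetatm| array.push(hetatm) if yield(hetatm) }
    array
  end

  # Default traversal: delegate to every heterogen.
  def each_hetatm(&block)
    each_heterogen { |heterogen| heterogen.each(&block) }
  end
end

# Hypothetical leaf: holds its hetatms directly.
class ToyHeterogen
  include HetatmFinder
  attr_reader :res_name

  def initialize(res_name, hetatms)
    @res_name = res_name
    @hetatms = hetatms
  end

  def each(&block)
    @hetatms.each(&block)
  end
  # Overrides HetatmFinder#each_hetatm, as Heterogen does in the commit.
  alias each_hetatm each
end

# Hypothetical chain: supplies each_heterogen itself.
class ToyChain
  include HetatmFinder
  include HeterogenFinder

  def initialize(heterogens)
    @heterogens = heterogens
  end

  def each_heterogen(&block)
    @heterogens.each(&block)
  end
end

# Hypothetical model: supplies only each_chain.
class ToyModel
  include HetatmFinder
  include HeterogenFinder

  def initialize(chains)
    @chains = chains
  end

  def each_chain(&block)
    @chains.each(&block)
  end
end

water = ToyHeterogen.new('HOH', ['O'])
heme = ToyHeterogen.new('HEM', ['FE', 'NA', 'NB'])
model = ToyModel.new([ToyChain.new([water]), ToyChain.new([heme])])

waters = model.find_heterogen { |h| h.res_name == 'HOH' }
irons = model.find_hetatm { |name| name == 'FE' }
```

Because waters are ordinary heterogens after this change, selecting them is a matter of filtering on the residue name rather than consulting a separate solvent chain.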
Index: residue.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/residue.rb,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** residue.rb 4 Jan 2006 14:01:14 -0000 1.6 --- residue.rb 4 Jan 2006 15:41:50 -0000 1.7 *************** *** 31,34 **** --- 31,35 ---- include Utils include AtomFinder + include Enumerable include Comparable *************** *** 115,119 **** @atoms.each{ |atom| yield atom } end ! #Alias to override AtomFinder#each_atom alias each_atom each --- 116,120 ---- @atoms.each{ |atom| yield atom } end ! # Alias to override AtomFinder#each_atom alias each_atom each *************** *** 143,163 **** class Heterogen < Residue ! # Creates residue id from an ATOM (or HETATM) object. ! # ! # We add 'LIGAND' to the id if it's a HETATM. ! # I think this is neccessary because some PDB files reuse ! # numbers for HETATMS. ! def self.get_residue_id_from_atom(atom) ! 'LIGAND' + super ! end ! ! # Residue id is required because resSeq doesn't uniquely identify ! # a residue. ID is constructed from resSeq and iCode and is appended ! # to 'LIGAND' if the residue is a HETATM ! def update_residue_id ! super ! @residue_id = 'LIGAND' + @residue_id if @residue_id ! end ! private :update_residue_id # If the residue is HETATM, returns true. --- 144,148 ---- class Heterogen < Residue ! include HetatmFinder # If the residue is HETATM, returns true. 
*************** *** 166,169 **** --- 151,159 ---- true end + + # Alias to override HetatmFinder#each_hetatm + alias each_hetatm each + + alias hetatms atoms end #class Heterogen Index: utils.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/utils.rb,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** utils.rb 8 Sep 2005 01:22:11 -0000 1.2 --- utils.rb 4 Jan 2006 15:41:50 -0000 1.3 *************** *** 231,234 **** --- 231,260 ---- end + module HetatmFinder + def find_hetatm() + array = [] + self.each_hetatm do |hetatm| + array.push(hetatm) if yield(hetatm) + end + return array + end + def each_hetatm(&x) #:yields: hetatm + self.each_heterogen { |heterogen| heterogen.each(&x) } + end + end + + module HeterogenFinder + def find_heterogen() + array = [] + self.each_heterogen do |heterogen| + array.push(heterogen) if yield(heterogen) + end + return array + end + def each_heterogen(&x) #:yields: heterogen + self.each_chain { |chain| chain.each_heterogen(&x) } + end + end + end; end #module Bio; class PDB Index: model.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/model.rb,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** model.rb 26 Sep 2005 13:00:08 -0000 1.2 --- model.rb 4 Jan 2006 15:41:50 -0000 1.3 *************** *** 33,42 **** include ResidueFinder include ChainFinder include Enumerable include Comparable ! attr_reader :model_serial, :structure, :solvent ! attr_writer :model_serial ! def initialize(model_serial = nil, structure = nil) --- 33,47 ---- include ResidueFinder include ChainFinder + + include HetatmFinder + include HeterogenFinder + include Enumerable include Comparable ! attr_accessor :model_serial ! attr_reader :structure ! attr_reader :solvents ! 
def initialize(model_serial = nil, structure = nil) *************** *** 46,52 **** @chains = Array.new ! @solvent = Chain.new('',self) end #Adds a chain --- 51,60 ---- @chains = Array.new ! @solvents = Chain.new('', self) end + + attr_reader :chains + attr_reader :solvents #Adds a chain *************** *** 60,73 **** def addSolvent(solvent) raise "Expecting a Bio::PDB::Residue" if not solvent.is_a? Bio::PDB::Residue ! @solvent.addResidue(solvent) end def removeSolvent ! @solvent = nil end #Chain iterator ! def each ! @chains.each{ |chain| yield chain } end #Alias to override ChainFinder#each_chain --- 68,81 ---- def addSolvent(solvent) raise "Expecting a Bio::PDB::Residue" if not solvent.is_a? Bio::PDB::Residue ! @solvents.addResidue(solvent) end def removeSolvent ! @solvents = nil end #Chain iterator ! def each(&x) #:yields: chain ! @chains.each(&x) end #Alias to override ChainFinder#each_chain *************** *** 76,80 **** #Sorts models based on serial number def <=>(other) ! return @mode_serial <=> other.model_serial end --- 84,88 ---- #Sorts models based on serial number def <=>(other) ! return @model_serial <=> other.model_serial end Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** pdb.rb 4 Jan 2006 14:01:14 -0000 1.8 --- pdb.rb 4 Jan 2006 15:41:50 -0000 1.9 *************** *** 40,43 **** --- 40,47 ---- include ChainFinder include ModelFinder + + include HetatmFinder + include HeterogenFinder + include Enumerable *************** *** 1213,1219 **** #Empty current model cModel = Model.new ! cChain = Chain.new ! cResidue = Residue.new ! #cCompound = Heterogen.new #Goes through each line and replace that line with a PDB::Record --- 1217,1223 ---- #Empty current model cModel = Model.new ! cChain = nil #Chain.new ! cResidue = nil #Residue.new ! 
cLigand = nil #Heterogen.new #Goes through each line and replace that line with a PDB::Record *************** *** 1238,1312 **** # Do something for ATOM and HETATM case key when 'ATOM' residueID = Residue.get_residue_id_from_atom(f) - #p f ! if f.chainID == cChain.id ! chain = cChain ! elsif !(chain = cModel[f.chainID]) ! #If we don't have chain, add a new chain ! newChain = Chain.new(f.chainID, cModel) ! cModel.addChain(newChain) ! cChain = newChain ! chain = newChain ! end ! ! if !newChain and residueID == cResidue.id residue = cResidue ! elsif newChain or !(residue = chain[residueID]) ! newResidue = Residue.new(f.resName, f.resSeq, f.iCode, chain) ! chain.addResidue(newResidue) ! cResidue = newResidue ! residue = newResidue end ! f.residue = residue residue.addAtom(f) when 'HETATM' ! #Each model has a special solvent chain ! #any chain id with the solvent is lost ! #I can fix this if really needed ! if f.resName == 'HOH' ! solvent = Heterogen.new(f.resName, f.resSeq, f.iCode, ! cModel.solvent) ! #p solvent ! f.residue = solvent ! solvent.addAtom(f) ! cModel.addSolvent(solvent) ! else ! ! residueID = Heterogen.get_residue_id_from_atom(f) ! #p f ! #p residueID ! ! if f.chainID == cChain.id ! chain = cChain ! elsif !(chain = cModel[f.chainID]) ! #If we don't have chain, add a new chain ! newChain = Chain.new(f.chainID, cModel) ! cModel.addChain(newChain) ! cChain = newChain ! chain = newChain ! end ! ! if !newChain and residueID == cResidue.id ! residue = cResidue ! elsif newChain or !(residue = chain[residueID]) ! newResidue = Heterogen.new(f.resName, f.resSeq, f.iCode, ! chain) ! chain.addLigand(newResidue) ! cResidue = newResidue ! 
residue = newResidue end - - f.residue = residue - residue.addAtom(f) - end when 'MODEL' if cModel.model_serial --- 1242,1306 ---- # Do something for ATOM and HETATM + if key == 'ATOM' or key == 'HETATM' then + if cChain and f.chainID == cChain.id + chain = cChain + else + if chain = cModel[f.chainID] + cChain = chain unless cChain + else + # If we don't have chain, add a new chain + newChain = Chain.new(f.chainID, cModel) + cModel.addChain(newChain) + cChain = newChain + chain = newChain + end + end + end + case key when 'ATOM' residueID = Residue.get_residue_id_from_atom(f) ! if cResidue and residueID == cResidue.id residue = cResidue ! else ! if residue = chain.get_residue_by_id(residueID) ! cResidue = residue unless cResidue ! else ! # add a new residue ! newResidue = Residue.new(f.resName, f.resSeq, f.iCode, chain) ! chain.addResidue(newResidue) ! cResidue = newResidue ! residue = newResidue ! end end ! f.residue = residue residue.addAtom(f) when 'HETATM' + residueID = Heterogen.get_residue_id_from_atom(f) ! if cLigand and residueID == cLigand.id ! ligand = cLigand else ! if ligand = chain.get_heterogen_by_id(residueID) ! cLigand = ligand unless cLigand ! else ! # add a new heterogen ! newLigand = Heterogen.new(f.resName, f.resSeq, f.iCode, chain) ! chain.addLigand(newLigand) ! cLigand = newLigand ! ligand = newLigand ! #Each model has a special solvent chain. (for compatibility) ! if f.resName == 'HOH' ! cModel.addSolvent(newLigand) ! end end end + f.residue = ligand + ligand.addAtom(f) + when 'MODEL' if cModel.model_serial *************** *** 1323,1327 **** end #def initialize ! attr_reader :data, :hash #Adds a Bio::Model to the current strucutre --- 1317,1324 ---- end #def initialize ! attr_reader :data ! attr_reader :hash ! ! 
attr_reader :models #Adds a Bio::Model to the current strucutre Index: chain.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/chain.rb,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** chain.rb 26 Sep 2005 13:00:08 -0000 1.2 --- chain.rb 4 Jan 2006 15:41:50 -0000 1.3 *************** *** 31,102 **** include AtomFinder include ResidueFinder include Enumerable include Comparable - attr_reader :id, :model - attr_writer :id - def initialize(id = nil, model = nil) ! @id = id @model = model ! @residues = Array.new ! @ligands = Array.new ! end ! #Keyed access to residues based on ids def [](key) ! #If you want to find HETATMS you need to add LIGAND to the id ! if key.to_s[0,6] == 'LIGAND' ! residue = @ligands.find{ |residue| key.to_s == residue.id } ! else ! residue = @residues.find{ |residue| key.to_s == residue.id } ! end end #Add a residue to this chain def addResidue(residue) ! raise "Expecting a Bio::PDB::Residue" if not residue.is_a? Bio::PDB::Residue @residues.push(residue) self end ! #Add a ligand to this chain ! def addLigand(residue) ! raise "Expecting a Bio::PDB::Residue" if not residue.is_a? Bio::PDB::Residue ! @ligands.push(residue) self end ! #Residue iterator ! def each ! @residues.each{ |residue| yield residue } end #Alias to override ResidueFinder#each_residue alias each_residue each ! #Sort based on chain id def <=>(other) ! return @id <=> other.id end ! #Stringifies each residue def to_s ! string = "" ! @residues.each{ |residue| string << residue.to_s } ! string = string << "TER\n" ! return string end def atom_seq string = "" last_residue_num = nil ! @residues.each{ |residue| if last_residue_num and ! (residue.resSeq.to_i - last_residue_num).abs > 1 ! 
(residue.resSeq.to_i - last_residue_num).abs.times{ string << 'X' } end tlc = residue.resName.capitalize --- 31,124 ---- include AtomFinder include ResidueFinder + + include HetatmFinder + include HeterogenFinder + include Enumerable include Comparable def initialize(id = nil, model = nil) ! @chain_id = id @model = model ! @residues = [] ! @heterogens = [] end + + attr_accessor :chain_id + attr_reader :model + + alias id chain_id + + # residues in this chain + attr_reader :residues + + # heterogens in this chain + attr_reader :heterogens ! # get the residue by id ! def get_residue_by_id(key) ! @residues.find { |r| r.residue_id == key } ! end ! ! # get the residue by id. ! # Compatibility Note: now, you cannot find HETATMS in this method. ! # To add LIGAND to the id is no longer available. ! # To get heterogens, you must use get_heterogen_by_id. def [](key) ! get_residue_by_id(key) ! end ! ! # get the heterogen (ligand) by id ! def get_heterogen_by_id(key) ! @heterogens.find { |r| r.residue_id == key } end #Add a residue to this chain def addResidue(residue) ! raise "Expecting a Bio::PDB::Residue" unless residue.is_a? Bio::PDB::Residue @residues.push(residue) self end ! #Add a heterogen (ligand) to this chain ! def addLigand(ligand) ! raise "Expecting a Bio::PDB::Residue" unless ligand.is_a? Bio::PDB::Residue ! @heterogens.push(ligand) self end ! # Iterates over each residue ! def each(&x) #:yields: residue ! @residues.each(&x) end #Alias to override ResidueFinder#each_residue alias each_residue each + + # Iterates over each hetero-compound + def each_heterogen(&x) #:yields: heterogen + @heterogens.each(&x) + end ! # Operator aimed to sort based on chain id def <=>(other) ! return @chain_id <=> other.chain_id end ! # Stringifies each residue def to_s ! @residues.join('') + "TER\n" end + # gets an amino acid sequence of the chain def atom_seq string = "" last_residue_num = nil ! @residues.each do |residue| if last_residue_num and ! 
(x = (residue.resSeq.to_i - last_residue_num).abs) > 1 then ! x.times { string << 'X' } end tlc = residue.resName.capitalize
***************
*** 106,117 ****
end string << olc ! } Bio::Sequence::AA.new(string) - end ! end ! end ! end
--- 128,138 ----
end string << olc ! end Bio::Sequence::AA.new(string) end ! end #class Chain ! end #class PDB ! end #module Bio

From ngoto at pub.open-bio.org Thu Jan 5 04:24:56 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Thu Jan 5 04:15:52 2006
Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb pdb.rb, 1.9, 1.10 model.rb, 1.3, 1.4
Message-ID: <200601050924.k059OuVL015387@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv15336

Modified Files:
    pdb.rb model.rb
Log Message:
* pdb.rb
  * fixed failure to get model serial number
  * added Bio::PDB::ATOM#ter, #sigatm, #anisou, Bio::PDB::ANISOU#siguij
    to get a corresponding record.
* model.rb
  * Now, model_serial is an alias of serial.
  * added RDoc.

Index: model.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/model.rb,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** model.rb 4 Jan 2006 15:41:50 -0000 1.3
--- model.rb 5 Jan 2006 09:24:54 -0000 1.4
***************
*** 26,30 ****
class PDB ! #Model class class Model
--- 26,30 ----
class PDB ! # Model class class Model
***************
*** 40,96 ****
include Comparable ! attr_accessor :model_serial ! attr_reader :structure ! attr_reader :solvents ! ! def initialize(model_serial = nil, structure = nil) ! ! @model_serial = model_serial ! ! @structure = structure ! ! @chains = Array.new ! @solvents = Chain.new('', self) end attr_reader :chains attr_reader :solvents ! ! #Adds a chain def addChain(chain) ! raise "Expecting a Bio::PDB::Chain" if not chain.is_a? Bio::PDB::Chain @chains.push(chain) self end ! #adds a solvent molecule def addSolvent(solvent) ! 
raise "Expecting a Bio::PDB::Residue" if not solvent.is_a? Bio::PDB::Residue @solvents.addResidue(solvent) end def removeSolvent @solvents = nil end ! #Chain iterator def each(&x) #:yields: chain @chains.each(&x) end ! #Alias to override ChainFinder#each_chain alias each_chain each ! #Sorts models based on serial number def <=>(other) ! return @model_serial <=> other.model_serial end ! #Keyed access to chains def [](key) chain = @chains.find{ |chain| key == chain.id } end ! #stringifies to chains def to_s string = "" --- 40,103 ---- include Comparable ! # Creates a new Model object ! def initialize(serial = nil, structure = nil) + @serial = serial + @structure = structure + @chains = [] + @solvents = Chain.new('', self) end + # chains in this model attr_reader :chains + + # (OBSOLETE) solvents in this model attr_reader :solvents ! ! # serial number of this model. (Integer or nil) ! attr_accessor :serial ! ! # for backward compatibility ! alias model_serial serial ! ! # (deprecated) ! attr_reader :structure ! ! # Adds a chain to this model def addChain(chain) ! raise "Expecting a Bio::PDB::Chain" unless chain.is_a? Bio::PDB::Chain @chains.push(chain) self end ! # (OBSOLETE) Adds a solvent molecule to this model def addSolvent(solvent) ! raise "Expecting a Bio::PDB::Residue" unless solvent.is_a? Bio::PDB::Residue @solvents.addResidue(solvent) end + # (OBSOLETE) not recommended to use this method def removeSolvent @solvents = nil end ! # Iterates over each chain def each(&x) #:yields: chain @chains.each(&x) end ! # Alias to override ChainFinder#each_chain alias each_chain each ! # Operator aimed to sort models based on serial number def <=>(other) ! return @serial <=> other.model_serial end ! # Keyed access to chains def [](key) chain = @chains.find{ |chain| key == chain.id } end ! # stringifies to chains def to_s string = "" *************** *** 99,105 **** end @chains.each{ |chain| string << chain.to_s } ! if solvent ! string << @solvent.to_s ! 
end if model_serial string << "ENDMDL" --- 106,112 ---- end @chains.each{ |chain| string << chain.to_s } ! #if solvent ! # string << @solvent.to_s ! #end if model_serial string << "ENDMDL" *************** *** 110,114 **** end #class Model ! end ! end --- 117,121 ---- end #class Model ! end #class PDB ! end #module Bio Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** pdb.rb 4 Jan 2006 15:41:50 -0000 1.9 --- pdb.rb 5 Jan 2006 09:24:54 -0000 1.10 *************** *** 840,843 **** --- 840,852 ---- attr_accessor :residue + # SIGATM record + attr_accessor :sigatm + + # ANISOU record + attr_accessor :anisou + + # TER record + attr_accessor :ter + #Returns a Coordinate class instance of the xyz positions def xyz *************** *** 914,917 **** --- 923,931 ---- ) + class ANISOU + # SIGUIJ record + attr_accessor :siguij + end #class ANISOU + SIGUIJ = def_rec([ 7, 11, Pdb_Integer, :serial ], *************** *** 1220,1223 **** --- 1234,1238 ---- cResidue = nil #Residue.new cLigand = nil #Heterogen.new + c_atom = nil #Goes through each line and replace that line with a PDB::Record *************** *** 1260,1263 **** --- 1275,1279 ---- case key when 'ATOM' + c_atom = f residueID = Residue.get_residue_id_from_atom(f) *************** *** 1280,1283 **** --- 1296,1300 ---- when 'HETATM' + c_atom = f residueID = Heterogen.get_residue_id_from_atom(f) *************** *** 1304,1312 **** when 'MODEL' ! if cModel.model_serial self.addModel(cModel) end ! model_serial = line[6,5] ! cModel = Model.new(model_serial) end f --- 1321,1361 ---- when 'MODEL' ! c_atom = nil ! if cModel.model_serial or cModel.chains.size > 0 then self.addModel(cModel) end ! cModel = Model.new(f.serial) ! ! when 'TER' ! if c_atom ! c_atom.ter = f ! else ! #$stderr.puts "Warning: stray TER?" ! end ! when 'SIGATM' ! if c_atom ! 
#$stderr.puts "Warning: duplicated SIGATM?" if c_atom.sigatm ! c_atom.sigatm = f ! else ! #$stderr.puts "Warning: stray SIGATM?" ! end ! when 'ANISOU' ! if c_atom ! #$stderr.puts "Warning: duplicated ANISOU?" if c_atom.anisou ! c_atom.anisou = f ! else ! #$stderr.puts "Warning: stray ANISOU?" ! end ! when 'SIGUIJ' ! if c_atom and c_atom.anisou ! #$stderr.puts "Warning: duplicated SIGUIJ?" if c_atom.anisou.siguij ! c_atom.anisou.siguij = f ! else ! #$stderr.puts "Warning: stray SIGUIJ?" ! end ! ! else ! c_atom = nil ! end f

From ngoto at pub.open-bio.org Thu Jan 5 06:10:12 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Thu Jan 5 06:01:09 2006
Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb pdb.rb, 1.10, 1.11 utils.rb, 1.3, 1.4
Message-ID: <200601051110.k05BACVL015635@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv15618

Modified Files:
    pdb.rb utils.rb
Log Message:
* utils.rb
  * Changed Bio::PDB::Utils.to_xyz(obj) to convert_to_xyz(obj)
    (old to_xyz is still available for compatibility).
  * In Utils, distance, dihedral_angle, rad2deg, acos, calculatePlane
    are now module_function.
  * added RDoc.
  * added ChainFinder#chains, ResidueFinder#residues, AtomFinder#atoms,
    HetatmFinder#hetatms, and HeterogenFinder#heterogens.
* pdb.rb
  * modified some documents.

Index: utils.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/utils.rb,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** utils.rb 4 Jan 2006 15:41:50 -0000 1.3
--- utils.rb 5 Jan 2006 11:10:10 -0000 1.4
***************
*** 26,51 ****
module Bio; class PDB module Utils - #The methods in this mixin should be applicalbe to all PDB objects ! #Returns the coordinates of the geometric centre (average co-ord) ! #of any AtomFinder (or .atoms) implementing object ! def geometricCentre() ! x = y = z = count = 0 ! 
self.each_atom{ |atom| x += atom.x y += atom.y z += atom.z count += 1 ! } ! ! x = x / count ! y = y / count ! z = z / count Coordinate[x,y,z] - end --- 26,54 ---- module Bio; class PDB + # Utility methods for PDB data. + # + # The methods in this mixin should be applicalbe to all PDB objects. module Utils ! # Returns the coordinates of the geometric centre (average co-ord) ! # of any AtomFinder (or .atoms) implementing object ! # ! # If you want to get the geometric centre of hetatms, ! # call geometricCentre(:each_hetatm). ! def geometricCentre(method = :each_atom) x = y = z = count = 0 ! self.__send__(method) do |atom| x += atom.x y += atom.y z += atom.z count += 1 ! end + x = (x / count) + y = (y / count) + z = (z / count) + Coordinate[x,y,z] end *************** *** 64,69 **** } def centreOfGravity() - x = y = z = total = 0 --- 67,72 ---- } + # calculates centre of gravitiy def centreOfGravity() x = y = z = total = 0 *************** *** 82,98 **** Coordinate[x,y,z] - end #Perhaps distance and dihedral would be better off as class methods? #(rather) than instance methods ! def self.distance(coord1,coord2) ! coord1 = to_xyz(coord1) ! coord2 = to_xyz(coord2) (coord1 - coord2).r end ! def self.dihedral_angle(coord1,coord2,coord3,coord4) ! (a1,b1,c1,d) = calculatePlane(coord1,coord2,coord3) (a2,b2,c2) = calculatePlane(coord2,coord3,coord4) --- 85,105 ---- Coordinate[x,y,z] end + #-- #Perhaps distance and dihedral would be better off as class methods? #(rather) than instance methods ! #++ ! ! # Calculates distance between _coord1_ and _coord2_. ! def distance(coord1, coord2) ! coord1 = convert_to_xyz(coord1) ! coord2 = convert_to_xyz(coord2) (coord1 - coord2).r end + module_function :distance ! # Calculates dihedral angle. ! def dihedral_angle(coord1, coord2, coord3, coord4) (a1,b1,c1,d) = calculatePlane(coord1,coord2,coord3) (a2,b2,c2) = calculatePlane(coord2,coord3,coord4) *************** *** 106,112 **** end end ! 
#Implicit conversion into Vector or Bio::PDB::Coordinate ! def self.to_xyz(obj) unless obj.is_a?(Vector) begin --- 113,120 ---- end end + module_function :dihedral_angle ! # Implicit conversion into Vector or Bio::PDB::Coordinate ! def convert_to_xyz(obj) unless obj.is_a?(Vector) begin *************** *** 118,133 **** obj end #Methods required for the dihedral angle calculations #perhaps these should go in some separate Math module ! def self.rad2deg(r) (r/Math::PI)*180 end ! ! def self.acos(x) Math.atan2(Math.sqrt(1 - x**2),x) end ! ! def self.calculatePlane(coord1,coord2,coord3) a = coord1.y * (coord2.z - coord3.z) + coord2.y * (coord3.z - coord1.z) + --- 126,155 ---- obj end + module_function :convert_to_xyz + + # (Deprecated) alias of convert_to_xyz(obj) + def self.to_xyz(obj) + convert_to_xyz(obj) + end + #-- #Methods required for the dihedral angle calculations #perhaps these should go in some separate Math module ! #++ ! ! # radian to degree ! def rad2deg(r) (r/Math::PI)*180 end ! module_function :rad2deg ! ! # acos ! def acos(x) Math.atan2(Math.sqrt(1 - x**2),x) end ! module_function :acos ! ! # calculates plane ! def calculatePlane(coord1, coord2, coord3) a = coord1.y * (coord2.z - coord3.z) + coord2.y * (coord3.z - coord1.z) + *************** *** 147,157 **** return [a,b,c,d] - end ! #Every class in the heirarchy implements finder, this takes ! #a class which determines which type of object to find, the associated ! #block is then run in classic .find style ! def finder(findtype,&block) if findtype == Bio::PDB::Atom return self.find_atom(&block) --- 169,182 ---- return [a,b,c,d] end + module_function :calculatePlane ! # Every class in the heirarchy implements finder, this takes ! # a class which determines which type of object to find, the associated ! # block is then run in classic .find style. ! # ! # The method might be deprecated. ! # You'd better using find_XXX directly. ! 
def finder(findtype, &block) #:yields: obj if findtype == Bio::PDB::Atom return self.find_atom(&block) *************** *** 167,236 **** end end #module Utils ! #The *Finder modules implement a find_* method which returns #an array of anything for which the block evals true #(suppose Enumerable#find_all method). #The each_* style methods act as classic iterators. module ModelFinder ! def find_model() array = [] ! self.each_model{ |model| array.push(model) if yield(model) ! } return array end ! end #The heirarchical nature of the objects allow us to re-use the #methods from the previous level - e.g. A PDB object can use the .models #method defined in ModuleFinder to iterate through the models to find the #chains module ChainFinder ! def find_chain() array = [] ! self.each_chain{ |chain| array.push(chain) if yield(chain) ! } return array end ! def each_chain() ! self.each_model{ |model| ! model.each{ |chain| yield chain } ! } end ! end module ResidueFinder ! def find_residue() array = [] ! self.each_residue{ |residue| array.push(residue) if yield(residue) ! } return array end ! def each_residue() ! self.each_chain{ |chain| ! chain.each{ |residue| yield residue } ! } end ! end module AtomFinder ! def find_atom() array = [] ! self.each_atom{ |atom| array.push(atom) if yield(atom) ! } return array end ! def each_atom() ! self.each_residue{ |residue| ! residue.each{ |atom| yield atom } ! } end - end module HetatmFinder ! def find_hetatm() array = [] self.each_hetatm do |hetatm| --- 192,315 ---- end end #module Utils ! ! #-- #The *Finder modules implement a find_* method which returns #an array of anything for which the block evals true #(suppose Enumerable#find_all method). #The each_* style methods act as classic iterators. + #++ + + # methods to access models + # + # XXX#each_model must be defined. module ModelFinder ! # returns an array containing all chains for which given block ! # is not +false+ (similar to Enumerable#find_all). ! def find_model array = [] ! 
self.each_model do |model| array.push(model) if yield(model) ! end return array end ! end #module ModelFinder + #-- #The heirarchical nature of the objects allow us to re-use the #methods from the previous level - e.g. A PDB object can use the .models #method defined in ModuleFinder to iterate through the models to find the #chains + #++ + + # methods to access chains + # + # XXX#each_model must be defined. module ChainFinder ! ! # returns an array containing all chains for which given block ! # is not +false+ (similar to Enumerable#find_all). ! def find_chain array = [] ! self.each_chain do |chain| array.push(chain) if yield(chain) ! end return array end ! ! # iterates over each chain ! def each_chain(&x) #:yields: chain ! self.each_model { |model| model.each(&x) } end ! ! # returns all chains ! def chains ! array = [] ! self.each_model { |model| array.concat(model.chains) } ! return array ! end ! end #module ChainFinder + # methods to access residues + # + # XXX#each_chain must be defined. module ResidueFinder ! ! # returns an array containing all residues for which given block ! # is not +false+ (similar to Enumerable#find_all). ! def find_residue array = [] ! self.each_residue do |residue| array.push(residue) if yield(residue) ! end return array end ! ! # iterates over each residue ! def each_residue(&x) #:yields: residue ! self.each_chain { |chain| chain.each(&x) } end ! ! # returns all residues ! def residues ! array = [] ! self.each_chain { |chain| array.concat(chain.residues) } ! return array ! end ! end #module ResidueFinder + # methods to access atoms + # + # XXX#each_residue must be defined. module AtomFinder ! # returns an array containing all atoms for which given block ! # is not +false+ (similar to Enumerable#find_all). ! def find_atom array = [] ! self.each_atom do |atom| array.push(atom) if yield(atom) ! end return array end ! ! # iterates over each atom ! def each_atom(&x) #:yields: atom ! 
self.each_residue { |residue| residue.each(&x) } end + # returns all atoms + def atoms + array = [] + self.each_residue { |residue| array.concat(residue.atoms) } + return array + end + end #module AtomFinder + + # methods to access HETATMs + # + # XXX#each_heterogen must be defined. module HetatmFinder ! # returns an array containing all HETATMs for which given block ! # is not +false+ (similar to Enumerable#find_all). ! def find_hetatm array = [] self.each_hetatm do |hetatm| *************** *** 239,249 **** return array end def each_hetatm(&x) #:yields: hetatm self.each_heterogen { |heterogen| heterogen.each(&x) } end - end module HeterogenFinder ! def find_heterogen() array = [] self.each_heterogen do |heterogen| --- 318,342 ---- return array end + + # iterates over each HETATM def each_hetatm(&x) #:yields: hetatm self.each_heterogen { |heterogen| heterogen.each(&x) } end + # returns all HETATMs + def hetatms + array = [] + self.each_heterogen { |heterogen| array.concat(heterogen.hetatms) } + return array + end + end #module HetatmFinder + + # methods to access heterogens (compounds or ligands) + # + # XXX#each_chain must be defined. module HeterogenFinder ! # returns an array containing all heterogens for which given block ! # is not +false+ (similar to Enumerable#find_all). ! def find_heterogen array = [] self.each_heterogen do |heterogen| *************** *** 252,259 **** return array end def each_heterogen(&x) #:yields: heterogen self.each_chain { |chain| chain.each_heterogen(&x) } end ! end end; end #module Bio; class PDB --- 345,361 ---- return array end + + # iterates over each heterogens def each_heterogen(&x) #:yields: heterogen self.each_chain { |chain| chain.each_heterogen(&x) } end ! ! # returns all heterogens ! def heterogens ! array = [] ! self.each_chain { |chain| array.concat(chain.heterogens) } ! return array ! end ! 
end #module HeterogenFinder end; end #module Bio; class PDB Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** pdb.rb 5 Jan 2006 09:24:54 -0000 1.10 --- pdb.rb 5 Jan 2006 11:10:10 -0000 1.11 *************** *** 31,36 **** module Bio ! #This is the main PDB class which takes care of parsing, annotations ! #and is the entry way to the co-ordinate data held in models class PDB #< DB --- 31,37 ---- module Bio ! # This is the main PDB class which takes care of parsing, annotations ! # and is the entry way to the co-ordinate data held in models. ! # class PDB #< DB *************** *** 48,52 **** DELIMITER = RS = nil # 1 file 1 entry ! #Modules required by the field definitions module DataType --- 49,53 ---- DELIMITER = RS = nil # 1 file 1 entry ! # Modules required by the field definitions module DataType *************** *** 1369,1375 **** attr_reader :hash attr_reader :models ! #Adds a Bio::Model to the current strucutre def addModel(model) raise "Expecting a Bio::PDB::Model" if not model.is_a? Bio::PDB::Model --- 1370,1377 ---- attr_reader :hash + # models in this PDB entry attr_reader :models ! # Adds a Bio::Model object to the current strucutre def addModel(model) raise "Expecting a Bio::PDB::Model" if not model.is_a? Bio::PDB::Model *************** *** 1378,1386 **** end ! #Iterates over the models def each @models.each{ |model| yield model } end ! #Alias needed for Bio::PDB::ModelFinder alias each_model each --- 1380,1388 ---- end ! # Iterates over each model def each @models.each{ |model| yield model } end ! # Alias needed for Bio::PDB::ModelFinder alias each_model each *************** *** 1390,1394 **** @models.find{ |model| key == model.model_serial } end ! 
#Stringifies to a list of atom records - we could add the annotation #as well if needed --- 1392,1396 ---- @models.find{ |model| key == model.model_serial } end ! #Stringifies to a list of atom records - we could add the annotation #as well if needed From ngoto at pub.open-bio.org Sun Jan 8 07:59:06 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun Jan 8 07:49:51 2006 Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb atom.rb, 1.5, 1.6 chain.rb, 1.3, 1.4 model.rb, 1.4, 1.5 pdb.rb, 1.11, 1.12 residue.rb, 1.7, 1.8 utils.rb, 1.4, 1.5 Message-ID: <200601081259.k08Cx6VL006590@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb In directory pub.open-bio.org:/tmp/cvs-serv6561 Modified Files: atom.rb chain.rb model.rb pdb.rb residue.rb utils.rb Log Message: * added RDoc. * Bio::PDB#seqres returns Bio::Sequence::NA object if the chain seems to be a nucleic acid sequence. * Bio::PDB#inspect calls do_parse before inspect. * Bio::PDB#atom_seq is now an alias of Bio::PDB::Chain#aaseq. * Bio::PDB#seqres and Bio::PDB::Chain#aaseq are changed to use Bio::AminoAcid.three2one(tlc). Index: atom.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/atom.rb,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** atom.rb 18 Dec 2005 17:33:32 -0000 1.5 --- atom.rb 8 Jan 2006 12:59:04 -0000 1.6 *************** *** 1,8 **** # ! # bio/db/pdb/atom.rb - Coordinate and atom class for PDB # ! # Copyright (C) 2004 Alex Gutteridge ! # Copyright (C) 2004 GOTO Naohisa # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public --- 1,13 ---- # ! # = bio/db/pdb/atom.rb - Coordinate class for PDB # ! # Copyright:: Copyright (C) 2004, 2006 ! # Alex Gutteridge ! # Naohisa Goto ! # License:: LGPL ! # ! 
# $Id$ # + #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public *************** *** 18,23 **** # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # - # $Id$ require 'matrix' --- 23,37 ---- # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + #++ + # + # = Bio::PDB::Coordinate + # + # Coordinate class for PDB. + # + # = Compatibility Note + # + # From bioruby 0.7.0, the Bio::PDB::Atom class is no longer available. + # Please use Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM instead. # require 'matrix' *************** *** 27,55 **** class PDB class Coordinate < Vector def self.[](x,y,z) super end def self.elements(array, *a) raise 'Size of given array must be 3' if array.size != 3 super end ! def x; self[0]; end def y; self[1]; end def z; self[2]; end def x=(n); self[0]=n; end def y=(n); self[1]=n; end def z=(n); self[2]=n; end # Definition of 'to_ary' means objects of the class is # implicitly regarded as an array. def to_ary; self.to_a; end def xyz; self; end ! def distance(object2) ! Utils::to_xyz(object2) (self - object2).r end --- 41,88 ---- class PDB + # Bio::PDB::Coordinate is a class to store a 3D coordinate. + # It inherits Vector (in bundled library in Ruby). + # class Coordinate < Vector + # same as Vector.[x,y,z] def self.[](x,y,z) super end + # same as Vector.elements def self.elements(array, *a) raise 'Size of given array must be 3' if array.size != 3 super end ! ! # x def x; self[0]; end + # y def y; self[1]; end + # z def z; self[2]; end + # x=(n) def x=(n); self[0]=n; end + # y=(n) def y=(n); self[1]=n; end + # z=(n) def z=(n); self[2]=n; end + # Implicit conversion to an array. + # + # Note that this method would be deprecated in the future. 
+ # + #-- # Definition of 'to_ary' means objects of the class is # implicitly regarded as an array. + #++ def to_ary; self.to_a; end + # returns self. def xyz; self; end ! ! # distance between object2. def distance(object2) ! Utils::convert_to_xyz(object2) (self - object2).r end Index: residue.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/residue.rb,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** residue.rb 4 Jan 2006 15:41:50 -0000 1.7 --- residue.rb 8 Jan 2006 12:59:04 -0000 1.8 *************** *** 1,7 **** # ! # bio/db/pdb/residue.rb - residue class for PDB # ! # Copyright (C) 2004 Alex Gutteridge # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public --- 1,13 ---- # ! # = bio/db/pdb/residue.rb - residue class for PDB # ! # Copyright:: Copyright (C) 2004, 2006 ! # Alex Gutteridge ! # Naohisa Goto ! # License:: LGPL ! # ! # $Id$ # + #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public *************** *** 17,22 **** # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # - # $Id$ require 'bio/db/pdb' --- 23,32 ---- # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + #++ + # + # = Bio::PDB::Residue + # + # = Bio::PDB::Heterogen # require 'bio/db/pdb' *************** *** 26,30 **** class PDB ! #Residue class - id is a composite of resSeq and iCode class Residue --- 36,42 ---- class PDB ! # Bio::PDB::Residue is a class to store a residue. ! # The object would contain some atoms (Bio::PDB::Record::ATOM objects). ! # class Residue *************** *** 63,67 **** attr_accessor :resName ! 
# residue id (String or nil) attr_reader :residue_id --- 75,80 ---- attr_accessor :resName ! # residue id (String or nil). ! # The id is a composite of resSeq and iCode. attr_reader :residue_id *************** *** 135,138 **** --- 148,153 ---- end + # Always returns false. + # # If the residue is HETATM, returns true. # Otherwise, returns false. *************** *** 142,149 **** --- 157,171 ---- end #class Residue + # Bio::PDB::Heterogen is a class to store a heterogen. + # It inherits Bio::PDB::Residue and most of the methods are the same. + # + # The object would contain some HETATMs + # (Bio::PDB::Record::HETATM objects). class Heterogen < Residue include HetatmFinder + # Always returns true. + # # If the residue is HETATM, returns true. # Otherwise, returns false. *************** *** 155,158 **** --- 177,181 ---- alias each_hetatm each + # Alias needed for HeterogenFinder. alias hetatms atoms end #class Heterogen Index: model.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/model.rb,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** model.rb 5 Jan 2006 09:24:54 -0000 1.4 --- model.rb 8 Jan 2006 12:59:04 -0000 1.5 *************** *** 1,7 **** # ! # bio/db/pdb/model.rb - model class for PDB # ! # Copyright (C) 2004 Alex Gutteridge # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public --- 1,13 ---- # ! # = bio/db/pdb/model.rb - model class for PDB # ! # Copyright:: Copyright (C) 2004, 2006 ! # Alex Gutteridge ! # Naohisa Goto ! # License:: LGPL ! # ! 
# $Id$ # + #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public *************** *** 17,22 **** # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # - # $Id$ require 'bio/db/pdb' --- 23,32 ---- # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + #++ + # + # = Bio::PDB::Model + # + # Please refer Bio::PDB::Model. # require 'bio/db/pdb' *************** *** 26,30 **** class PDB ! # Model class class Model --- 36,42 ---- class PDB ! # Bio::PDB::Model is a class to store a model. ! # ! # The object would contain some chains (Bio::PDB::Chain objects). class Model *************** *** 52,56 **** attr_reader :chains ! # (OBSOLETE) solvents in this model attr_reader :solvents --- 64,68 ---- attr_reader :chains ! # (OBSOLETE) solvents (water, HOH) in this model attr_reader :solvents *************** *** 61,65 **** alias model_serial serial ! # (deprecated) attr_reader :structure --- 73,77 ---- alias model_serial serial ! # (reserved for future extension) attr_reader :structure Index: utils.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/utils.rb,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** utils.rb 5 Jan 2006 11:10:10 -0000 1.4 --- utils.rb 8 Jan 2006 12:59:04 -0000 1.5 *************** *** 1,8 **** # ! # bio/db/pdb/utils.rb - Utility modules for PDB # ! # Copyright (C) 2004 Alex Gutteridge ! # Copyright (C) 2004 GOTO Naohisa # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public --- 1,13 ---- # ! # = bio/db/pdb/utils.rb - Utility modules for PDB # ! # Copyright:: Copyright (C) 2004, 2006 ! # Alex Gutteridge ! # Naohisa Goto ! 
# License:: LGPL # + # $Id$ + # + #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public *************** *** 18,23 **** # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # - # $Id$ require 'matrix' --- 23,56 ---- # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + #++ + # + # = Bio::PDB::Utils + # + # Bio::PDB::Utils + # + # = Bio::PDB::ModelFinder + # + # Bio::PDB::ModelFinder + # + # = Bio::PDB::ChainFinder + # + # Bio::PDB::ChainFinder + # + # = Bio::PDB::ResidueFinder + # + # Bio::PDB::ResidueFinder + # + # = Bio::PDB::AtomFinder + # + # Bio::PDB::AtomFinder + # + # = Bio::PDB::HeterogenFinder + # + # Bio::PDB::HeterogenFinder + # + # = Bio::PDB::HetatmFinder + # + # Bio::PDB::HetatmFinder # require 'matrix' *************** *** 27,32 **** # Utility methods for PDB data. - # # The methods in this mixin should be applicalbe to all PDB objects. module Utils --- 60,67 ---- # Utility methods for PDB data. # The methods in this mixin should be applicalbe to all PDB objects. + # + # Bio::PDB::Utils is included by Bio::PDB, Bio::PDB::Model, + # Bio::PDB::Chain, Bio::PDB::Residue, and Bio::PDB::Heterogen classes. module Utils *************** *** 203,206 **** --- 238,244 ---- # # XXX#each_model must be defined. + # + # Bio::PDB::ModelFinder is included by Bio::PDB::PDB. + # module ModelFinder # returns an array containing all chains for which given block *************** *** 225,228 **** --- 263,269 ---- # # XXX#each_model must be defined. + # + # Bio::PDB::ChainFinder is included by Bio::PDB::PDB and Bio::PDB::Model. + # module ChainFinder *************** *** 253,256 **** --- 294,301 ---- # # XXX#each_chain must be defined. 
+ # + # Bio::PDB::ResidueFinder is included by Bio::PDB::PDB, Bio::PDB::Model, + # and Bio::PDB::Chain. + # module ResidueFinder *************** *** 308,311 **** --- 353,360 ---- # # XXX#each_heterogen must be defined. + # + # Bio::PDB::HetatmFinder is included by Bio::PDB::PDB, Bio::PDB::Model, + # Bio::PDB::Chain, and Bio::PDB::Heterogen. + # module HetatmFinder # returns an array containing all HETATMs for which given block *************** *** 335,338 **** --- 384,391 ---- # # XXX#each_chain must be defined. + # + # Bio::PDB::HeterogenFinder is included by Bio::PDB::PDB, Bio::PDB::Model, + # and Bio::PDB::Chain. + # module HeterogenFinder # returns an array containing all heterogens for which given block Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 *** pdb.rb 5 Jan 2006 11:10:10 -0000 1.11 --- pdb.rb 8 Jan 2006 12:59:04 -0000 1.12 *************** *** 1,8 **** # ! # bio/db/pdb/pdb.rb - PDB database class for PDB file format # ! # Copyright (C) 2003,2004 GOTO Naohisa ! # Copyright (C) 2004 Alex Gutteridge # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public --- 1,13 ---- # [...1339 lines suppressed...] - - --- Bio::PDB::Record#"anything" - - Same as Bio::PDB::Record#[](:anything) after do_parse. - For example, r.helixID is same as r.do_parse; r[:helixID] . - - - = Bio::PDB::FieldDef - - Internal use only. - Format definition of each record. 
- - = References - - * (()) - * PDB File Format Contents Guide Version 2.2 (20 December 1996) - (()) - - =end --- 1819,1820 ---- Index: chain.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/chain.rb,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** chain.rb 4 Jan 2006 15:41:50 -0000 1.3 --- chain.rb 8 Jan 2006 12:59:04 -0000 1.4 *************** *** 1,7 **** # ! # bio/db/pdb/chain.rb - chain class for PDB # ! # Copyright (C) 2004 Alex Gutteridge # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public --- 1,13 ---- # ! # = bio/db/pdb/chain.rb - chain class for PDB # ! # Copyright:: Copyright (C) 2004, 2006 ! # Alex Gutteridge ! # Naohisa Goto ! # License:: LGPL ! # ! # $Id$ # + #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public *************** *** 17,22 **** # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # - # $Id$ require 'bio/db/pdb' --- 23,32 ---- # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + #++ + # + # = Bio::PDB::Chain + # + # Please refer Bio::PDB::Chain. # require 'bio/db/pdb' *************** *** 26,29 **** --- 36,44 ---- class PDB + # Bio::PDB::Chain is a class to store a chain. + # + # The object would contain some residues (Bio::PDB::Residue objects) + # and some heterogens (Bio::PDB::Heterogen objects). + # class Chain *************** *** 37,41 **** include Enumerable include Comparable ! def initialize(id = nil, model = nil) --- 52,57 ---- include Enumerable include Comparable ! ! # Creates a new chain object. def initialize(id = nil, model = nil) *************** *** 48,56 **** end attr_accessor :chain_id ! 
attr_reader :model ! alias id chain_id # residues in this chain attr_reader :residues --- 64,75 ---- end + # Identifier of this chain attr_accessor :chain_id ! # alias alias id chain_id + # the model to which this chain belongs. + attr_reader :model + # residues in this chain attr_reader :residues *************** *** 113,134 **** end ! # gets an amino acid sequence of the chain ! def atom_seq ! string = "" ! last_residue_num = nil ! @residues.each do |residue| ! if last_residue_num and ! (x = (residue.resSeq.to_i - last_residue_num).abs) > 1 then ! x.times { string << 'X' } ! end ! tlc = residue.resName.capitalize ! olc = AminoAcid.names.invert[tlc] ! if !olc ! olc = 'X' end ! string << olc end ! Bio::Sequence::AA.new(string) end end #class Chain --- 132,155 ---- end ! # gets an amino acid sequence of this chain from ATOM records ! def aaseq ! unless defined? @aaseq ! string = "" ! last_residue_num = nil ! @residues.each do |residue| ! if last_residue_num and ! (x = (residue.resSeq.to_i - last_residue_num).abs) > 1 then ! x.times { string << 'X' } ! end ! tlc = residue.resName.capitalize ! olc = (Bio::AminoAcid.three2one(tlc) or 'X') ! string << olc end ! @aaseq = Bio::Sequence::AA.new(string) end ! @aaseq end + # for backward compatibility + alias atom_seq aaseq end #class Chain From ngoto at pub.open-bio.org Mon Jan 9 06:22:38 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Mon Jan 9 06:13:42 2006 Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb model.rb, 1.5, 1.6 chain.rb, 1.4, 1.5 residue.rb, 1.8, 1.9 Message-ID: <200601091122.k09BMcVL009266@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb In directory pub.open-bio.org:/tmp/cvs-serv9256 Modified Files: model.rb chain.rb residue.rb Log Message: * model.rb (Bio::PDB::Model) * @chains_hash is introduced to speed up parsing. * added rehash method. * chain.rb (Bio::PDB::Chain) * @residues_hash and @heterogens_hash are introduced to speed up parsing. 
* added rehash_residues, rehash_heterogens, and rehash methods. * residue.rb * added an alias heterogen_id. Index: residue.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/residue.rb,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** residue.rb 8 Jan 2006 12:59:04 -0000 1.8 --- residue.rb 9 Jan 2006 11:22:36 -0000 1.9 *************** *** 179,182 **** --- 179,185 ---- # Alias needed for HeterogenFinder. alias hetatms atoms + + # Alias to avoid confusion + alias heterogen_id residue_id end #class Heterogen Index: model.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/model.rb,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** model.rb 8 Jan 2006 12:59:04 -0000 1.5 --- model.rb 9 Jan 2006 11:22:36 -0000 1.6 *************** *** 58,61 **** --- 58,62 ---- @structure = structure @chains = [] + @chains_hash = {} @solvents = Chain.new('', self) end *************** *** 80,84 **** raise "Expecting a Bio::PDB::Chain" unless chain.is_a? Bio::PDB::Chain @chains.push(chain) ! self end --- 81,108 ---- raise "Expecting a Bio::PDB::Chain" unless chain.is_a? Bio::PDB::Chain @chains.push(chain) ! if @chains_hash[chain.chain_id] then ! $stderr.puts "Warning: chain_id #{chain.chain_id.inspect} is already used" if $VERBOSE ! else ! @chains_hash[chain.chain_id] = chain ! end ! self ! end ! ! # rehash chains hash ! def rehash ! begin ! chains_bak = @chains ! chains_hash_bak = @chains_hash ! @chains = [] ! @chains_hash = {} ! chains_bak.each do |chain| ! self.addChain(chain) ! end ! rescue RuntimeError ! @chains = chains_bak ! @chains_hash = chains_hash_bak ! raise ! end ! self end *************** *** 108,112 **** # Keyed access to chains def [](key) ! chain = @chains.find{ |chain| key == chain.id } end --- 132,137 ---- # Keyed access to chains def [](key) ! 
#chain = @chains.find{ |chain| key == chain.id } ! @chains_hash[key] end Index: chain.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/chain.rb,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** chain.rb 8 Jan 2006 12:59:04 -0000 1.4 --- chain.rb 9 Jan 2006 11:22:36 -0000 1.5 *************** *** 61,65 **** --- 61,67 ---- @residues = [] + @residues_hash = {} @heterogens = [] + @heterogens_hash = {} end *************** *** 80,90 **** # get the residue by id def get_residue_by_id(key) ! @residues.find { |r| r.residue_id == key } end # get the residue by id. ! # Compatibility Note: now, you cannot find HETATMS in this method. ! # To add LIGAND to the id is no longer available. ! # To get heterogens, you must use get_heterogen_by_id. def [](key) get_residue_by_id(key) --- 82,94 ---- # get the residue by id def get_residue_by_id(key) ! #@residues.find { |r| r.residue_id == key } ! @residues_hash[key] end # get the residue by id. ! # ! # Compatibility Note: Now, you cannot find HETATMS in this method. ! # To add "LIGAND" to the id is no longer available. ! # To get heterogens, you must use get_heterogen_by_id. def [](key) get_residue_by_id(key) *************** *** 93,97 **** # get the heterogen (ligand) by id def get_heterogen_by_id(key) ! @heterogens.find { |r| r.residue_id == key } end --- 97,102 ---- # get the heterogen (ligand) by id def get_heterogen_by_id(key) ! #@heterogens.find { |r| r.residue_id == key } ! @heterogens_hash[key] end *************** *** 100,103 **** --- 105,113 ---- raise "Expecting a Bio::PDB::Residue" unless residue.is_a? 
Bio::PDB::Residue @residues.push(residue) + if @residues_hash[residue.residue_id] then + $stderr.puts "Warning: residue_id #{residue.residue_id.inspect} is already used" if $VERBOSE + else + @residues_hash[residue.residue_id] = residue + end self end *************** *** 107,113 **** raise "Expecting a Bio::PDB::Residue" unless ligand.is_a? Bio::PDB::Residue @heterogens.push(ligand) self end ! # Iterates over each residue def each(&x) #:yields: residue --- 117,170 ---- raise "Expecting a Bio::PDB::Residue" unless ligand.is_a? Bio::PDB::Residue @heterogens.push(ligand) + if @heterogens_hash[ligand.residue_id] then + $stderr.puts "Warning: heterogen_id (residue_id) #{ligand.residue_id.inspect} is already used" if $VERBOSE + else + @heterogens_hash[ligand.residue_id] = ligand + end self end ! ! # rehash residues hash ! def rehash_residues ! begin ! residues_bak = @residues ! residues_hash_bak = @residues_hash ! @residues = [] ! @residues_hash = {} ! residues_bak.each do |residue| ! self.addResidue(residue) ! end ! rescue RuntimeError ! @residues = residues_bak ! @residues_hash = residues_hash_bak ! raise ! end ! self ! end ! ! # rehash heterogens hash ! def rehash_heterogens ! begin ! heterogens_bak = @heterogens ! heterogens_hash_bak = @heterogens_hash ! @heterogens = [] ! @heterogens_hash = {} ! heterogens_bak.each do |heterogen| ! self.addLigand(heterogen) ! end ! rescue RuntimeError ! @heterogens = heterogens_bak ! @heterogens_hash = heterogens_hash_bak ! raise ! end ! self ! end ! ! # rehash residues hash and heterogens hash ! def rehash ! rehash_residues ! rehash_heterogens ! end ! 
# Iterates over each residue def each(&x) #:yields: residue From k at pub.open-bio.org Thu Jan 12 03:58:29 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Thu Jan 12 03:48:45 2006 Subject: [BioRuby-cvs] bioruby/lib/bio db.rb,0.31,0.32 Message-ID: <200601120858.k0C8wTVL021626@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory pub.open-bio.org:/tmp/cvs-serv21612/lib/bio Modified Files: db.rb Log Message: * fixed bug of tag_cut method (included in bioruby 0.7.0 release...) Index: db.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db.rb,v retrieving revision 0.31 retrieving revision 0.32 diff -C2 -d -r0.31 -r0.32 *** db.rb 7 Dec 2005 11:23:51 -0000 0.31 --- db.rb 12 Jan 2006 08:58:27 -0000 0.32 *************** *** 211,237 **** # space and stripeed. def truncate(str) ! if str ! str.gsub(/\s+/, ' ').strip ! else ! "" ! end end # Returns a tag name of the field as a String. def tag_get(str) ! if str ! str[0,@tagsize].strip ! else ! "" ! end end # Returns a String of the field without a tag name. def tag_cut(str) ! if str ! str[0,@tagsize] = '' ! else ! "" ! end end --- 211,229 ---- # space and stripeed. def truncate(str) ! str ||= "" ! return str.gsub(/\s+/, ' ').strip end # Returns a tag name of the field as a String. def tag_get(str) ! str ||= "" ! return str[0,@tagsize].strip end # Returns a String of the field without a tag name. def tag_cut(str) ! str ||= "" ! str[0,@tagsize] = '' ! 
return str end From ngoto at pub.open-bio.org Sun Jan 15 04:41:44 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun Jan 15 04:31:59 2006 Subject: [BioRuby-cvs] bioruby/doc Changes-0.7.rd,1.12,1.13 Message-ID: <200601150941.k0F9fiVL008790@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory pub.open-bio.org:/tmp/cvs-serv8780/doc Modified Files: Changes-0.7.rd Log Message: Added/modified changes for PDB classes Index: Changes-0.7.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Changes-0.7.rd,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** Changes-0.7.rd 18 Dec 2005 19:13:09 -0000 1.12 --- Changes-0.7.rd 15 Jan 2006 09:41:42 -0000 1.13 *************** *** 189,202 **** --- Bio::PDB * Bio::PDB::Atom is removed. Instead, please use Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM. * Bio::PDB::FieldDef is removed and Bio::PDB::Record is completely ! changed. Now, Record is changed from hash to Struct, and ! method_missing is no longer used. * In the "MODEL" record, model_serial is changed to serial. * In records, record_type is changed to record_name. ! * In any records, record_type is changed to record_name. ! * In most records contains real numbers, changed to return ! float values instead of strings. * Pdb_AChar, Pdb_Atom, Pdb_Character, Pdb_Continuation, Pdb_Date, Pdb_IDcode, Pdb_Integer, Pdb_LString, Pdb_List, --- 189,207 ---- --- Bio::PDB + In 0.7.0: + * Bio::PDB::Atom is removed. Instead, please use Bio::PDB::Record::ATOM and Bio::PDB::Record::HETATM. * Bio::PDB::FieldDef is removed and Bio::PDB::Record is completely ! changed. Now, records is changed from hash to Struct objects. ! (Note that method_missing is no longer used.) ! * In records, "do_parse" is now automatically called. ! Users don't need to call do_parse explicitly. ! (0.7.0 feature: "inspect" does not call do_parse.) ! 
(0.7.1 feature: "inspect" calls do_parse.) * In the "MODEL" record, model_serial is changed to serial. * In records, record_type is changed to record_name. ! * In most records contains real numbers, return values are changed ! to float instead of string. * Pdb_AChar, Pdb_Atom, Pdb_Character, Pdb_Continuation, Pdb_Date, Pdb_IDcode, Pdb_Integer, Pdb_LString, Pdb_List, *************** *** 204,207 **** --- 209,228 ---- Pdb_String, Pdb_StringRJ and Pdb_SymOP are moved under Bio::PDB::DataType. + * There are more and more changes to be written... + + In 0.7.1: + + * Heterogens and HETATMs are completely separeted from residues and ATOMs. + HETATMs (Bio::PDB::Record::HETATM objects) are stored in + Bio::PDB::Heterogen (which inherits Bio::PDB::Residue). + * Waters (resName=="HOH") are treated as normal heterogens. + Model#solvents is still available but it will be deprecated. + * In Bio::PDB::Chain, adding "LIGAND" to the heterogen id is no longer + available. Instead, please use Chain#get_heterogen_by_id method. + In addition, Bio::{PDB|PDB::Model::PDB::Chain}#heterogens, #each_heterogen, + #find_heterogen, Bio::{PDB|PDB::Model::PDB::Chain::PDB::Heterogen}#hetatms, + #each_hetatm, #find_hetatm methods are added. + * Bio::PDB#seqres returns Bio::Sequence::NA object if the chain seems to be + a nucleic acid sequence. * There are more and more changes to be written... 
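As an illustration of the 0.7.1 notes above, here is a minimal standalone sketch (not the BioRuby implementation) of keeping residues and heterogens in separate id-indexed tables, so that a residue lookup no longer returns a HETATM group. SketchGroup, SketchChain, add_residue, add_heterogen and store are hypothetical names made up for this sketch; only get_residue_by_id, get_heterogen_by_id and [] echo method names that appear in the commits.

```ruby
# Hypothetical stand-in for a residue or heterogen: just a name and an id.
SketchGroup = Struct.new(:res_name, :residue_id)

# Hypothetical container mimicking the array-plus-hash bookkeeping
# described in the commits; it is NOT Bio::PDB::Chain.
class SketchChain
  def initialize
    @residues = []
    @residues_hash = {}
    @heterogens = []
    @heterogens_hash = {}
  end

  def add_residue(group)
    store(group, @residues, @residues_hash, 'residue_id')
  end

  def add_heterogen(group)
    store(group, @heterogens, @heterogens_hash, 'heterogen_id')
  end

  def get_residue_by_id(key)
    @residues_hash[key]
  end

  def get_heterogen_by_id(key)
    @heterogens_hash[key]
  end

  # As in the compatibility note: [] looks at residues only.
  def [](key)
    get_residue_by_id(key)
  end

  private

  # The array keeps every group in insertion order; the hash keeps the
  # first group seen for an id, and later duplicates only warn.
  def store(group, list, index, label)
    list.push(group)
    if index[group.residue_id]
      warn "Warning: #{label} #{group.residue_id.inspect} is already used" if $VERBOSE
    else
      index[group.residue_id] = group
    end
    self
  end
end

chain = SketchChain.new
chain.add_residue(SketchGroup.new('ALA', '10'))
chain.add_heterogen(SketchGroup.new('HOH', '300'))

puts chain['10'].res_name                        # ALA
puts chain['300'].inspect                        # nil (heterogens are not residues)
puts chain.get_heterogen_by_id('300').res_name   # HOH
```

The hash makes lookup by id a single table access instead of a linear scan, at the cost of having to rebuild the table (the role of the rehash methods in the commits) if ids are changed after insertion.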
From k at pub.open-bio.org Mon Jan 16 10:23:14 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Mon Jan 16 10:13:49 2006 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd.ja,1.18,1.19 Message-ID: <200601161523.k0GFNEVL013283@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory pub.open-bio.org:/tmp/cvs-serv13279/doc Modified Files: Tutorial.rd.ja Log Message: * added notes on rubygems Index: Tutorial.rd.ja =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd.ja,v retrieving revision 1.18 retrieving revision 1.19 diff -C2 -d -r1.18 -r1.19 *** Tutorial.rd.ja 7 Dec 2005 11:40:45 -0000 1.18 --- Tutorial.rd.ja 16 Jan 2006 15:23:11 -0000 1.19 *************** *** 51,54 **** --- 51,66 ---- * (()) + === Installing RubyGems + + Download the latest version from the RubyGems page. + + * (()) + + Unpack it and install. + + % tar zxvf rubygems-x.x.x.tar.gz + % cd rubygems-x.x.x + % ruby setup.rb + === Installing BioRuby *************** *** 65,71 **** # ruby install.rb install ! In addition, if RubyGems is available, running ! % gems install bio alone is enough to install BioRuby. --- 77,83 ---- # ruby install.rb install ! If RubyGems is available, running ! % gem install bio alone is enough to install BioRuby. *************** *** 2049,2053 **** entry = serv.get_by_id('AA2CG') ! If(4) is what you want to use, in seqdatabase.ini [genbank] --- 2061,2065 ---- entry = serv.get_by_id('AA2CG') ! 
If (4) is what you want to use, in seqdatabase.ini [genbank] From k at pub.open-bio.org Mon Jan 16 10:25:10 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Mon Jan 16 10:14:56 2006 Subject: [BioRuby-cvs] bioruby ChangeLog,1.45,1.46 Message-ID: <200601161525.k0GFPAVL013312@pub.open-bio.org> Update of /home/repository/bioruby/bioruby In directory pub.open-bio.org:/tmp/cvs-serv13306 Modified Files: ChangeLog Log Message: * 0.7.0 release is recorded Index: ChangeLog =================================================================== RCS file: /home/repository/bioruby/bioruby/ChangeLog,v retrieving revision 1.45 retrieving revision 1.46 diff -C2 -d -r1.45 -r1.46 *** ChangeLog 18 Dec 2005 18:44:25 -0000 1.45 --- ChangeLog 16 Jan 2006 15:25:07 -0000 1.46 *************** *** 1,2 **** --- 1,6 ---- + 2005-12-19 Toshiaki Katayama + + * BioRuby 0.7.0 is released. + 2005-12-19 Naohisa Goto From k at pub.open-bio.org Mon Jan 16 10:25:45 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Mon Jan 16 10:15:58 2006 Subject: [BioRuby-cvs] bioruby/doc Changes-0.7.rd,1.13,1.14 Message-ID: <200601161525.k0GFPjVL013360@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory pub.open-bio.org:/tmp/cvs-serv13356/doc Modified Files: Changes-0.7.rd Log Message: * minor improvements (hopefully) Index: Changes-0.7.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Changes-0.7.rd,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** Changes-0.7.rd 15 Jan 2006 09:41:42 -0000 1.13 --- Changes-0.7.rd 16 Jan 2006 15:25:43 -0000 1.14 *************** *** 6,11 **** --- Ruby 1.6 series are no longer supported. ! We use autoload functionality and many other libraries bundled in ! Ruby 1.8.2 (such as SOAP, open-uri, pp etc.) by default. --- BioRuby will be loaded about 30 times faster than before. --- 6,11 ---- --- Ruby 1.6 series are no longer supported. ! 
We use autoload functionality and many standard (bundled) libraries ! (such as SOAP, open-uri, pp etc.) only in Ruby >1.8.2. --- BioRuby will be loaded about 30 times faster than before. *************** *** 14,19 **** to start up the BioRuby library made surprisingly faster. ! Other changes (including exciting BioRuby shell etc.) made in this release ! is described in this file. == New features --- 14,19 ---- to start up the BioRuby library made surprisingly faster. ! Other changes (including newly introduced BioRuby shell etc.) made ! in this series will be described in this file. == New features *************** *** 21,25 **** --- BioRuby shell ! Command line user interface for the BioRuby is included. You can invoke the shell by --- 21,25 ---- --- BioRuby shell ! A new command line user interface for the BioRuby is now included. You can invoke the shell by *************** *** 140,144 **** * lib/bio/db/genbank/common.rb is removed. ! Renamed to Bio::NCBIDB::Common for the simple autoload dependency. --- Bio::EMBL::Common --- 140,144 ---- * lib/bio/db/genbank/common.rb is removed. ! Renamed to Bio::NCBIDB::Common to make simplify the autoload dependency. --- Bio::EMBL::Common *************** *** 146,150 **** * lib/bio/db/embl/common.rb is removed. ! Renamed to Bio::EMBLDB::Common for the simple autoload dependency. --- Bio::KEGG::GENES --- 146,150 ---- * lib/bio/db/embl/common.rb is removed. ! Renamed to Bio::EMBLDB::Common to make simplify the autoload dependency. --- Bio::KEGG::GENES *************** *** 245,249 **** require 'bio' ! and this change will also speeds up loading time if you only need one of the sub classes under the genbank/ or embl/ directory. --- 245,249 ---- require 'bio' ! and this change will also speeds up loading time even if you only need one of the sub classes under the genbank/ or embl/ directory. 
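The loading-time remark above rests on Ruby's autoload mechanism: a constant is registered together with a file name, and the file is only read when the constant is first referenced. The following is a small self-contained sketch of that behaviour using a throwaway file in a temporary directory; SketchBio and sketch_genbank.rb are hypothetical names chosen for the demonstration and are not part of BioRuby.

```ruby
require 'tmpdir'

# Hypothetical namespace used only for this demonstration.
module SketchBio; end

pending_before = pending_after = loaded_name = nil

Dir.mktmpdir do |dir|
  # Write a tiny library file that defines SketchBio::GenBank.
  path = File.join(dir, 'sketch_genbank.rb')
  File.write(path, <<~RUBY)
    module SketchBio
      class GenBank; end
    end
  RUBY

  # Register the constant; nothing is loaded yet.
  SketchBio.autoload(:GenBank, path)
  pending_before = SketchBio.autoload?(:GenBank)  # file name while still pending

  # The first reference to the constant triggers the actual load.
  loaded_name = SketchBio::GenBank.name
  pending_after = SketchBio.autoload?(:GenBank)   # nil once the file is loaded
end

puts pending_before.end_with?('sketch_genbank.rb')  # true
puts loaded_name                                    # SketchBio::GenBank
puts pending_after.inspect                          # nil
```

A program that registers many constants this way pays only for the files behind the constants it actually touches, which is consistent with the start-up speed-up described in the change notes.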
From k at pub.open-bio.org Fri Jan 20 04:53:26 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Fri Jan 20 04:42:45 2006 Subject: [BioRuby-cvs] bioruby/test/unit/bio/db test_rebase.rb,1.1,1.2 Message-ID: <200601200953.k0K9rQVL025550@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/test/unit/bio/db In directory pub.open-bio.org:/tmp/cvs-serv25546/test/unit/bio/db Modified Files: test_rebase.rb Log Message: * class name is fixed to run w/o errors Index: test_rebase.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/test/unit/bio/db/test_rebase.rb,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** test_rebase.rb 5 Dec 2005 19:44:07 -0000 1.1 --- test_rebase.rb 20 Jan 2006 09:53:24 -0000 1.2 *************** *** 36,40 **** module Bio ! class TestGFF < Test::Unit::TestCase def setup --- 36,40 ---- module Bio ! class TestREBASE < Test::Unit::TestCase def setup From k at pub.open-bio.org Fri Jan 20 04:57:10 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Fri Jan 20 04:46:27 2006 Subject: [BioRuby-cvs] bioruby ChangeLog,1.46,1.47 gemspec.rb,1.4,1.5 Message-ID: <200601200957.k0K9vAVL025625@pub.open-bio.org> Update of /home/repository/bioruby/bioruby In directory pub.open-bio.org:/tmp/cvs-serv25619 Modified Files: ChangeLog gemspec.rb Log Message: * updated for 0.7.1 Index: gemspec.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/gemspec.rb,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** gemspec.rb 8 Sep 2005 01:16:52 -0000 1.4 --- gemspec.rb 20 Jan 2006 09:57:08 -0000 1.5 *************** *** 4,8 **** spec = Gem::Specification.new do |s| s.name = 'bio' ! s.version = "0.7.0" s.author = "BioRuby project" s.email = "staff@bioruby.org" --- 4,8 ---- spec = Gem::Specification.new do |s| s.name = 'bio' ! 
s.version = "0.7.1" s.author = "BioRuby project" s.email = "staff@bioruby.org" Index: ChangeLog =================================================================== RCS file: /home/repository/bioruby/bioruby/ChangeLog,v retrieving revision 1.46 retrieving revision 1.47 diff -C2 -d -r1.46 -r1.47 *** ChangeLog 16 Jan 2006 15:25:07 -0000 1.46 --- ChangeLog 20 Jan 2006 09:57:08 -0000 1.47 *************** *** 1,2 **** --- 1,21 ---- + 2005-01-20 Toshiaki Katayama + + * BioRuby 0.7.1 is released. + + * test/unit/bio/db/test_rebase.rb: fixed to run w/o errors. + + 2005-01-12 Toshiaki Katayama + + * lib/bio/db.ra: fixed a bug of the tag_cut method introduced in 0.7.0 + (reported by Alex Gutteridge) + + 2005-01-04 Naohisa Goto + + * Bio::PDB is refactored. See doc/Changes-0.7 for more details. + + 2005-12-28 Toshiaki Katayama + + * test/unit/bio/util/test_sirna.rb: fixed to run w/o errors. + 2005-12-19 Toshiaki Katayama From k at pub.open-bio.org Fri Jan 20 04:57:10 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Fri Jan 20 04:46:32 2006 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.58,1.59 Message-ID: <200601200957.k0K9vAVL025630@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory pub.open-bio.org:/tmp/cvs-serv25619/lib Modified Files: bio.rb Log Message: * updated for 0.7.1 Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.58 retrieving revision 1.59 diff -C2 -d -r1.58 -r1.59 *** bio.rb 28 Nov 2005 04:57:32 -0000 1.58 --- bio.rb 20 Jan 2006 09:57:08 -0000 1.59 *************** *** 2,6 **** # = bio.rb - Loading all BioRuby modules # ! # Copyright:: Copyright (C) 2001-2005 # Toshiaki Katayama # License:: LGPL --- 2,6 ---- # = bio.rb - Loading all BioRuby modules # ! # Copyright:: Copyright (C) 2001-2006 # Toshiaki Katayama # License:: LGPL *************** *** 29,33 **** module Bio ! 
BIORUBY_VERSION = [0, 7, 0].extend(Comparable) ### Basic data types --- 29,33 ---- module Bio ! BIORUBY_VERSION = [0, 7, 1].extend(Comparable) ### Basic data types *************** *** 79,95 **** ## GenBank/RefSeq/DDBJ - # module Bio - # autoload :NCBIDB, 'bio/db' - # class GenBank < NCBIDB - # autoload :Common, 'bio/db/genbank/common' - # include Bio::GenBank::Common - - # module Bio - # autoload :NCBIDB, 'bio/db' - # end - # class Bio::GenBank < Bio::NCBIDB - # autoload :Common, 'bio/db/genbank/common' - # include Bio::GenBank::Common - autoload :GenBank, 'bio/db/genbank/genbank' autoload :GenPept, 'bio/db/genbank/genpept' --- 79,82 ---- *************** *** 109,113 **** autoload :SwissProt, 'bio/db/embl/swissprot' - ## KEGG --- 96,99 ---- *************** *** 255,256 **** --- 241,243 ---- end + From k at pub.open-bio.org Fri Jan 20 04:58:34 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Fri Jan 20 04:47:47 2006 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.49,0.50 Message-ID: <200601200958.k0K9wYVL025710@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory pub.open-bio.org:/tmp/cvs-serv25695/lib/bio Modified Files: sequence.rb Log Message: * test code updated Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.49 retrieving revision 0.50 diff -C2 -d -r0.49 -r0.50 *** sequence.rb 27 Nov 2005 15:46:01 -0000 0.49 --- sequence.rb 20 Jan 2006 09:58:31 -0000 0.50 *************** *** 517,522 **** puts "\n== Test Bio::Sequence::NA#gc_percent" ! p na.gc ! p rna.gc puts "\n== Test Bio::Sequence::NA#illegal_bases" --- 517,522 ---- puts "\n== Test Bio::Sequence::NA#gc_percent" ! p na.gc_percent ! 
p rna.gc_percent puts "\n== Test Bio::Sequence::NA#illegal_bases" From k at pub.open-bio.org Fri Jan 20 07:04:05 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Fri Jan 20 06:53:32 2006 Subject: [BioRuby-cvs] bioruby/test/functional/bio/io test_soapwsdl.rb, 1.1, 1.2 Message-ID: <200601201204.k0KC45VL025969@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/test/functional/bio/io In directory pub.open-bio.org:/tmp/cvs-serv25948/test/functional/bio/io Modified Files: test_soapwsdl.rb Log Message: * to avoid errors (such as NoMethodError) caused by collision of class names, the class name for functional tests is changed to FuncTestHOGE (the class name for unit tests is TestHOGE as before). Index: test_soapwsdl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/test/functional/bio/io/test_soapwsdl.rb,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** test_soapwsdl.rb 18 Dec 2005 17:11:25 -0000 1.1 --- test_soapwsdl.rb 20 Jan 2006 12:04:03 -0000 1.2 *************** *** 31,35 **** module Bio ! class TestSOAPWSDL < Test::Unit::TestCase def setup --- 31,35 ---- module Bio ! class FuncTestSOAPWSDL < Test::Unit::TestCase def setup *************** *** 60,62 **** --- 60,64 ---- end + end + From nakao at pub.open-bio.org Fri Jan 20 07:37:43 2006 From: nakao at pub.open-bio.org (Mitsuteru C. Nakao) Date: Fri Jan 20 07:26:58 2006 Subject: [BioRuby-cvs] bioruby install.rb,1.3,1.4 Message-ID: <200601201237.k0KCbhVL026084@pub.open-bio.org> Update of /home/repository/bioruby/bioruby In directory pub.open-bio.org:/tmp/cvs-serv26072 Modified Files: install.rb Log Message: * Added './lib' to the library search path ($:) in the exec_test method. 
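The idiom named in this log message, prepending a directory to the library search path only when it is not already there, can be sketched on an ordinary array so that the real $: is left untouched. prepend_once is a hypothetical helper name used only here; the commit performs the same check inline on $: itself.

```ruby
# Hypothetical helper: put dir at the front of load_path unless it is
# already present somewhere in the list. Returns the (mutated) list.
def prepend_once(load_path, dir)
  load_path.unshift(dir) unless load_path.include?(dir)
  load_path
end

paths = ['/usr/lib/ruby']
prepend_once(paths, './lib')
prepend_once(paths, './lib')   # second call is a no-op
p paths                        # ["./lib", "/usr/lib/ruby"]
```

Putting the directory at the front means the in-tree copy of the library is found before any previously installed copy when the tests run.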
Index: install.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/install.rb,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** install.rb 12 Nov 2004 17:49:05 -0000 1.3 --- install.rb 20 Jan 2006 12:37:41 -0000 1.4 *************** *** 662,665 **** --- 662,667 ---- def exec_test + bioruby_path = './lib' + $:.unshift(bioruby_path) unless $:.include?(bioruby_path) testdir = 'test' $stderr.printf "Running all tests in %s...\n", testdir if verbose? From k at pub.open-bio.org Fri Jan 20 08:04:31 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Fri Jan 20 07:53:45 2006 Subject: [BioRuby-cvs] bioruby/test runner.rb,1.2,1.3 Message-ID: <200601201304.k0KD4VVL026229@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/test In directory pub.open-bio.org:/tmp/cvs-serv26203/test Modified Files: runner.rb Log Message: * fixed to follow the change of AutoRunner.run in Ruby 1.8.3 Index: runner.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/test/runner.rb,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** runner.rb 23 Oct 2005 10:40:40 -0000 1.2 --- runner.rb 20 Jan 2006 13:04:28 -0000 1.3 *************** *** 7,10 **** $:.unshift(bioruby_libpath) unless $:.include?(bioruby_libpath) ! exit Test::Unit::AutoRunner.run(false, File.dirname($0)) --- 7,14 ---- $:.unshift(bioruby_libpath) unless $:.include?(bioruby_libpath) ! if RUBY_VERSION > "1.8.2" ! exit Test::Unit::AutoRunner.run(true, File.dirname($0)) ! else ! exit Test::Unit::AutoRunner.run(false, File.dirname($0)) ! 
end From ngoto at pub.open-bio.org Fri Jan 20 08:54:10 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Fri Jan 20 08:43:25 2006 Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb chain.rb, 1.5, 1.6 model.rb, 1.6, 1.7 pdb.rb, 1.12, 1.13 residue.rb, 1.9, 1.10 Message-ID: <200601201354.k0KDsAVL026476@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb In directory pub.open-bio.org:/tmp/cvs-serv26464 Modified Files: chain.rb model.rb pdb.rb residue.rb Log Message: Added Bio::PDB::Record::ATOM#to_s and modified Bio::PDB::*.to_s. Note that Bio::PDB#to_s and Bio::PDB::{Model,Chain,Residue,Heterogen}#to_s are still incomplete. Note that Bio::PDB::Record::ATOM#to_s (and Bio::PDB::Record::HETATM#to_s) may return invalid data when given unusual data. Index: residue.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/residue.rb,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** residue.rb 9 Jan 2006 11:22:36 -0000 1.9 --- residue.rb 20 Jan 2006 13:54:08 -0000 1.10 *************** *** 143,149 **** # Stringifies each atom def to_s ! string = "" ! @atoms.each{ |atom| string << atom.to_s << "\n" } ! return string end --- 143,147 ---- # Stringifies each atom def to_s ! @atoms.join('') end Index: model.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/model.rb,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** model.rb 9 Jan 2006 11:22:36 -0000 1.6 --- model.rb 20 Jan 2006 13:54:08 -0000 1.7 *************** *** 140,144 **** string = "" if model_serial ! string = "MODEL #{model_serial}" #Should use proper formatting end @chains.each{ |chain| string << chain.to_s } --- 140,144 ---- string = "" if model_serial ! 
string = "MODEL #{model_serial}\n" #Should use proper formatting end @chains.each{ |chain| string << chain.to_s } *************** *** 147,151 **** #end if model_serial ! string << "ENDMDL" end return string --- 147,151 ---- #end if model_serial ! string << "ENDMDL\n" end return string Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** pdb.rb 8 Jan 2006 12:59:04 -0000 1.12 --- pdb.rb 20 Jan 2006 13:54:08 -0000 1.13 *************** *** 1010,1013 **** --- 1010,1031 ---- self end + + def to_s + sprintf("%-6s%5d %-4s%-1s%3s %-1s%4d%-1s %8.3f%8.3f%8.3f%6.2f%6.2f %-4s%2s%-2s\n", + self.record_name, + self.serial, + self.name, + self.altLoc, + self.resName, + self.chainID, + self.resSeq, + self.iCode, + self.x, self.y, self.z, + self.occupancy, + self.tempFactor, + self.segID, + self.element, + self.charge) + end end #class ATOM *************** *** 1577,1581 **** string = "" @models.each{ |model| string << model.to_s } ! string << "END" return string end --- 1595,1599 ---- string = "" @models.each{ |model| string << model.to_s } ! string << "END\n" return string end Index: chain.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/chain.rb,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** chain.rb 9 Jan 2006 11:22:36 -0000 1.5 --- chain.rb 20 Jan 2006 13:54:08 -0000 1.6 *************** *** 186,190 **** # Stringifies each residue def to_s ! @residues.join('') + "TER\n" end --- 186,190 ---- # Stringifies each residue def to_s ! 
@residues.join('') + "TER\n" + @heterogens.join('') end From k at pub.open-bio.org Fri Jan 20 09:04:49 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Fri Jan 20 08:54:05 2006 Subject: [BioRuby-cvs] bioruby ChangeLog,1.47,1.48 Message-ID: <200601201404.k0KE4nVL026655@pub.open-bio.org> Update of /home/repository/bioruby/bioruby In directory pub.open-bio.org:/tmp/cvs-serv26651 Modified Files: ChangeLog Log Message: * removed trivial changes (tests) Index: ChangeLog =================================================================== RCS file: /home/repository/bioruby/bioruby/ChangeLog,v retrieving revision 1.47 retrieving revision 1.48 diff -C2 -d -r1.47 -r1.48 *** ChangeLog 20 Jan 2006 09:57:08 -0000 1.47 --- ChangeLog 20 Jan 2006 14:04:47 -0000 1.48 *************** *** 3,8 **** * BioRuby 0.7.1 is released. - * test/unit/bio/db/test_rebase.rb: fixed to run w/o errors. - 2005-01-12 Toshiaki Katayama --- 3,6 ---- *************** *** 14,24 **** * Bio::PDB is refactored. See doc/Changes-0.7 for more details. - 2005-12-28 Toshiaki Katayama - - * test/unit/bio/util/test_sirna.rb: fixed to run w/o errors. - 2005-12-19 Toshiaki Katayama * BioRuby 0.7.0 is released. 2005-12-19 Naohisa Goto --- 12,20 ---- * Bio::PDB is refactored. See doc/Changes-0.7 for more details. 2005-12-19 Toshiaki Katayama * BioRuby 0.7.0 is released. + + See doc/Changes-0.7.rd file for major and incompatible changes. 
2005-12-19 Naohisa Goto From k at pub.open-bio.org Sun Jan 22 23:07:06 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sun Jan 22 22:56:14 2006 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence - New directory Message-ID: <200601230407.k0N476VL016683@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory pub.open-bio.org:/tmp/cvs-serv16679/sequence Log Message: Directory /home/repository/bioruby/bioruby/lib/bio/sequence added to the repository From k at pub.open-bio.org Sun Jan 22 23:13:38 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sun Jan 22 23:02:38 2006 Subject: [BioRuby-cvs] bioruby/lib/bio sequence.rb,0.50,0.51 Message-ID: <200601230413.k0N4DcVL016737@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory pub.open-bio.org:/tmp/cvs-serv16731 Modified Files: sequence.rb Log Message: * refactored to store annotations in Bio::Sequence class * common methods are separated into Bio::Sequence::Common module * Bio::Sequence no longer inherits String * Bio::Sequence::NA and AA inherits String and include Bio::Sequence::Common * lib/bio/sequence.rb is a container for rich sequence * lib/bio/sequence/common.rb contains Bio::Sequence::Common module * lib/bio/sequence/na.rb defines Bio::Sequence::NA class * lib/bio/sequence/aa.rb defines Bio::Sequence::AA class * lib/bio/sequence/format.rb is for sequence format converter (define output formats) * lib/bio/sequence/compat.rb is just for backward compatibility Index: sequence.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v retrieving revision 0.50 retrieving revision 0.51 diff -C2 -d -r0.50 -r0.51 *** sequence.rb 20 Jan 2006 09:58:31 -0000 0.50 --- sequence.rb 23 Jan 2006 04:13:36 -0000 0.51 *************** *** 2,6 **** # = bio/sequence.rb - biological sequence class # ! # Copyright:: Copyright (C) 2000-2005 # Toshiaki Katayama , # Yoshinori K. 
Okuji , --- 2,6 ---- # = bio/sequence.rb - biological sequence class # ! # Copyright:: Copyright (C) 2000-2006 # Toshiaki Katayama , # Yoshinori K. Okuji , *************** *** 11,19 **** # #-- - # *TODO* remove this functionality? - # You can use Bio::Seq instead of Bio::Sequence for short. - #++ - # - #-- # # This library is free software; you can redistribute it and/or --- 11,14 ---- *************** *** 39,64 **** require 'bio/location' module Bio # Nucleic/Amino Acid sequence ! class Sequence < String def self.auto(str) moltype = self.guess(str) if moltype == NA ! NA.new(str) else ! AA.new(str) end end def guess(threshold = 0.9) ! cmp = self.composition bases = cmp['A'] + cmp['T'] + cmp['G'] + cmp['C'] + cmp['a'] + cmp['t'] + cmp['g'] + cmp['c'] ! total = self.length - cmp['N'] - cmp['n'] if bases.to_f / total > threshold --- 34,89 ---- require 'bio/location' + require 'bio/sequence/common' + require 'bio/sequence/na' + require 'bio/sequence/aa' + require 'bio/sequence/format' + require 'bio/sequence/compat' + module Bio # Nucleic/Amino Acid sequence ! class Sequence ! ! attr_accessor :entry_id, :definition, :features, :references, :comments, ! :date, :keywords, :dblinks, :taxonomy, :moltype, :seq ! ! # def method_missing(*arg) ! # @seq.send(*arg) ! # end ! ! def output(style) ! case style ! when :fasta ! format_fasta ! when :genbank ! format_genbank ! when :embl ! format_embl ! end ! end ! ! def initialize(str) ! @seq = str ! end def self.auto(str) moltype = self.guess(str) if moltype == NA ! @seq = NA.new(str) else ! @seq = AA.new(str) end + + return @seq end def guess(threshold = 0.9) ! cmp = @seq.composition bases = cmp['A'] + cmp['T'] + cmp['G'] + cmp['C'] + cmp['a'] + cmp['t'] + cmp['g'] + cmp['c'] ! total = @seq.length - cmp['N'] - cmp['n'] if bases.to_f / total > threshold *************** *** 73,457 **** end - def to_s - String.new(self) - end - alias to_str to_s - - # Force self to re-initialize for clean up (remove white spaces, - # case unification). 
- def seq - self.class.new(self) - end - - # Similar to the 'seq' method, but changes the self object destructively. - def normalize! - initialize(self) - self - end - alias seq! normalize! - - def <<(*arg) - super(self.class.new(*arg)) - end - alias concat << - - def +(*arg) - self.class.new(super(*arg)) - end - - # Returns the subsequence of the self string. - def subseq(s = 1, e = self.length) - return nil if s < 1 or e < 1 - s -= 1 - e -= 1 - self[s..e] - end - - # Output the FASTA format string of the sequence. The 1st argument is - # used as the comment string. If the 2nd option is given, the output - # sequence will be folded. - def to_fasta(header = '', width = nil) - ">#{header}\n" + - if width - self.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") - else - self.to_s + "\n" - end - end - - # This method iterates on sub string with specified length 'window_size'. - # By specifing 'step_size', codon sized shifting or spliting genome - # sequence with ovelapping each end can easily be yielded. - # - # The remainder sequence at the terminal end will be returned. - # - # Example: - # # prints average GC% on each 100bp - # seq.window_search(100) do |subseq| - # puts subseq.gc - # end - # # prints every translated peptide (length 5aa) in the same frame - # seq.window_search(15, 3) do |subseq| - # puts subseq.translate - # end - # # split genome sequence by 10000bp with 1000bp overlap in fasta format - # i = 1 - # remainder = seq.window_search(10000, 9000) do |subseq| - # puts subseq.to_fasta("segment #{i}", 60) - # i += 1 - # end - # puts remainder.to_fasta("segment #{i}", 60) - # - def window_search(window_size, step_size = 1) - i = 0 - 0.step(self.length - window_size, step_size) do |i| - yield self[i, window_size] - end - return self[i + window_size .. -1] - end - - # This method receive a hash of residues/bases to the particular values, - # and sum up the value along with the self sequence. 
Especially useful - # to use with the window_search method and amino acid indices etc. - def total(hash) - hash.default = 0.0 unless hash.default - sum = 0.0 - self.each_byte do |x| - begin - sum += hash[x.chr] - end - end - return sum - end - - # Returns a hash of the occurrence counts for each residue or base. - def composition - count = Hash.new(0) - self.scan(/./) do |x| - count[x] += 1 - end - return count - end - - # Returns a randomized sequence keeping its composition by default. - # The argument is required when generating a random sequence from the empty - # sequence (used by the class methods NA.randomize, AA.randomize). - # If the block is given, yields for each random residue/base. - def randomize(hash = nil) - length = self.length - if hash - count = hash.clone - count.each_value {|x| length += x} - else - count = self.composition - end - - seq = '' - tmp = {} - length.times do - count.each do |k, v| - tmp[k] = v * rand - end - max = tmp.max {|a, b| a[1] <=> b[1]} - count[max.first] -= 1 - - if block_given? - yield max.first - else - seq += max.first - end - end - return self.class.new(seq) - end - - # Generate a new random sequence with the given frequency of bases - # or residues. The sequence length is determined by the sum of each - # base/residue occurences. - def self.randomize(*arg, &block) - self.new('').randomize(*arg, &block) - end - - # Receive a GenBank style position string and convert it to the Locations - # objects to splice the sequence itself. See also: bio/location.rb - # - # This method depends on Locations class, see bio/location.rb - def splicing(position) - unless position.is_a?(Locations) then - position = Locations.new(position) - end - s = '' - position.each do |location| - if location.sequence - s << location.sequence - else - exon = self.subseq(location.from, location.to) - begin - exon.complement! 
if location.strand < 0 - rescue NameError - end - s << exon - end - end - return self.class.new(s) - end - - - # Nucleic Acid sequence - - class NA < Sequence - - # Generate a nucleic acid sequence object from a string. - def initialize(str) - super - self.downcase! - self.tr!(" \t\n\r",'') - end - - # This method depends on Locations class, see bio/location.rb - def splicing(position) - mRNA = super - if mRNA.rna? - mRNA.tr!('t', 'u') - else - mRNA.tr!('u', 't') - end - mRNA - end - - # Returns complement sequence without reversing ("atgc" -> "tacg") - def forward_complement - s = self.class.new(self) - s.forward_complement! - s - end - - # Convert to complement sequence without reversing ("atgc" -> "tacg") - def forward_complement! - if self.rna? - self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn') - else - self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn') - end - self - end - - # Returns reverse complement sequence ("atgc" -> "gcat") - def reverse_complement - s = self.class.new(self) - s.reverse_complement! - s - end - - # Convert to reverse complement sequence ("atgc" -> "gcat") - def reverse_complement! - self.reverse! - self.forward_complement! - end - - # Aliases for short - alias complement reverse_complement - alias complement! reverse_complement! - - - # Translate into the amino acid sequence from the given frame and the - # selected codon table. The table also can be a Bio::CodonTable object. - # The 'unknown' character is used for invalid/unknown codon (can be - # used for 'nnn' and/or gap translation in practice). - # - # Frame can be 1, 2 or 3 for the forward strand and -1, -2 or -3 - # (4, 5 or 6 is also accepted) for the reverse strand. - def translate(frame = 1, table = 1, unknown = 'X') - if table.is_a?(Bio::CodonTable) - ct = table - else - ct = Bio::CodonTable[table] - end - naseq = self.dna - case frame - when 1, 2, 3 - from = frame - 1 - when 4, 5, 6 - from = frame - 4 - naseq.complement! - when -1, -2, -3 - from = -1 - frame - naseq.complement! 
- else - from = 0 - end - nalen = naseq.length - from - nalen -= nalen % 3 - aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown} - return Bio::Sequence::AA.new(aaseq) - end - - # Returns counts of the each codon in the sequence by Hash. - def codon_usage - hash = Hash.new(0) - self.window_search(3, 3) do |codon| - hash[codon] += 1 - end - return hash - end - - # Calculate the ratio of GC / ATGC bases in percent. - def gc_percent - count = self.composition - at = count['a'] + count['t'] + count['u'] - gc = count['g'] + count['c'] - gc = 100 * gc / (at + gc) - return gc - end - - # Show abnormal bases other than 'atgcu'. - def illegal_bases - self.scan(/[^atgcu]/).sort.uniq - end - - # Estimate the weight of this biological string molecule. - # NucleicAcid is defined in bio/data/na.rb - def molecular_weight - if self.rna? - NucleicAcid.weight(self, true) - else - NucleicAcid.weight(self) - end - end - - # Convert the universal code string into the regular expression. - def to_re - if self.rna? - NucleicAcid.to_re(self.dna, true) - else - NucleicAcid.to_re(self) - end - end - - # Convert the self string into the list of the names of the each base. - def names - array = [] - self.each_byte do |x| - array.push(NucleicAcid.names[x.chr.upcase]) - end - return array - end - - # Output a DNA string by substituting 'u' to 't'. - def dna - self.tr('u', 't') - end - - def dna! - self.tr!('u', 't') - end - - # Output a RNA string by substituting 't' to 'u'. - def rna - self.tr('t', 'u') - end - - def rna! - self.tr!('t', 'u') - end - - def rna? - self.index('u') - end - protected :rna? - - def pikachu - self.dna.tr("atgc", "pika") # joke, of course :-) - end - - end - - - # Amino Acid sequence - - class AA < Sequence - - # Generate a amino acid sequence object from a string. - def initialize(str) - super - self.upcase! - self.tr!(" \t\n\r",'') - end - - # Estimate the weight of this protein. 
- # AminoAcid is defined in bio/data/aa.rb - def molecular_weight - AminoAcid.weight(self) - end - - def to_re - AminoAcid.to_re(self) - end - - # Generate the list of the names of the each residue along with the - # sequence (3 letters code). - def codes - array = [] - self.each_byte do |x| - array.push(AminoAcid.names[x.chr]) - end - return array - end - - # Similar to codes but returns long names. - def names - self.codes.map do |x| - AminoAcid.names[x] - end - end - - end - end # Sequence - - - class Seq < Sequence - attr_accessor :entry_id, :definition, :features, :references, :comments, - :date, :keywords, :dblinks, :taxonomy, :moltype - end --- 98,102 ---- From k at pub.open-bio.org Sun Jan 22 23:13:38 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sun Jan 22 23:03:03 2006 Subject: [BioRuby-cvs] bioruby/lib/bio/sequence aa.rb, NONE, 1.1 common.rb, NONE, 1.1 compat.rb, NONE, 1.1 format.rb, NONE, 1.1 na.rb, NONE, 1.1 Message-ID: <200601230413.k0N4DcVL016741@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/sequence In directory pub.open-bio.org:/tmp/cvs-serv16731/sequence Added Files: aa.rb common.rb compat.rb format.rb na.rb Log Message: * refactored to store annotations in Bio::Sequence class * common methods are separated into Bio::Sequence::Common module * Bio::Sequence no longer inherits String * Bio::Sequence::NA and AA inherits String and include Bio::Sequence::Common * lib/bio/sequence.rb is a container for rich sequence * lib/bio/sequence/common.rb contains Bio::Sequence::Common module * lib/bio/sequence/na.rb defines Bio::Sequence::NA class * lib/bio/sequence/aa.rb defines Bio::Sequence::AA class * lib/bio/sequence/format.rb is for sequence format converter (define output formats) * lib/bio/sequence/compat.rb is just for backward compatibility --- NEW FILE: compat.rb --- # only for backward compatibility, use Bio::Sequence#output(:fasta) instead module Bio class Sequence def to_s String.new(@seq) end alias to_str to_s 
module Common # Output the FASTA format string of the sequence. The 1st argument is # used as the comment string. If the 2nd option is given, the output # sequence will be folded. def to_fasta(header = '', width = nil) ">#{header}\n" + if width self.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") else self.to_s + "\n" end end end # Common class NA def self.randomize(*arg, &block) self.new('').randomize(*arg, &block) end end # NA class AA def self.randomize(*arg, &block) self.new('').randomize(*arg, &block) end end # AA end # Sequence end # Bio --- NEW FILE: na.rb --- module Bio class Sequence # Nucleic Acid sequence class NA < String include Bio::Sequence::Common # Generate a nucleic acid sequence object from a string. def initialize(str) super self.downcase! self.tr!(" \t\n\r",'') end # This method depends on Locations class, see bio/location.rb def splicing(position) mRNA = super if mRNA.rna? mRNA.tr!('t', 'u') else mRNA.tr!('u', 't') end mRNA end # Returns complement sequence without reversing ("atgc" -> "tacg") def forward_complement s = self.class.new(self) s.forward_complement! s end # Convert to complement sequence without reversing ("atgc" -> "tacg") def forward_complement! if self.rna? self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn') else self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn') end self end # Returns reverse complement sequence ("atgc" -> "gcat") def reverse_complement s = self.class.new(self) s.reverse_complement! s end # Convert to reverse complement sequence ("atgc" -> "gcat") def reverse_complement! self.reverse! self.forward_complement! end # Aliases for short alias complement reverse_complement alias complement! reverse_complement! # Translate into the amino acid sequence from the given frame and the # selected codon table. The table also can be a Bio::CodonTable object. # The 'unknown' character is used for invalid/unknown codon (can be # used for 'nnn' and/or gap translation in practice). 
# # Frame can be 1, 2 or 3 for the forward strand and -1, -2 or -3 # (4, 5 or 6 is also accepted) for the reverse strand. def translate(frame = 1, table = 1, unknown = 'X') if table.is_a?(Bio::CodonTable) ct = table else ct = Bio::CodonTable[table] end naseq = self.dna case frame when 1, 2, 3 from = frame - 1 when 4, 5, 6 from = frame - 4 naseq.complement! when -1, -2, -3 from = -1 - frame naseq.complement! else from = 0 end nalen = naseq.length - from nalen -= nalen % 3 aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown} return Bio::Sequence::AA.new(aaseq) end # Returns counts of the each codon in the sequence by Hash. def codon_usage hash = Hash.new(0) self.window_search(3, 3) do |codon| hash[codon] += 1 end return hash end # Calculate the ratio of GC / ATGC bases in percent. def gc_percent count = self.composition at = count['a'] + count['t'] + count['u'] gc = count['g'] + count['c'] gc = 100 * gc / (at + gc) return gc end # Show abnormal bases other than 'atgcu'. def illegal_bases self.scan(/[^atgcu]/).sort.uniq end # Estimate the weight of this biological string molecule. # NucleicAcid is defined in bio/data/na.rb def molecular_weight if self.rna? NucleicAcid.weight(self, true) else NucleicAcid.weight(self) end end # Convert the universal code string into the regular expression. def to_re if self.rna? NucleicAcid.to_re(self.dna, true) else NucleicAcid.to_re(self) end end # Convert the self string into the list of the names of the each base. def names array = [] self.each_byte do |x| array.push(NucleicAcid.names[x.chr.upcase]) end return array end # Output a DNA string by substituting 'u' to 't'. def dna self.tr('u', 't') end def dna! self.tr!('u', 't') end # Output a RNA string by substituting 't' to 'u'. def rna self.tr('t', 'u') end def rna! self.tr!('t', 'u') end def rna? self.index('u') end protected :rna? 
def pikachu self.dna.tr("atgc", "pika") # joke, of course :-) end end # NA end # Sequence end # Bio --- NEW FILE: format.rb --- # porting from N. Goto's feature-output.rb on BioRuby list. module Bio class Sequence # Output the FASTA format string of the sequence. The 1st argument is # used in the comment line. If the 2nd argument (integer) is given, # the output sequence will be folded. def format_fasta(header = nil, width = nil) header ||= "#{@entry_id} #{@definition}" ">#{header}\n" + if width @seq.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n") else @seq.to_s + "\n" end end def format_genbank prefix = ' ' * 5 indent = prefix + ' ' * 16 fwidth = 79 - indent.length format_features(prefix, indent, fwidth) end def format_embl prefix = 'FT ' indent = prefix + ' ' * 16 fwidth = 80 - indent.length format_features(prefix, indent, fwidth) end private def format_features(prefix, indent, width) result = '' @features.each do |feature| result << prefix + sprintf("%-16s", feature.feature) position = feature.position #position = feature.locations.to_s head = '' wrap(position, width).each_line do |line| result << head << line head = indent end result << format_qualifiers(feature.qualifiers, width) end return result end def format_qualifiers(qualifiers, indent, width) qualifiers.each do |qualifier| q = qualifier.qualifier v = qualifier.value.to_s if v == true lines = wrap('/' + q, width) elsif q == 'translation' lines = fold('/' + q + '=' + val, width) else if v[/\D/] #v.delete!("\x00-\x1f\x7f-\xff") v.gsub!(/"/, '""') v = '"' + v + '"' end lines = wrap('/' + q + '=' + val, width) end return lines.gsub(/^/, indent) end end def fold(str, width) str.gsub(Regexp.new("(.{1,#{width}})"), "\\1\n") end def wrap(str, width) result = [] left = str.dup while left and left.length > width line = nil width.downto(1) do |i| if left[i..i] == ' ' or /[,;]/ =~ left[(i-1)..(i-1)] then line = left[0..(i-1)].sub(/ +\z/, '') left = left[i..-1].sub(/\A +/, '') break end end if line.nil? 
then line = left[0..(width-1)] left = left[width..-1] end result << line end result << left if left return result.join("\n") end end # Sequence end # Bio --- NEW FILE: common.rb --- module Bio class Sequence module Common def to_s String.new(self) end alias to_str to_s # Force self to re-initialize for clean up (remove white spaces, # case unification). def seq self.class.new(self) end # Similar to the 'seq' method, but changes the self object destructively. def normalize! initialize(self) self end alias seq! normalize! def <<(*arg) super(self.class.new(*arg)) end alias concat << def +(*arg) self.class.new(super(*arg)) end # Returns the subsequence of the self string. def subseq(s = 1, e = self.length) return nil if s < 1 or e < 1 s -= 1 e -= 1 self[s..e] end # This method iterates on sub string with specified length 'window_size'. # By specifing 'step_size', codon sized shifting or spliting genome # sequence with ovelapping each end can easily be yielded. # # The remainder sequence at the terminal end will be returned. # # Example: # # prints average GC% on each 100bp # seq.window_search(100) do |subseq| # puts subseq.gc # end # # prints every translated peptide (length 5aa) in the same frame # seq.window_search(15, 3) do |subseq| # puts subseq.translate # end # # split genome sequence by 10000bp with 1000bp overlap in fasta format # i = 1 # remainder = seq.window_search(10000, 9000) do |subseq| # puts subseq.to_fasta("segment #{i}", 60) # i += 1 # end # puts remainder.to_fasta("segment #{i}", 60) # def window_search(window_size, step_size = 1) i = 0 0.step(self.length - window_size, step_size) do |i| yield self[i, window_size] end return self[i + window_size .. -1] end # This method receive a hash of residues/bases to the particular values, # and sum up the value along with the self sequence. Especially useful # to use with the window_search method and amino acid indices etc. 
def total(hash) hash.default = 0.0 unless hash.default sum = 0.0 self.each_byte do |x| begin sum += hash[x.chr] end end return sum end # Returns a hash of the occurrence counts for each residue or base. def composition count = Hash.new(0) self.scan(/./) do |x| count[x] += 1 end return count end # Returns a randomized sequence keeping its composition by default. # The argument is required when generating a random sequence from the empty # sequence (used by the class methods NA.randomize, AA.randomize). # If the block is given, yields for each random residue/base. def randomize(hash = nil) length = self.length if hash count = hash.clone count.each_value {|x| length += x} else count = self.composition end seq = '' tmp = {} length.times do count.each do |k, v| tmp[k] = v * rand end max = tmp.max {|a, b| a[1] <=> b[1]} count[max.first] -= 1 if block_given? yield max.first else seq += max.first end end return self.class.new(seq) end # Generate a new random sequence with the given frequency of bases # or residues. The sequence length is determined by the sum of each # base/residue occurences. def self.randomize(*arg, &block) self.new('').randomize(*arg, &block) end # Receive a GenBank style position string and convert it to the Locations # objects to splice the sequence itself. See also: bio/location.rb # # This method depends on Locations class, see bio/location.rb def splicing(position) unless position.is_a?(Locations) then position = Locations.new(position) end s = '' position.each do |location| if location.sequence s << location.sequence else exon = self.subseq(location.from, location.to) begin exon.complement! if location.strand < 0 rescue NameError end s << exon end end return self.class.new(s) end end # Common end # Sequence end # Bio --- NEW FILE: aa.rb --- module Bio class Sequence # Amino Acid sequence class AA < String include Bio::Sequence::Common # Generate a amino acid sequence object from a string. def initialize(str) super self.upcase! 
self.tr!(" \t\n\r",'') end # Estimate the weight of this protein. # AminoAcid is defined in bio/data/aa.rb def molecular_weight AminoAcid.weight(self) end def to_re AminoAcid.to_re(self) end # Generate the list of the names of the each residue along with the # sequence (3 letters code). def codes array = [] self.each_byte do |x| array.push(AminoAcid.names[x.chr]) end return array end # Similar to codes but returns long names. def names self.codes.map do |x| AminoAcid.names[x] end end end # AA end # Sequence end # Bio From k at pub.open-bio.org Tue Jan 24 06:42:19 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Tue Jan 24 06:31:45 2006 Subject: [BioRuby-cvs] bioruby gemspec.rb,1.5,1.6 Message-ID: <200601241142.k0OBgJVL022553@pub.open-bio.org> Update of /home/repository/bioruby/bioruby In directory pub.open-bio.org:/tmp/cvs-serv22549 Modified Files: gemspec.rb Log Message: * improved to install executables Index: gemspec.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/gemspec.rb,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** gemspec.rb 20 Jan 2006 09:57:08 -0000 1.5 --- gemspec.rb 24 Jan 2006 11:42:17 -0000 1.6 *************** *** 5,17 **** s.name = 'bio' s.version = "0.7.1" s.author = "BioRuby project" s.email = "staff@bioruby.org" s.homepage = "http://bioruby.org/" s.platform = Gem::Platform::RUBY - s.summary = "BioRuby is a library for bioinformatics (biology + information science)." s.files = Dir.glob("{bin,doc,etc,lib,sample,test}/**/*").delete_if {|item| item.include?("CVS") || item.include?("rdoc")} ! 
s.files.concat ["ChangeLog"] s.require_path = 'lib' s.autorequire = 'bio' end --- 5,29 ---- s.name = 'bio' s.version = "0.7.1" + s.author = "BioRuby project" s.email = "staff@bioruby.org" s.homepage = "http://bioruby.org/" + s.rubyforge_project = "bioruby" + s.summary = "Bioinformatics library" + s.description = "BioRuby is a library for bioinformatics (biology + information science)." + s.platform = Gem::Platform::RUBY s.files = Dir.glob("{bin,doc,etc,lib,sample,test}/**/*").delete_if {|item| item.include?("CVS") || item.include?("rdoc")} ! s.files.concat ["README", "README.DEV", "ChangeLog"] ! ! # s.rdoc_options << '--exclude' << '.' ! # s.has_rdoc = false ! s.require_path = 'lib' s.autorequire = 'bio' + + s.bindir = "bin" + s.executables = ["bioruby", "br_biofetch.rb", "br_biogetseq.rb", "br_bioflat.rb", "br_pmfetch.rb"] + s.default_executable = "bioruby" end From ngoto at pub.open-bio.org Tue Jan 24 09:11:37 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Tue Jan 24 09:00:37 2006 Subject: [BioRuby-cvs] bioruby/test/unit/bio test_alignment.rb,1.6,1.7 Message-ID: <200601241411.k0OEBbVL022924@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/test/unit/bio In directory pub.open-bio.org:/tmp/cvs-serv22914 Modified Files: test_alignment.rb Log Message: changed Bio::Sequence to Bio::Sequence::NA due to the change of Bio::Sequence. Index: test_alignment.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/test/unit/bio/test_alignment.rb,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** test_alignment.rb 2 Dec 2005 13:01:49 -0000 1.6 --- test_alignment.rb 24 Jan 2006 14:11:34 -0000 1.7 *************** *** 520,529 **** def test_alignment_subseq ! a = A[ Sequence.new('a'), Sequence.new('at'), Sequence.new('atgca'), ! Sequence.new('atg'), Sequence.new('') ] ! assert_equal(Alignment::SequenceArray[ Sequence.new(''), ! Sequence.new('t'), Sequence.new('tgc'), ! 
Sequence.new('tg'), nil ], ! a.alignment_subseq(2,4)) end --- 520,537 ---- def test_alignment_subseq ! a = A[ ! Sequence::NA.new('a'), ! Sequence::NA.new('at'), ! Sequence::NA.new('atgca'), ! Sequence::NA.new('atg'), ! Sequence::NA.new('') ! ] ! assert_equal(Alignment::SequenceArray[ ! Sequence::NA.new(''), ! Sequence::NA.new('t'), ! Sequence::NA.new('tgc'), ! Sequence::NA.new('tg'), ! nil ! ], a.alignment_subseq(2,4)) end From ngoto at pub.open-bio.org Tue Jan 24 09:17:01 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Tue Jan 24 09:05:51 2006 Subject: [BioRuby-cvs] bioruby/lib/bio alignment.rb,1.14,1.15 Message-ID: <200601241417.k0OEH1VL022974@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory pub.open-bio.org:/tmp/cvs-serv22964 Modified Files: alignment.rb Log Message: changed Bio::Sequence to Bio::Sequence::NA or Bio::Sequence::AA due to the change of Bio::Sequence. Index: alignment.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/alignment.rb,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** alignment.rb 2 Dec 2005 12:01:28 -0000 1.14 --- alignment.rb 24 Jan 2006 14:16:59 -0000 1.15 *************** *** 818,827 **** alias slice alignment_slice ! # For each sequence, the 'subseq' method (Bio::Seqeunce#subseq is # expected) is executed, and returns a new alignment as # a Bio::Alignment::SequenceArray object. # # All sequences in the alignment are expected to be kind of ! # Bio::Sequence objects. # # Unlike alignment_window method, the result alignment --- 818,827 ---- alias slice alignment_slice ! # For each sequence, the 'subseq' method (Bio::Seqeunce::Common#subseq is # expected) is executed, and returns a new alignment as # a Bio::Alignment::SequenceArray object. # # All sequences in the alignment are expected to be kind of ! # Bio::Sequence::NA or Bio::Sequence::AA objects. 
  #
  # Unlike alignment_window method, the result alignment
***************
*** 1178,1182 ****
  def extract_seq(obj)
    seq = nil
!   if obj.is_a?(Bio::Sequence) then
      seq = obj
    else
--- 1178,1182 ----
  def extract_seq(obj)
    seq = nil
!   if obj.is_a?(Bio::Sequence::NA) or obj.is_a?(Bio::Sequence::AA) then
      seq = obj
    else
***************
*** 1603,1607 ****
  def add_seq(seq, key = nil)
    #(BioPerl) AlignI::add_seq like method
!   unless seq.is_a?(Bio::Sequence) then
      s = extract_seq(seq)
      key = extract_key(seq) unless key
--- 1603,1607 ----
  def add_seq(seq, key = nil)
    #(BioPerl) AlignI::add_seq like method
!   unless seq.is_a?(Bio::Sequence::NA) or seq.is_a?(Bio::Sequence::AA)
      s = extract_seq(seq)
      key = extract_key(seq) unless key

From ngoto at pub.open-bio.org  Thu Jan 26 11:04:06 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Thu, 26 Jan 2006 16:04:06 +0000
Subject: [BioRuby-cvs] bioruby/test/unit/bio/db/pdb - New directory
Message-ID: <200601261604.k0QG46VL031055@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/test/unit/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv31045/pdb

Log Message:
Directory /home/repository/bioruby/bioruby/test/unit/bio/db/pdb added to the repository

From ngoto at pub.open-bio.org  Thu Jan 26 11:06:05 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Thu, 26 Jan 2006 16:06:05 +0000
Subject: [BioRuby-cvs] bioruby/test/unit/bio/db/pdb test_pdb.rb,NONE,1.1
Message-ID: <200601261606.k0QG65VL031084@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/test/unit/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv31072

Added Files:
	test_pdb.rb
Log Message:
Newly added unit test of Bio::PDB::* classes.
Under construction. It is still very poor.
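
A minimal standalone sketch, not part of this commit and not Bio::PDB's implementation: the ATOM record exercised by the new test is a fixed-column line, so each accessor the test checks corresponds to a character range. The column layout below is assumed from the PDB format guide 2.2 that the test cites, and `ATOM_COLUMNS`, `INTEGER_FIELDS`, `FLOAT_FIELDS` and `parse_atom_line` are hypothetical names used only for illustration.

```ruby
# Assumed ATOM column layout (PDB format guide 2.2).
# Each entry is [field, zero-based start, length].
ATOM_COLUMNS = [
  [:record_name,  0, 6], [:serial,     6, 5], [:name,    12, 4],
  [:altLoc,      16, 1], [:resName,   17, 3], [:chainID, 21, 1],
  [:resSeq,      22, 4], [:iCode,     26, 1], [:x,       30, 8],
  [:y,           38, 8], [:z,         46, 8], [:occupancy, 54, 6],
  [:tempFactor,  60, 6], [:segID,     72, 4], [:element, 76, 2],
  [:charge,      78, 2]
]

INTEGER_FIELDS = [:serial, :resSeq]
FLOAT_FIELDS   = [:x, :y, :z, :occupancy, :tempFactor]

# Hypothetical helper: slices one ATOM line by column and converts
# numeric fields; all other fields keep their padding.
def parse_atom_line(line)
  padded = line.ljust(80)   # short lines are padded to the full 80 columns
  record = {}
  ATOM_COLUMNS.each do |field, start, length|
    raw = padded[start, length]
    record[field] =
      if INTEGER_FIELDS.include?(field)
        raw.to_i
      elsif FLOAT_FIELDS.include?(field)
        raw.to_f
      else
        raw
      end
  end
  record
end

# Build the line with explicit widths so the columns are guaranteed.
line = format('%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %-4s%2s%2s',
              'ATOM', 154, ' CG2', 'B', 'VAL', 'A', 25, ' ',
              29.909, 16.996, 55.922, 0.72, 13.25, 'A1', ' C', '  ')

atom = parse_atom_line(line)
p atom.values_at(:serial, :name, :altLoc, :resName)
# prints [154, " CG2", "B", "VAL"]
```

Slicing by column rather than splitting on whitespace matters here because adjacent fields can touch: the atom name, altLoc and resName run together as `CG2BVAL` in this record.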
--- NEW FILE: test_pdb.rb --- # # = test/unit/bio/db/pdb/test_pdb.rb - Unit test for Bio::PDB classes # # Copyright:: Copyright (C) 2006 # Naohisa Goto # # License:: LGPL # # $Id: test_pdb.rb,v 1.1 2006/01/26 16:06:03 ngoto Exp $ # #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA #++ # # require 'pathname' libpath = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 4, 'lib')).cleanpath.to_s $:.unshift(libpath) unless $:.include?(libpath) require 'test/unit' require 'bio' module Bio #class TestPDB < Test::Unit::TestCase #end #class TestPDB module TestPDBRecord # test of Bio::PDB::Record::ATOM class TestATOM < Test::Unit::TestCase def setup # the data is taken from # http://www.rcsb.org/pdb/file_formats/pdb/pdbguide2.2/part_62.html @str = 'ATOM 154 CG2BVAL A 25 29.909 16.996 55.922 0.72 13.25 A1 C ' @atom = Bio::PDB::Record::ATOM.new.initialize_from_string(@str) end def test_record_name assert_equal('ATOM', @atom.record_name) end def test_serial assert_equal(154, @atom.serial) end def test_name assert_equal(' CG2', @atom.name) end def test_altLoc assert_equal('B', @atom.altLoc) end def test_resName assert_equal('VAL', @atom.resName) end def test_chainID assert_equal('A', @atom.chainID) end def test_resSeq assert_equal(25, @atom.resSeq) end def test_iCode assert_equal(' ', @atom.iCode) end def test_x 
assert_in_delta(29.909, @atom.x, Float::EPSILON) end def test_y assert_in_delta(16.996, @atom.y, Float::EPSILON) end def test_z assert_in_delta(55.922, @atom.z, Float::EPSILON) end def test_occupancy assert_in_delta(0.72, @atom.occupancy, Float::EPSILON) end def test_tempFactor assert_in_delta(13.25, @atom.tempFactor, Float::EPSILON) end def test_segID assert_equal('A1 ', @atom.segID) end def test_element assert_equal(' C', @atom.element) end def test_charge assert_equal(' ', @atom.charge) end def test_xyz assert_equal(Bio::PDB::Coordinate[ "29.909".to_f, "16.996".to_f, "55.922".to_f ], @atom.xyz) end def test_to_a assert_equal([ "29.909".to_f, "16.996".to_f, "55.922".to_f ], @atom.to_a) end def test_comparable a = Bio::PDB::Record::ATOM.new a.serial = 999 assert_equal(-1, @atom <=> a) a.serial = 154 assert_equal( 0, @atom <=> a) a.serial = 111 assert_equal( 1, @atom <=> a) end def test_to_s assert_equal(@str + "\n", @atom.to_s) end def test_original_data assert_equal([ @str ], @atom.original_data) end def test_do_parse assert_equal(@atom, @atom.do_parse) end def test_residue assert_equal(nil, @atom.residue) end def test_sigatm assert_equal(nil, @atom.sigatm) end def test_anisou assert_equal(nil, @atom.anisou) end def test_ter assert_equal(nil, @atom.ter) end end #class TestATOM end #module TestPDBRecord end #module Bio From ngoto at pub.open-bio.org Fri Jan 27 23:23:44 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sat, 28 Jan 2006 04:23:44 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io flatfile.rb,1.41,1.42 Message-ID: <200601280423.k0S4NhVL004355@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv4345/io Modified Files: flatfile.rb Log Message: changed format autodetection for KEGG data (format was changed) Index: flatfile.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile.rb,v retrieving revision 1.41 
retrieving revision 1.42 diff -C2 -d -r1.41 -r1.42 *** flatfile.rb 1 Nov 2005 15:34:45 -0000 1.41 --- flatfile.rb 28 Jan 2006 04:23:41 -0000 1.42 *************** *** 413,428 **** Bio::KEGG::BRITE ! when /^ENTRY .+ KO\s*$/ Bio::KEGG::KO ! when /^ENTRY .+ Glycan\s*$/ Bio::KEGG::GLYCAN ! when /^ENTRY .+ (CDS|gene|.*RNA) / ! Bio::KEGG::GENES ! when /^ENTRY EC [0-9\.]+$/ Bio::KEGG::ENZYME ! when /^ENTRY C[A-Za-z0-9\._]+$/ Bio::KEGG::COMPOUND ! when /^ENTRY R[A-Za-z0-9\._]+$/ Bio::KEGG::REACTION when /^ENTRY [a-z]+$/ Bio::KEGG::GENOME --- 413,431 ---- Bio::KEGG::BRITE ! when /^ENTRY .+ KO\s*/ Bio::KEGG::KO ! when /^ENTRY .+ Glycan\s*/ Bio::KEGG::GLYCAN ! when /^ENTRY EC [0-9\.]+$/, ! /^ENTRY .+ Enzyme\s*/ Bio::KEGG::ENZYME ! when /^ENTRY C[A-Za-z0-9\._]+$/, ! /^ENTRY .+ Compound\s*/ Bio::KEGG::COMPOUND ! when /^ENTRY R[A-Za-z0-9\._]+$/, ! /^ENTRY .+ Reaction\s*/ Bio::KEGG::REACTION + when /^ENTRY .+ (CDS|gene|.*RNA) / + Bio::KEGG::GENES when /^ENTRY [a-z]+$/ Bio::KEGG::GENOME From nakao at pub.open-bio.org Sat Jan 28 01:40:41 2006 From: nakao at pub.open-bio.org (Mitsuteru C. Nakao) Date: Sat, 28 Jan 2006 06:40:41 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/embl common.rb, 1.8, 1.9 embl.rb, 1.25, 1.26 sptr.rb, 1.29, 1.30 swissprot.rb, 1.3, 1.4 trembl.rb, 1.3, 1.4 uniprot.rb, 1.1, 1.2 Message-ID: <200601280640.k0S6efVL004736@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/embl In directory pub.open-bio.org:/tmp/cvs-serv4726/lib/bio/db/embl Modified Files: common.rb embl.rb sptr.rb swissprot.rb trembl.rb uniprot.rb Log Message: * Updated RDoc. Index: sptr.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/sptr.rb,v retrieving revision 1.29 retrieving revision 1.30 diff -C2 -d -r1.29 -r1.30 *** sptr.rb 2 Nov 2005 07:30:14 -0000 1.29 --- sptr.rb 28 Jan 2006 06:40:38 -0000 1.30 *************** *** 7,15 **** # $Id$ # ! # == UniProtKB/SwissProt and TrEMBL # ! 
# See the SWISS-PROT dicument file SPECLIST.TXT. # ! # == Example # #-- --- 7,34 ---- # $Id$ # ! # == Description ! # ! # Shared methods for UniProtKB/SwissProt and TrEMBL classes. # ! # See the SWISS-PROT document file SPECLIST.TXT or UniProtKB/SwissProt ! # user manual. ! # ! # == Examples # ! # str = File.read("p53_human.swiss") ! # obj = Bio::SPTR.new(str) ! # obj.entry_id #=> "P53_HUMAN" ! # ! # == References ! # ! # * Swiss-Prot Protein knowledgebase. TrEMBL Computer-annotated supplement ! # to Swiss-Prot ! # http://au.expasy.org/sprot/ ! # ! # * UniProt ! # http://uniprot.org/ ! # ! # * The UniProtKB/SwissProt/TrEMBL User Manual ! # http://www.expasy.org/sprot/userman.html # #-- *************** *** 37,41 **** module Bio ! # Parser class for UniProtKB/SwissProt and TrEMBL database entry class SPTR < EMBLDB include Bio::EMBLDB::Common --- 56,60 ---- module Bio ! # Parser class for UniProtKB/SwissProt and TrEMBL database entry. class SPTR < EMBLDB include Bio::EMBLDB::Common *************** *** 46,60 **** # returns a Hash of the ID line. # returns a content (Int or String) of the ID line by a given key. # Hash keys: ['ENTRY_NAME', 'DATA_CLASS', 'MODECULE_TYPE', 'SEQUENCE_LENGTH'] # ! # ID Line ! # "ID #{ENTRY_NAME} #{DATA_CLASS}; #{MOLECULE_TYPE}; #{SEQUENCE_LENGTH}." # - # ENTRY_NAME := "#{X}_#{Y}" - # X =~ /[A-Z0-9]{1,5}/ # The protein name. - # Y =~ /[A-Z0-9]{1,5}/ # The biological source of the protein. - # MOLECULE_TYPE := 'PRT' =~ /\w{3}/ - # SEQUENCE_LENGTH =~ /\d+ AA/ def id_line(key = nil) unless @data['ID'] --- 65,81 ---- # returns a Hash of the ID line. + # # returns a content (Int or String) of the ID line by a given key. # Hash keys: ['ENTRY_NAME', 'DATA_CLASS', 'MODECULE_TYPE', 'SEQUENCE_LENGTH'] # ! # === ID Line ! # ID P53_HUMAN STANDARD; PRT; 393 AA. ! # #"ID #{ENTRY_NAME} #{DATA_CLASS}; #{MOLECULE_TYPE}; #{SEQUENCE_LENGTH}." ! # ! # === Examples ! 
# obj.id_line #=> {"ENTRY_NAME"=>"P53_HUMAN", "DATA_CLASS"=>"STANDARD", "SEQUENCE_LENGTH"=>393, "MOLECULE_TYPE"=>"PRT"} ! # ! # obj.id_line('ENTRY_NAME') #=> "P53_HUMAN" # def id_line(key = nil) unless @data['ID'] *************** *** 79,83 **** # returns a ENTRY_NAME in the ID line. # - # A short-cut for Bio::SPTR#id_line('ENTRY_NAME'). def entry_id id_line('ENTRY_NAME') --- 100,103 ---- *************** *** 120,127 **** # returns a String of information in the DT lines by a given key.. # ! # DT Line; date (3/entry) ! # DT DD-MMM-YYY (rel. NN, Created) ! # DT DD-MMM-YYY (rel. NN, Last sequence update) ! # DT DD-MMM-YYY (rel. NN, Last annotation update) def dt(key = nil) unless @data['DT'] --- 140,147 ---- # returns a String of information in the DT lines by a given key.. # ! # === DT Line; date (3/entry) ! # DT DD-MMM-YYY (rel. NN, Created) ! # DT DD-MMM-YYY (rel. NN, Last sequence update) ! # DT DD-MMM-YYY (rel. NN, Last annotation update) def dt(key = nil) unless @data['DT'] *************** *** 144,148 **** # returns the proposed official name of the protein. # ! # DE Line; description (>=1) # "DE #{OFFICIAL_NAME} (#{SYNONYM})" # "DE #{OFFICIAL_NAME} (#{SYNONYM}) [CONTEINS: #1; #2]." --- 164,168 ---- # returns the proposed official name of the protein. # ! # === DE Line; description (>=1) # "DE #{OFFICIAL_NAME} (#{SYNONYM})" # "DE #{OFFICIAL_NAME} (#{SYNONYM}) [CONTEINS: #1; #2]." *************** *** 193,197 **** # * Bio::SPTR#gn[0] -> Array # OR # ! # GN Line: Gene name(s) (>=0, optional) def gn return @data['GN'] if @data['GN'] --- 213,217 ---- # * Bio::SPTR#gn[0] -> Array # OR # ! # === GN Line: Gene name(s) (>=0, optional) def gn return @data['GN'] if @data['GN'] *************** *** 206,210 **** # returns contents in the old style GN line. ! # GN Line: Gene name(s) (>=0, optional) # GN HNS OR DRDX OR OSMZ OR BGLY. # GN CECA1 AND CECA2. --- 226,230 ---- # returns contents in the old style GN line. ! 
# === GN Line: Gene name(s) (>=0, optional) # GN HNS OR DRDX OR OSMZ OR BGLY. # GN CECA1 AND CECA2. *************** *** 293,297 **** # * Bio::EPTR#os(0) -> "Homo sapiens (Human)" # ! # OS Line; organism species (>=1) # OS Genus species (name). # OS Genus species (name0) (name1). --- 313,317 ---- # * Bio::EPTR#os(0) -> "Homo sapiens (Human)" # ! # === OS Line; organism species (>=1) # OS Genus species (name). # OS Genus species (name0) (name1). *************** *** 338,344 **** # returns a Hash of oraganism taxonomy cross-references. # * Bio::SPTR#ox -> Hash ! # {'NCBI_TaxID' => ['1234','2345','3456','4567'], ...} # ! # OX Line; organism taxonomy cross-reference (>=1 per entry) # OX NCBI_TaxID=1234; # OX NCBI_TaxID=1234, 2345, 3456, 4567; --- 358,364 ---- # returns a Hash of oraganism taxonomy cross-references. # * Bio::SPTR#ox -> Hash ! # {'NCBI_TaxID' => ['1234','2345','3456','4567'], ...} # ! # === OX Line; organism taxonomy cross-reference (>=1 per entry) # OX NCBI_TaxID=1234; # OX NCBI_TaxID=1234, 2345, 3456, 4567; *************** *** 369,409 **** # returns contents in the CC lines. # * Bio::SPTR#cc -> Hash ! ! # * Bio::SPTR#cc(Int) -> String ! # returns an Array of contents in the TOPIC string. # * Bio::SPTR#cc(TOPIC) -> Array w/in Hash, Hash # # returns contents of the "ALTERNATIVE PRODUCTS". # * Bio::SPTR#cc('ALTERNATIVE PRODUCTS') -> Hash ! # {'Event' => str, ! # 'Named isoforms' => int, ! # 'Comment' => str, ! # 'Variants'=>[{'Name' => str, 'Synonyms' => str, 'IsoId' => str, 'Sequence' => []}]} # ! # CC -!- ALTERNATIVE PRODUCTS: ! # CC Event=Alternative splicing; Named isoforms=15; ! # ... ! # CC placentae isoforms. All tissues differentially splice exon 13; ! # CC Name=A; Synonyms=no del; ! # CC IsoId=P15529-1; Sequence=Displayed; # # returns contents of the "DATABASE". # * Bio::SPTR#cc('DATABASE') -> Array ! # [{'NAME'=>str,'NOTE'=>str, 'WWW'=>URI,'FTP'=>URI}, ...] # ! # CC -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"]. 
# # returns contents of the "MASS SPECTROMETRY". # * Bio::SPTR#cc('MASS SPECTROMETRY') -> Array ! # [{'MW"=>float,'MW_ERR'=>float, 'METHOD'=>str,'RANGE'=>str}, ...] # ! # MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX]. # - # CC lines (>=0, optional) - # CC -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT - # CC IN LIVER, KIDNEY, LUNG AND BRAIN. - # - # CC -!- TOPIC: FIRST LINE OF A COMMENT BLOCK; - # CC SECOND AND SUBSEQUENT LINES OF A COMMENT BLOCK. def cc(tag = nil) unless @data['CC'] --- 389,429 ---- # returns contents in the CC lines. # * Bio::SPTR#cc -> Hash ! # ! # returns an object of contents in the TOPIC. # * Bio::SPTR#cc(TOPIC) -> Array w/in Hash, Hash # # returns contents of the "ALTERNATIVE PRODUCTS". # * Bio::SPTR#cc('ALTERNATIVE PRODUCTS') -> Hash ! # {'Event' => str, ! # 'Named isoforms' => int, ! # 'Comment' => str, ! # 'Variants'=>[{'Name' => str, 'Synonyms' => str, 'IsoId' => str, 'Sequence' => []}]} # ! # CC -!- ALTERNATIVE PRODUCTS: ! # CC Event=Alternative splicing; Named isoforms=15; ! # ... ! # CC placentae isoforms. All tissues differentially splice exon 13; ! # CC Name=A; Synonyms=no del; ! # CC IsoId=P15529-1; Sequence=Displayed; # # returns contents of the "DATABASE". # * Bio::SPTR#cc('DATABASE') -> Array ! # [{'NAME'=>str,'NOTE'=>str, 'WWW'=>URI,'FTP'=>URI}, ...] # ! # CC -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"]. # # returns contents of the "MASS SPECTROMETRY". # * Bio::SPTR#cc('MASS SPECTROMETRY') -> Array ! # [{'MW"=>float,'MW_ERR'=>float, 'METHOD'=>str,'RANGE'=>str}, ...] # ! # CC -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX]. ! # ! # === CC lines (>=0, optional) ! # CC -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT ! # CC IN LIVER, KIDNEY, LUNG AND BRAIN. ! # ! # CC -!- TOPIC: FIRST LINE OF A COMMENT BLOCK; ! # CC SECOND AND SUBSEQUENT LINES OF A COMMENT BLOCK. 
# def cc(tag = nil) unless @data['CC'] *************** *** 542,546 **** # returns conteins in a line of the CC INTERACTION section. # ! # CC P46527:CDKN1B; NbExp=1; IntAct=EBI-359815, EBI-519280; def cc_interaction_parse(str) it = str.scan(/(.+?); NbExp=(.+?); IntAct=(.+?);/) --- 562,566 ---- # returns conteins in a line of the CC INTERACTION section. # ! # CC P46527:CDKN1B; NbExp=1; IntAct=EBI-359815, EBI-519280; def cc_interaction_parse(str) it = str.scan(/(.+?); NbExp=(.+?); IntAct=(.+?);/) *************** *** 556,562 **** # * Bio::EMBLDB#dr -> Hash w/in Array # ! # DR Line; defabases cross-reference (>=0) ! # a cross_ref pre one line ! # DR database_identifier; primary_identifier; secondary_identifier. @@dr_database_identifier = ['EMBL','CARBBANK','DICTYDB','ECO2DBASE', 'ECOGENE', --- 576,582 ---- # * Bio::EMBLDB#dr -> Hash w/in Array # ! # === DR Line; defabases cross-reference (>=0) ! # DR database_identifier; primary_identifier; secondary_identifier. ! # a cross_ref pre one line @@dr_database_identifier = ['EMBL','CARBBANK','DICTYDB','ECO2DBASE', 'ECOGENE', *************** *** 575,595 **** # returns conteins in the feature table. # * Bio::SPTR#ft -> Hash ! # {'feature_name' => [{'From' => str, 'To' => str, ! # 'Description' => str, 'FTId' => str}],...} # # returns an Array of the information about the feature_name in the feature table. # * Bio::SPTR#ft(feature_name) -> Array of Hash ! # [{'From' => str, 'To' => str, 'Description' => str, 'FTId' => str},...] # ! # FT Line; feature table data (>=0, optional) # ! # Col Data item ! # ----- ----------------- ! # 1- 2 FT ! # 6-13 Feature name ! # 15-20 `FROM' endpoint ! # 22-27 `TO' endpoint ! # 35-75 Description (>=0 per key) ! # ----- ----------------- def ft(feature_name = nil) unless @data['FT'] --- 595,615 ---- # returns conteins in the feature table. # * Bio::SPTR#ft -> Hash ! # {'feature_name' => [{'From' => str, 'To' => str, ! 
# 'Description' => str, 'FTId' => str}],...} # # returns an Array of the information about the feature_name in the feature table. # * Bio::SPTR#ft(feature_name) -> Array of Hash ! # [{'From' => str, 'To' => str, 'Description' => str, 'FTId' => str},...] # ! # == FT Line; feature table data (>=0, optional) # ! # Col Data item ! # ----- ----------------- ! # 1- 2 FT ! # 6-13 Feature name ! # 15-20 `FROM' endpoint ! # 22-27 `TO' endpoint ! # 35-75 Description (>=0 per key) ! # ----- ----------------- def ft(feature_name = nil) unless @data['FT'] *************** *** 693,699 **** # * Keys: ['MW', 'mw', 'molecular', 'weight', 'aalen', 'len', 'length', 'CRC64'] # ! # SQ Line; sequence header (1/entry) ! # SQ SEQUENCE 233 AA; 25630 MW; 146A1B48A1475C86 CRC64; ! # SQ SEQUENCE \d+ AA; \d+ MW; [0-9A-Z]+ CRC64; # # MW, Dalton unit. --- 713,719 ---- # * Keys: ['MW', 'mw', 'molecular', 'weight', 'aalen', 'len', 'length', 'CRC64'] # ! # === SQ Line; sequence header (1/entry) ! # SQ SEQUENCE 233 AA; 25630 MW; 146A1B48A1475C86 CRC64; ! # SQ SEQUENCE \d+ AA; \d+ MW; [0-9A-Z]+ CRC64; # # MW, Dalton unit. Index: uniprot.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/uniprot.rb,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** uniprot.rb 10 Sep 2005 23:43:35 -0000 1.1 --- uniprot.rb 28 Jan 2006 06:40:39 -0000 1.2 *************** *** 1,6 **** # ! # bio/db/embl/uniprot.rb - UniProt database class # ! # Copyright (C) 2005 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,33 ---- # ! # = bio/db/embl/uniprot.rb - UniProt database class # ! # Copyright:: Copyright (C) 2005 KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Name space for UniProtKB/SwissProt specific methods. ! # ! # UniProtKB/SwissProt specific methods are defined in this class. ! # Shared methods for UniProtKB/SwissProt and TrEMBL classes are ! 
# defined in Bio::SPTR class. ! # ! # == Examples ! # ! # str = File.read("p53_human.swiss") ! # obj = Bio::UniProt.new(str) ! # obj.entry_id #=> "P53_HUMAN" ! # ! # == Referencees ! # ! # * UniProt ! # http://uniprot.org/ ! # ! # * The UniProtKB/SwissProt/TrEMBL User Manual ! # http://www.expasy.org/sprot/userman.html ! ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,22 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # --- 45,49 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 25,28 **** --- 52,57 ---- module Bio + # Parser class for SwissProt database entry. + # See also Bio::SPTR class. class UniProt < SPTR # Nothing to do (UniProt format is abstracted in SPTR) Index: swissprot.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/swissprot.rb,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** swissprot.rb 23 Aug 2004 23:40:35 -0000 1.3 --- swissprot.rb 28 Jan 2006 06:40:38 -0000 1.4 *************** *** 1,6 **** # ! # bio/db/embl/swissprot.rb - SwissProt database class # ! # Copyright (C) 2001, 2002 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,33 ---- # ! # = bio/db/embl/swissprot.rb - SwissProt database class # ! # Copyright:: Copyright (C) 2001, 2002 KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Name space for SwissProt specific methods. ! # ! # SwissProt (before UniProtKB/SwissProt) specific methods are defined in ! # this class. Shared methods for UniProtKB/SwissProt and TrEMBL classes ! # are defined in Bio::SPTR class. ! # ! # == Examples ! # ! # str = File.read("p53_human.swiss") ! # obj = Bio::SwissProt.new(str) ! # obj.entry_id #=> "P53_HUMAN" ! # ! # == Referencees ! # ! # * Swiss-Prot Protein knowledgebase ! 
# http://au.expasy.org/sprot/ ! # ! # * Swiss-Prot Protein Knowledgebase User Manual ! # http://au.expasy.org/sprot/userman.html ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,22 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # --- 45,49 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 25,28 **** --- 52,57 ---- module Bio + # Parser class for SwissProt database entry. + # See also Bio::SPTR class. class SwissProt < SPTR # Nothing to do (SwissProt format is abstracted in SPTR) Index: embl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/embl.rb,v retrieving revision 1.25 retrieving revision 1.26 diff -C2 -d -r1.25 -r1.26 *** embl.rb 2 Nov 2005 07:30:14 -0000 1.25 --- embl.rb 28 Jan 2006 06:40:38 -0000 1.26 *************** *** 8,23 **** # $Id$ # ! # == EMBL database entry ! # # # ! # == Example # ! # emb = Bio::EMBL.new($<.read) ! # emb.entry_id ! # emb.each_cds do |cds| ! # cds ! # end ! # emb.seq # #-- --- 8,31 ---- # $Id$ # ! # == Description # + # Parser class for EMBL database entry. # ! # == Examples # ! # emb = Bio::EMBL.new($<.read) ! # emb.entry_id ! # emb.each_cds do |cds| ! # cds # A CDS in feature table. ! # end ! # emb.seq #=> "ACGT..." ! # ! # == References ! # ! # * The EMBL Nucleotide Sequence Database ! # http://www.ebi.ac.uk/embl/ ! # ! # * The EMBL Nucleotide Sequence Database: Users Manual ! # http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html # #-- Index: common.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/common.rb,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** common.rb 2 Nov 2005 07:30:14 -0000 1.8 --- common.rb 28 Jan 2006 06:40:38 -0000 1.9 *************** *** 7,14 **** # $Id$ # ! 
# == EMBL style databases class # ! # This module defines a common framework among EMBL, SWISS-PROT, TrEMBL. ! # For more details, see the documentations in each embl/*.rb libraries. # # EMBL style format: --- 7,17 ---- # $Id$ # ! # == Description # ! # EMBL style databases class ! # ! # This module defines a common framework among EMBL, UniProtKB, SWISS-PROT, ! # TrEMBL. For more details, see the documentations in each embl/*.rb ! # libraries. # # EMBL style format: *************** *** 39,45 **** # // - termination line (ends each entry; 1 per entry) # ! # ! # == Example # # require 'bio/db/embl/common' # module Bio --- 42,48 ---- # // - termination line (ends each entry; 1 per entry) # ! # == Examples # + # # Make a new parser class for EMBL style database entry. # require 'bio/db/embl/common' # module Bio *************** *** 48,51 **** --- 51,72 ---- # end # end + # + # == References + # + # * The EMBL Nucleotide Sequence Database + # http://www.ebi.ac.uk/embl/ + # + # * The EMBL Nucleotide Sequence Database: Users Manual + # http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html + # + # * Swiss-Prot Protein knowledgebase. TrEMBL Computer-annotated supplement + # to Swiss-Prot + # http://au.expasy.org/sprot/ + # + # * UniProt + # http://uniprot.org/ + # + # * The UniProtKB/SwissProt/TrEMBL User Manual + # http://www.expasy.org/sprot/userman.html # #-- Index: trembl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/trembl.rb,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** trembl.rb 23 Aug 2004 23:40:35 -0000 1.3 --- trembl.rb 28 Jan 2006 06:40:38 -0000 1.4 *************** *** 1,6 **** # ! # bio/db/embl/trembl.rb - TrEMBL database class # ! # Copyright (C) 2001, 2002 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,33 ---- # ! # = bio/db/embl/trembl.rb - TrEMBL database class # ! 
# Copyright:: Copyright (C) 2001, 2002 KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Name space for TrEMBL specific methods. ! # ! # UniProtKB/SwissProt specific methods are defined in this class. ! # Shared methods for UniProtKB/SwissProt and TrEMBL classes are ! # defined in Bio::SPTR class. ! # ! # == Examples ! # ! # str = File.read("Q2UNG2_ASPOR.trembl") ! # obj = Bio::TrEMBL.new(str) ! # obj.entry_id #=> "Q2UNG2_ASPOR" ! # ! # == Referencees ! # ! # * TrEMBL Computer-annotated supplement to Swiss-Prot ! # http://au.expasy.org/sprot/ ! # ! # * TrEMBL Computer-annotated supplement to Swiss-Prot User Manual ! # http://au.expasy.org/sprot/userman.html ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,22 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # --- 45,49 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 25,28 **** --- 52,57 ---- module Bio + # Parser class for TrEMBL database entry. + # See also Bio::SPTR class. class TrEMBL < SPTR # Nothing to do (TrEMBL format is abstracted in SPTR) From k at pub.open-bio.org Sat Jan 28 01:46:45 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 06:46:45 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio shell.rb,1.11,1.12 Message-ID: <200601280646.k0S6kiVL004805@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory pub.open-bio.org:/tmp/cvs-serv4775/lib/bio Modified Files: shell.rb Log Message: * entret/seqret commands in EMBOSS are supported * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa) * bioruby> seqret(usa), entret(usa) * obj() method is added in addition to seq() and ent() methods. 
Index: shell.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/shell.rb,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 *** shell.rb 7 Dec 2005 05:12:06 -0000 1.11 --- shell.rb 28 Jan 2006 06:46:42 -0000 1.12 *************** *** 43,46 **** --- 43,47 ---- require 'bio/shell/plugin/obda' require 'bio/shell/plugin/keggapi' + require 'bio/shell/plugin/emboss' extend Ghost From k at pub.open-bio.org Sat Jan 28 01:46:44 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 06:46:44 +0000 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.59,1.60 Message-ID: <200601280646.k0S6kiVL004797@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory pub.open-bio.org:/tmp/cvs-serv4775/lib Modified Files: bio.rb Log Message: * entret/seqret commands in EMBOSS are supported * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa) * bioruby> seqret(usa), entret(usa) * obj() method is added in addition to seq() and ent() methods. Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.59 retrieving revision 1.60 diff -C2 -d -r1.59 -r1.60 *** bio.rb 20 Jan 2006 09:57:08 -0000 1.59 --- bio.rb 28 Jan 2006 06:46:42 -0000 1.60 *************** *** 29,33 **** module Bio ! BIORUBY_VERSION = [0, 7, 1].extend(Comparable) ### Basic data types --- 29,33 ---- module Bio ! BIORUBY_VERSION = [0, 7, 2].extend(Comparable) ### Basic data types *************** *** 195,199 **** #end ! # autoload :EMBOSS, 'bio/appl/emboss' # use bio/command, improve autoload :PSORT, 'bio/appl/psort' --- 195,199 ---- #end ! 
autoload :EMBOSS, 'bio/appl/emboss' # use bio/command, improve autoload :PSORT, 'bio/appl/psort' From k at pub.open-bio.org Sat Jan 28 01:46:44 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 06:46:44 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/appl emboss.rb,1.2,1.3 Message-ID: <200601280646.k0S6kiVL004801@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/appl In directory pub.open-bio.org:/tmp/cvs-serv4775/lib/bio/appl Modified Files: emboss.rb Log Message: * entret/seqret commands in EMBOSS are supported * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa) * bioruby> seqret(usa), entret(usa) * obj() method is added in addition to seq() and ent() methods. Index: emboss.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/appl/emboss.rb,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** emboss.rb 8 Sep 2005 01:22:08 -0000 1.2 --- emboss.rb 28 Jan 2006 06:46:42 -0000 1.3 *************** *** 1,6 **** # ! # bio/appl/emboss.rb - EMBOSS wrapper # ! # Copyright (C) 2002 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,16 ---- # ! # = bio/appl/emboss.rb - EMBOSS wrapper # ! # Copyright:: Copyright (C) 2002, 2005 ! # KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == References ! # ! # * http://www.emboss.org ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,68 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # module Bio ! class EMBOSS ! def initialize(cmd_line) ! @cmd_line = cmd_line + ' -stdout' ! end ! def exec ! begin ! @io = IO.popen(@cmd_line, "w+") ! @result = @io.read ! return @result ! ensure ! @io.close ! end ! end ! attr_reader :io, :result end ! end ! ! =begin ! ! = Bio::EMBOSS ! ! EMBOSS wrapper. ! #!/usr/bin/env ruby ! require 'bio' ! 
emboss = Bio::EMBOSS.new("getorf -sequence ~/xlrhodop -outseq stdout") ! puts emboss.exec ! --- Bio::EMBOSS.new(command_line) ! --- Bio::EMBOSS#exec ! --- Bio::EMBOSS#io ! --- Bio::EMBOSS#result ! === SEE ALSO ! * http://www.emboss.org - =end --- 28,79 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # module Bio ! autoload :Command, 'bio/command' ! class EMBOSS ! extend Bio::Command::Tools + def self.seqret(arg) + str = self.retrieve('seqret', arg) end ! def self.entret(arg) ! str = self.retrieve('entret', arg) ! end ! def initialize(cmd_line) ! @cmd_line = cmd_line + ' -stdout -auto' ! end ! def exec ! begin ! @io = IO.popen(@cmd_line, "w+") ! @result = @io.read ! return @result ! ensure ! @io.close ! end ! end ! attr_reader :io, :result ! private ! def self.retrieve(cmd, arg) ! cmd = [ cmd, arg, '-auto', '-stdout' ] ! str = '' ! call_command_local(cmd) do |inn, out| ! inn.close_write ! str = out.read ! end ! return str ! end ! end # EMBOSS ! end # Bio From k at pub.open-bio.org Sat Jan 28 01:46:45 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 06:46:45 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/shell/plugin entry.rb,1.4,1.5 Message-ID: <200601280646.k0S6kjVL004809@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/shell/plugin In directory pub.open-bio.org:/tmp/cvs-serv4775/lib/bio/shell/plugin Modified Files: entry.rb Log Message: * entret/seqret commands in EMBOSS are supported * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa) * bioruby> seqret(usa), entret(usa) * obj() method is added in addition to seq() and ent() methods. 
Index: entry.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/shell/plugin/entry.rb,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** entry.rb 7 Dec 2005 05:12:07 -0000 1.4 --- entry.rb 28 Jan 2006 06:46:43 -0000 1.5 *************** *** 67,84 **** # * IO -- IO object (first entry only) # * "filename" -- local file (first entry only) ! # * "db:entry" -- local bioflat, OBDA, KEGG API def ent(arg) entry = "" db, entry_id = arg.to_s.strip.split(/:/) if arg.respond_to?(:gets) or File.exists?(arg) entry = flatfile(arg) elsif Bio::Shell.find_flat_dir(db) entry = flatsearch(db, entry_id) elsif obdadbs.include?(db) entry = obdaentry(db, entry_id) else ! entry = bget(arg) end return entry end --- 67,110 ---- # * IO -- IO object (first entry only) # * "filename" -- local file (first entry only) ! # * "db:entry" -- local BioFlat, OBDA, EMBOSS, KEGG API def ent(arg) entry = "" db, entry_id = arg.to_s.strip.split(/:/) + + # local file if arg.respond_to?(:gets) or File.exists?(arg) + puts "Retrieving entry from file (#{arg})" entry = flatfile(arg) + + # BioFlat in ./.bioruby/bioflat/ or ~/.bioinformatics/.bioruby/bioflat/ elsif Bio::Shell.find_flat_dir(db) + puts "Retrieving entry from local BioFlat database (#{arg})" entry = flatsearch(db, entry_id) + + # OBDA in ~/.bioinformatics/seqdatabase.ini elsif obdadbs.include?(db) + puts "Retrieving entry from OBDA (#{arg})" entry = obdaentry(db, entry_id) + else ! # EMBOSS USA in ~/.embossrc ! str = entret(arg) ! if $?.exitstatus == 0 and str.length != 0 ! puts "Retrieving entry from EMBOSS (#{arg})" ! entry = str ! ! # KEGG API at http://www.genome.jp/kegg/soap/ ! else ! puts "Retrieving entry from KEGG API (#{arg})" ! entry = bget(arg) ! 
      end
      end
+
      return entry
+   end
+
+   def obj(arg)
+     str = ent(arg)
+     flatparse(str)
    end

From k at pub.open-bio.org Sat Jan 28 02:22:16 2006
From: k at pub.open-bio.org (Katayama Toshiaki)
Date: Sat, 28 Jan 2006 07:22:16 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/shell/plugin emboss.rb,NONE,1.1
Message-ID: <200601280722.k0S7MGVL005006@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/shell/plugin
In directory pub.open-bio.org:/tmp/cvs-serv5002/lib/bio/shell/plugin

Added Files:
  emboss.rb
Log Message:
* entret/seqret commands in EMBOSS are supported
* Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa)
* bioruby> seqret(usa), entret(usa)

--- NEW FILE: emboss.rb ---
#
# = bio/shell/plugin/emboss.rb - methods to use EMBOSS
#
# Copyright:: Copyright (C) 2005
#             Toshiaki Katayama
# License:: LGPL
#
# $Id: emboss.rb,v 1.1 2006/01/28 07:22:14 k Exp $
#
#--
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
#++
#

module Bio::Shell

  private

  def seqret(usa)
    Bio::EMBOSS.seqret(usa)
  end

  def entret(usa)
    Bio::EMBOSS.entret(usa)
  end

end

From nakao at pub.open-bio.org Sat Jan 28 02:42:01 2006
From: nakao at pub.open-bio.org (Mitsuteru C.
Nakao) Date: Sat, 28 Jan 2006 07:42:01 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io fastacmd.rb,1.8,1.9 Message-ID: <200601280742.k0S7g1VL005071@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv5061/lib/bio/io Modified Files: fastacmd.rb Log Message: * Added RDoc. Index: fastacmd.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/fastacmd.rb,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** fastacmd.rb 26 Sep 2005 13:00:08 -0000 1.8 --- fastacmd.rb 28 Jan 2006 07:41:59 -0000 1.9 *************** *** 1,7 **** # ! # bio/io/fastacmd.rb - NCBI fastacmd wrapper class # ! # Copyright (C) 2005 Shuji SHIGENOBU ! # Copyright (C) 2005 Toshiaki Katayama # # This library is free software; you can redistribute it and/or --- 1,42 ---- # ! # = bio/io/fastacmd.rb - NCBI fastacmd wrapper class # ! # Copyright:: Copyright (C) 2005 ! # Shuji SHIGENOBU , ! # Toshiaki Katayama ! # Lisence:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Retrives FASTA formatted sequences from a blast database using ! # NCBI fastacmd command. ! # ! # == Examples ! # ! # database = ARGV.shift || "/db/myblastdb" ! # entry_id = ARGV.shift || "sp:128U_DROME" ! # ent_list = ["sp:1433_SPIOL", "sp:1432_MAIZE"] ! # ! # fastacmd = Bio::Blast::Fastacmd.new(database) ! # ! # entry = fastacmd.get_by_id(entry_id) ! # fastacmd.fetch(entry_id) ! # fastacmd.fetch(ent_list) ! # ! # fastacmd.fetch(ent_list).each do |fasta| ! # puts fasta ! # end ! # ! # == References ! # ! # * NCBI tool ! # ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/ncbi.tar.gz ! # ! # * fastacmd.html ! # http://biowulf.nih.gov/apps/blast/doc/fastacmd.html ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 19,23 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! 
# $Id$ # --- 54,58 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 29,32 **** --- 64,69 ---- class Blast + # NCBI fastacmd wrapper class + # class Fastacmd *************** *** 34,49 **** include Bio::Command::Tools ! def initialize(db) ! @database = db @fastacmd = 'fastacmd' end - attr_accessor :database, :fastacmd, :errorlog ! # get an entry_id and returns a Bio::FastaFormat object def get_by_id(entry_id) fetch(entry_id).shift end ! # get one or more entry_id and returns an Array of Bio::FastaFormat objects def fetch(list) if list.respond_to?(:join) --- 71,113 ---- include Bio::Command::Tools ! # Database file path. ! attr_accessor :database ! ! # fastcmd command file path. ! attr_accessor :fastacmd ! ! # ! attr_accessor :errorlog ! ! # Initalize a fastacmd object. ! # ! # fastacmd = Bio::Blast::Fastacmd.new("/db/myblastdb") ! def initialize(blast_database_file_path) ! @database = blast_database_file_path @fastacmd = 'fastacmd' end ! ! # get an entry_id and returns a Bio::FastaFormat object. ! # ! # entry_id = "sp:128U_DROME" ! # entry = fastacmd.get_by_id(entry_id) def get_by_id(entry_id) fetch(entry_id).shift end ! # get one or more entry_id and returns an Array of Bio::FastaFormat objects. ! # ! # Fastacmd#fetch(entry_id) returns an Array of a Bio::FastaFormat ! # object even when the result is a single entry. ! # ! # p fastacmd.fetch(entry_id) ! # ! # Fastacmd#fetch method also accepts a list of entry_id and returns ! # an Array of Bio::FastaFormat objects. ! # ! # ent_list = ["sp:1433_SPIOL", "sp:1432_MAIZE"] ! # p fastacmd.fetch(ent_list) ! # def fetch(list) if list.respond_to?(:join) *************** *** 60,63 **** --- 124,134 ---- end + # Iterates each entry. + # + # You can also iterate on all sequences in the database! 
+ # fastacmd.each do |fasta| + # p [ fasta.definition[0..30], fasta.seq.size ] + # end + # def each_entry cmd = [ @fastacmd, '-d', @database, '-D', 'T' ] *************** *** 65,70 **** inn.close_write Bio::FlatFile.open(Bio::FastaFormat, out) do |f| ! f.each_entry do |e| ! yield e end end --- 136,141 ---- inn.close_write Bio::FlatFile.open(Bio::FastaFormat, out) do |f| ! f.each_entry do |entry| ! yield entry end end *************** *** 74,123 **** alias each each_entry ! end ! ! end ! end ! ! ! if __FILE__ == $0 ! ! database = ARGV.shift || "/db/myblastdb" ! entry_id = ARGV.shift || "sp:128U_DROME" ! ent_list = ["sp:1433_SPIOL", "sp:1432_MAIZE"] ! ! fastacmd = Bio::Blast::Fastacmd.new(database) ! ! ### Retrieve one sequence ! entry = fastacmd.get_by_id(entry_id) ! ! # Fastacmd#get_by_id(entry_id) returns a Bio::FastaFormat object. ! p entry ! ! # Bio::FastaFormat becomes a fasta format string when printed by puts. ! puts entry ! ! # Fastacmd#fetch(entry_id) returns an Array of a Bio::FastaFormat ! # object even when the result is a single entry. ! p fastacmd.fetch(entry_id) ! ! ### Retrieve more sequences ! ! # Fastacmd#fetch method also accepts a list of entry_id and returns ! # an Array of Bio::FastaFormat objects. ! p fastacmd.fetch(ent_list) ! ! # So, you can iterate on the results. ! fastacmd.fetch(ent_list).each do |fasta| ! puts fasta ! end ! ! ! ### Iterates on all entries ! # You can also iterate on all sequences in the database! ! fastacmd.each do |fasta| ! p [ fasta.definition[0..30], fasta.seq.size ] ! end - end --- 145,152 ---- alias each each_entry ! end # class Fastacmd ! end # class Blast ! end # module Bio From nakao at pub.open-bio.org Sat Jan 28 03:05:55 2006 From: nakao at pub.open-bio.org (Mitsuteru C. 
Nakao)
Date: Sat, 28 Jan 2006 08:05:55 +0000
Subject: [BioRuby-cvs] bioruby/test/unit/bio/io test_fastacmd.rb,NONE,1.1
Message-ID: <200601280805.k0S85tVL005150@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/test/unit/bio/io
In directory pub.open-bio.org:/tmp/cvs-serv5138/test/unit/bio/io

Added Files:
  test_fastacmd.rb
Log Message:
* Newly added.

--- NEW FILE: test_fastacmd.rb ---
#
# test/unit/bio/io/test_fastacmd.rb - Unit test for Bio::Blast::Fastacmd.
#
# Copyright (C) 2006 Mitsuteru Nakao
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
# $Id: test_fastacmd.rb,v 1.1 2006/01/28 08:05:52 nakao Exp $
#

require 'pathname'
libpath = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 4, 'lib')).cleanpath.to_s
$:.unshift(libpath) unless $:.include?(libpath)

require 'test/unit'
require 'bio/io/fastacmd'

module Bio
  class TestFastacmd < Test::Unit::TestCase

    def setup
      @obj = Bio::Blast::Fastacmd.new("/tmp/test")
    end

    def test_database
      assert_equal("/tmp/test", @obj.database)
    end

    def test_fastacmd
      assert_equal("fastacmd", @obj.fastacmd)
    end

    def test_methods
      method_list = ['get_by_id', 'fetch', 'each_entry', 'each']
      method_list.each do |method|
        assert(@obj.methods.include?(method))
      end
    end

  end
end

From nakao at pub.open-bio.org Sat Jan 28 03:12:23 2006
From: nakao at pub.open-bio.org (Mitsuteru C.
Nakao) Date: Sat, 28 Jan 2006 08:12:23 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io fastacmd.rb,1.9,1.10 Message-ID: <200601280812.k0S8CNVL005196@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv5186/lib/bio/io Modified Files: fastacmd.rb Log Message: * Updated RDoc. Index: fastacmd.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/fastacmd.rb,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** fastacmd.rb 28 Jan 2006 07:41:59 -0000 1.9 --- fastacmd.rb 28 Jan 2006 08:12:21 -0000 1.10 *************** *** 2,8 **** # = bio/io/fastacmd.rb - NCBI fastacmd wrapper class # ! # Copyright:: Copyright (C) 2005 # Shuji SHIGENOBU , ! # Toshiaki Katayama # License:: LGPL # --- 2,9 ---- # = bio/io/fastacmd.rb - NCBI fastacmd wrapper class # ! # Copyright:: Copyright (C) 2005, 2006 # Shuji SHIGENOBU , ! # Toshiaki Katayama , ! # Mitsuteru C. Nakao # License:: LGPL # *************** *** 14,17 **** --- 15,21 ---- # NCBI fastacmd command. # + # This class requires 'fastacmd' command and a blast database + # (formatted using the '-o' option of 'formatdb').
+ # # == Examples # From k at pub.open-bio.org Sat Jan 28 03:34:27 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 08:34:27 +0000 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.60,1.61 Message-ID: <200601280834.k0S8YRVL005291@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory pub.open-bio.org:/tmp/cvs-serv5285/lib Modified Files: bio.rb Log Message: * Bio::BRDB is now removed Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.60 retrieving revision 1.61 diff -C2 -d -r1.60 -r1.61 *** bio.rb 28 Jan 2006 06:46:42 -0000 1.60 --- bio.rb 28 Jan 2006 08:34:25 -0000 1.61 *************** *** 168,172 **** # autoload :ESOAP, 'bio/io/esoap' # NCBI::ESOAP ? - # autoload :BRDB, 'bio/io/brdb' # remove --- 168,171 ---- From k at pub.open-bio.org Sat Jan 28 03:34:27 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 08:34:27 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io brdb.rb,1.4,NONE Message-ID: <200601280834.k0S8YRVL005295@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv5285/lib/bio/io Removed Files: brdb.rb Log Message: * Bio::BRDB is now removed --- brdb.rb DELETED --- From nakao at pub.open-bio.org Sat Jan 28 05:49:01 2006 From: nakao at pub.open-bio.org (Mitsuteru C. Nakao) Date: Sat, 28 Jan 2006 10:49:01 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db fasta.rb,1.21,1.22 Message-ID: <200601281049.k0SAn1VL005893@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db In directory pub.open-bio.org:/tmp/cvs-serv5883/lib/bio/db Modified Files: fasta.rb Log Message: * Added RDoc. 
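The RDoc added by this commit says that the leading '>' of an entry can be omitted and that only the first entry is used when the string holds more than one. A minimal stand-alone sketch of that behaviour follows. The helper name first_fasta_entry is hypothetical and is not part of BioRuby; it imitates the documented behaviour and is not the Bio::FastaFormat implementation.

```ruby
# Hypothetical helper (not BioRuby API): takes the first entry of a
# FASTA string and splits it into its definition line and sequence.
def first_fasta_entry(str)
  # Drop the optional leading '>' and split into lines.
  lines = str.to_s.sub(/\A>/, '').split(/\r?\n/)
  definition = lines.shift.to_s.strip
  # Sequence lines run until the next entry, which starts with '>'.
  seq_lines = lines.take_while { |line| !line.start_with?('>') }
  { definition: definition, seq: seq_lines.join.gsub(/\s+/, '') }
end

entry = first_fasta_entry(">id1 first\nATGC\nATGC\n>id2 second\nGGGG\n")
p entry[:definition]  # => "id1 first"
p entry[:seq]         # => "ATGCATGC"
```

The second entry (id2) is ignored, which matches the "only the first entry is used" wording in the RDoc.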
Index: fasta.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/fasta.rb,v retrieving revision 1.21 retrieving revision 1.22 diff -C2 -d -r1.21 -r1.22 *** fasta.rb 26 Sep 2005 13:00:06 -0000 1.21 --- fasta.rb 28 Jan 2006 10:48:59 -0000 1.22 *************** *** 1,7 **** # ! # bio/db/fasta.rb - FASTA format class # ! # Copyright (C) 2001 GOTO Naohisa ! # Copyright (C) 2001, 2002 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,67 ---- # ! # = bio/db/fasta.rb - FASTA format class # ! # Copyright:: Copyright (C) 2001, 2002 ! # GOTO Naohisa , ! # KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # FASTA format class. ! # ! # == Examples ! # ! # rub = Bio::FastaDefline.new('>gi|671595|emb|CAA85678.1| rubisco large subunit [Perovskia abrotanoides]') ! # rub.entry_id ==> 'gi|671595' ! # rub.get('emb') ==> 'CAA85678.1' ! # rub.emb ==> 'CAA85678.1' ! # rub.gi ==> '671595' ! # rub.accession ==> 'CAA85678' ! # rub.accessions ==> [ 'CAA85678' ] ! # rub.acc_version ==> 'CAA85678.1' ! # rub.locus ==> nil ! # rub.list_ids ==> [["gi", "671595"], ! # ["emb", "CAA85678.1", nil], ! # ["Perovskia abrotanoides"]] ! # ! # ckr = Bio::FastaDefline.new(">gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)\001gi|2147182|pir||I51898 cholecystokinin A receptor - guinea pig\001gi|544724|gb|AAB29504.1| cholecystokinin A receptor; CCK-A receptor [Cavia]") ! # ckr.entry_id ==> "gi|2495000" ! # ckr.sp ==> "CCKR_CAVPO" ! # ckr.pir ==> "I51898" ! # ckr.gb ==> "AAB29504.1" ! # ckr.gi ==> "2495000" ! # ckr.accession ==> "AAB29504" ! # ckr.accessions ==> ["Q63931", "AAB29504"] ! # ckr.acc_version ==> "AAB29504.1" ! # ckr.locus ==> nil ! # ckr.description ==> ! # "CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)" ! # ckr.descriptions ==> ! # ["CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)", !
# "cholecystokinin A receptor - guinea pig", ! # "cholecystokinin A receptor; CCK-A receptor [Cavia]"] ! # ckr.words ==> ! # ["cavia", "cck-a", "cck-ar", "cholecystokinin", "guinea", "pig", ! # "receptor", "type"] ! # ckr.id_strings ==> ! # ["2495000", "Q63931", "CCKR_CAVPO", "2147182", "I51898", ! # "544724", "AAB29504.1", "Cavia"] ! # ckr.list_ids ==> ! # [["gi", "2495000"], ["sp", "Q63931", "CCKR_CAVPO"], ! # ["gi", "2147182"], ["pir", nil, "I51898"], ["gi", "544724"], ! # ["gb", "AAB29504.1", nil], ["Cavia"]] ! # ! # == References ! # ! # * FASTA format (WikiPedia) ! # http://en.wikipedia.org/wiki/FASTA_format ! # ! # * Fasta format description (NCBI) ! # http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 19,23 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # --- 79,83 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 27,34 **** --- 87,171 ---- module Bio + + # Treats a FASTA formatted entry, such as: + # + # >id and/or some comments <== comment line + # ATGCATGCATGCATGCATGCATGCATGCATGCATGC <== sequence lines + # ATGCATGCATGCATGCATGCATGCATGCATGCATGC + # ATGCATGCATGC + # + # The precedent '>' can be omitted and the trailing '>' will be removed + # automatically. 
+ # + # === Examples + # + # f_str = <<END + # >sce:YBR160W CDC28, SRM5; cyclin-dependent protein kinase catalytic subunit [EC:2.7.1.-] [SP:CC28_YEAST] + # MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEG + # VPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYME + # GIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGNL + # KLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGC + # IFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFP + # QWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQES + # >sce:YBR274W CHK1; probable serine/threonine-protein kinase [EC:2.7.1.-] [SP:KB9S_YEAST] + # MSLSQVSPLPHIKDVVLGDTVGQGAFACVKNAHLQMDPSIILAVKFIHVP + # TCKKMGLSDKDITKEVVLQSKCSKHPNVLRLIDCNVSKEYMWIILEMADG + # GDLFDKIEPDVGVDSDVAQFYFQQLVSAINYLHVECGVAHRDIKPENILL + # DKNGNLKLADFGLASQFRRKDGTLRVSMDQRGSPPYMAPEVLYSEEGYYA + # DRTDIWSIGILLFVLLTGQTPWELPSLENEDFVFFIENDGNLNWGPWSKI + # EFTHLNLLRKILQPDPNKRVTLKALKLHPWVLRRASFSGDDGLCNDPELL + # AKKLFSHLKVSLSNENYLKFTQDTNSNNRYISTQPIGNELAELEHDSMHF + # QTVSNTQRAFTSYDSNTNYNSGTGMTQEAKWTQFISYDIAALQFHSDEND + # CNELVKRHLQFNPNKLTKFYTLQPMDVLLPILEKALNLSQIRVKPDLFAN + # FERLCELLGYDNVFPLIINIKTKSNGGYQLCGSISIIKIEEELKSVGFER + # KTGDPLEWRRLFKKISTICRDIILIPN + # END + # + # f = Bio::FastaFormat.new(f_str) + # puts "### FastaFormat" + # puts "# entry" + # puts f.entry + # puts "# entry_id" + # p f.entry_id + # puts "# definition" + # p f.definition + # puts "# data" + # p f.data + # puts "# seq" + # p f.seq + # puts "# seq.type" + # p f.seq.type + # puts "# length" + # p f.length + # puts "# aaseq" + # p f.aaseq + # puts "# aaseq.type" + # p f.aaseq.type + # puts "# aaseq.composition" + # p f.aaseq.composition + # puts "# aalen" + # p f.aalen + # + # === References + # + # * FASTA format (WikiPedia) + # http://en.wikipedia.org/wiki/FASTA_format + # class FastaFormat < DB + # Entry delimiter in flatfile text. DELIMITER = RS = "\n>" + # The comment line of the FASTA formatted data. + attr_accessor :definition + + # The sequence lines in text.
+ attr_accessor :data + + attr_reader :entry_overrun + + # Stores the comment and sequence information from one entry of the + # FASTA format string. If the argument contains more than one + # entry, only the first entry is used. def initialize(str) @definition = str[/.*/].sub(/^>/, '').strip # 1st line *************** *** 37,43 **** @entry_overrun = $& end - attr_accessor :definition, :data - attr_reader :entry_overrun def entry @entry = ">#{@definition}\n#{@data.strip}\n" --- 174,179 ---- @entry_overrun = $& end + # Returns the stored one entry as a FASTA format. (same as to_s) def entry @entry = ">#{@definition}\n#{@data.strip}\n" *************** *** 45,48 **** --- 181,202 ---- alias to_s entry + + # Executes FASTA/BLAST search by using a Bio::Fasta or a Bio::Blast + # factory object. + # + # #!/usr/bin/env ruby + # require 'bio' + # + # factory = Bio::Fasta.local('fasta34', 'db/swissprot.f') + # flatfile = Bio::FlatFile.open(Bio::FastaFormat, 'queries.f') + # flatfile.each do |entry| + # p entry.definition + # result = entry.fasta(factory) + # result.each do |hit| + # print "#{hit.query_id} : #{hit.evalue}\t#{hit.target_id} at " + # p hit.lap_at + # end + # end + # def query(factory) factory.query(@entry) *************** *** 51,54 **** --- 205,209 ---- alias blast query + # Returns a joined sequence line as a String. def seq unless defined?(@seq) *************** *** 76,79 **** --- 231,235 ---- end + # Returns comments. def comment seq *************** *** 81,104 **** --- 237,269 ---- end + # Returns sequence length. def length seq.length end + # Returns the Bio::Sequence::NA. def naseq Sequence::NA.new(seq) end + # Returns the length of Bio::Sequence::NA. def nalen self.naseq.length end + # Returns the Bio::Sequence::AA. def aaseq Sequence::AA.new(seq) end + # Returns the length of Bio::Sequence::AA. def aalen self.aaseq.length end + # Parsing FASTA Defline, and extract IDs.
+ # IDs are NSIDs (NCBI standard FASTA sequence identifiers) + # or ":"-separated IDs. + # It returns a Bio::FastaDefline instance. def identifiers unless defined?(@ids) then *************** *** 108,131 **** --- 273,310 ---- end + # Parsing FASTA Defline (using #identifiers method), and + # shows a possibly unique identifier. + # It returns a string. def entry_id identifiers.entry_id end + # Parsing FASTA Defline (using #identifiers method), and + # shows GI/locus/accession/accession with version number. + # If a entry has more than two of such IDs, + # only the first ID are shown. + # It returns a string or nil. def gi identifiers.gi end + # Returns an accession number. def accession identifiers.accession end + # Parsing FASTA Defline (using #identifiers method), and + # shows accession numbers. + # It returns an array of strings. def accessions identifiers.accessions end + # Returns accession number with version. def acc_version identifiers.acc_version end + # Returns locus. def locus identifiers.locus *************** *** 134,139 **** --- 313,339 ---- end #class FastaFormat + # Treats a FASTA formatted numerical entry, such as: + # + # >id and/or some comments <== comment line + # 24 15 23 29 20 13 20 21 21 23 22 25 13 <== numerical data + # 22 17 15 25 27 32 26 32 29 29 25 + # + # The precedent '>' can be omitted and the trailing '>' will be removed + # automatically. + # + # --- Bio::FastaNumericFormat.new(entry) + # + # Stores the comment and the list of the numerical data. + # + # --- Bio::FastaNumericFormat#definition + # + # The comment line of the FASTA formatted data. + # + # * FASTA format (Wikipedia) + # http://en.wikipedia.org/wiki/FASTA_format class FastaNumericFormat < FastaFormat + # Returns the list of the numerical data (typically the quality score + # of its corresponding sequence) as an Array. def data unless @list *************** *** 143,150 **** --- 343,352 ---- end + # Returns the number of elements in the numerical data. 
def length data.length end + # Yields on each elements of the numerical data. def each data.each do |x| *************** *** 153,156 **** --- 355,359 ---- end + # Returns the n-th element. def [](n) data[n] *************** *** 161,169 **** end #class FastaNumericFormat - class FastaDefline ! # specs are described in: ! # ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb ! # http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers NSIDs = { --- 364,430 ---- end #class FastaNumericFormat ! # Parsing FASTA Defline, and extract IDs and other informations. ! # IDs are NSIDs (NCBI standard FASTA sequence identifiers) ! # or ":"-separated IDs. ! # ! # specs are described in: ! # ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb ! # http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers ! # ! # === Examples ! # ! # rub = Bio::FastaDefline.new('>gi|671595|emb|CAA85678.1| rubisco large subunit [Perovskia abrotanoides]') ! # rub.entry_id ==> 'gi|671595' ! # rub.get('emb') ==> 'CAA85678.1' ! # rub.emb ==> 'CAA85678.1' ! # rub.gi ==> '671595' ! # rub.accession ==> 'CAA85678' ! # rub.accessions ==> [ 'CAA85678' ] ! # rub.acc_version ==> 'CAA85678.1' ! # rub.locus ==> nil ! # rub.list_ids ==> [["gi", "671595"], ! # ["emb", "CAA85678.1", nil], ! # ["Perovskia abrotanoides"]] ! # ! # ckr = Bio::FastaDefline.new(">gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)\001gi|2147182|pir||I51898 cholecystokinin A receptor - guinea pig\001gi|544724|gb|AAB29504.1| cholecystokinin A receptor; CCK-A receptor [Cavia]") ! # ckr.entry_id ==> "gi|2495000" ! # ckr.sp ==> "CCKR_CAVPO" ! # ckr.pir ==> "I51898" ! # ckr.gb ==> "AAB29504.1" ! # ckr.gi ==> "2495000" ! # ckr.accession ==> "AAB29504" ! # ckr.accessions ==> ["Q63931", "AAB29504"] ! # ckr.acc_version ==> "AAB29504.1" ! # ckr.locus ==> nil ! # ckr.description ==> ! # "CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)" ! # ckr.descriptions ==> ! 
# ["CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)", ! # "cholecystokinin A receptor - guinea pig", ! # "cholecystokinin A receptor; CCK-A receptor [Cavia]"] ! # ckr.words ==> ! # ["cavia", "cck-a", "cck-ar", "cholecystokinin", "guinea", "pig", ! # "receptor", "type"] ! # ckr.id_strings ==> ! # ["2495000", "Q63931", "CCKR_CAVPO", "2147182", "I51898", ! # "544724", "AAB29504.1", "Cavia"] ! # ckr.list_ids ==> ! # [["gi", "2495000"], ["sp", "Q63931", "CCKR_CAVPO"], ! # ["gi", "2147182"], ["pir", nil, "I51898"], ["gi", "544724"], ! # ["gb", "AAB29504.1", nil], ["Cavia"]] ! # ! # === References ! # ! # * Fasta format description (NCBI) ! # http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml ! # ! # * Frequently Asked Questions: Indexing of Sequence Identifiers (by Warren R. Gish.) ! # http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers ! # ! # * README.formatdb ! # ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb ! # ! class FastaDefline NSIDs = { *************** *** 198,201 **** --- 459,471 ---- } + # Shows array that contains IDs (or ID-like strings). + # Returns an array of arrays of strings. + attr_reader :list_ids + + # Shows a possibly unique identifier. + # Returns a string. + attr_reader :entry_id + + # Parses given string. def initialize(str) @deflines = [] *************** *** 211,217 **** end #def initialize ! attr_reader :list_ids ! attr_reader :entry_id ! def add_defline(str) case str --- 481,485 ---- end #def initialize ! # Parses given string and adds parsed data. def add_defline(str) case str *************** *** 344,347 **** --- 612,619 ---- private :parse_NSIDs + + # Shows original string. + # Note that the result of this method may be different from + # original string which is given in FastaDefline.new method. def to_s @deflines.collect { |a| *************** *** 351,358 **** --- 623,632 ---- end + # Shows description. def description @deflines[0].to_a[-1] end + # Returns descriptions.
def descriptions @deflines.collect do |a| *************** *** 361,364 **** --- 635,640 ---- end + # Shows ID-like strings. + # Returns an array of strings. def id_strings r = [] *************** *** 402,405 **** --- 678,682 ---- ] + # Shows words used in the defline. Returns an Array. def words(case_sensitive = nil, kill_regexp = self.class::KillRegexpArray, kwhash = self.class::KillWordsHash) *************** *** 427,432 **** end ! def get(db) ! db =db.to_s r = nil unless r = @info[db] then --- 704,710 ---- end ! # Returns identifiers by a database name. ! def get(dbname) ! db = dbname.to_s r = nil unless r = @info[db] then *************** *** 450,457 **** end ! def get_by_type(tstr) @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then ! if i = labels.index(tstr) then return x[i+1] end --- 728,736 ---- end ! # Returns an identifier by given type. ! def get_by_type(type_str) @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then ! if i = labels.index(type_str) then return x[i+1] end *************** *** 461,469 **** end ! def get_all_by_type(*tstrarg) d = [] @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then ! tstrarg.each do |y| if i = labels.index(y) then d << x[i+1] if x[i+1] --- 740,749 ---- end ! # Returns identifiers by given type. ! def get_all_by_type(*type_strarg) d = [] @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then ! type_strarg.each do |y| if i = labels.index(y) then d << x[i+1] if x[i+1] *************** *** 475,478 **** --- 755,762 ---- end + # Shows locus. + # If the entry has more than two of such IDs, + # only the first ID are shown. + # Returns a string or nil. def locus unless defined?(@locus) *************** *** 482,485 **** --- 766,773 ---- end + # Shows GI. + # If the entry has more than two of such IDs, + # only the first ID are shown. + # Returns a string or nil. def gi unless defined?(@gi) then *************** *** 489,492 **** --- 777,784 ---- end + # Shows accession with version number.
+ # If the entry has more than two of such IDs, + # only the first ID are shown. + # Returns a string or nil. def acc_version unless defined?(@acc_version) then *************** *** 496,499 **** --- 788,793 ---- end + # Shows accession numbers. + # Returns an array of strings. def accessions unless defined?(@accessions) then *************** *** 504,507 **** --- 798,802 ---- end + # Shows an accession number. def accession unless defined?(@accession) then *************** *** 524,527 **** --- 819,823 ---- r end + end #class FastaDefline *************** *** 610,869 **** end - - =begin - - = Bio::FastaFormat - - Treats a FASTA formatted entry, such as: - - >id and/or some comments <== comment line - ATGCATGCATGCATGCATGCATGCATGCATGCATGC <== sequence lines - ATGCATGCATGCATGCATGCATGCATGCATGCATGC - ATGCATGCATGC - - The precedent '>' can be omitted and the trailing '>' will be removed - automatically. - - --- Bio::FastaFormat.new(entry) - - Stores the comment and sequence information from one entry of the - FASTA format string. If the argument contains more than one - entry, only the first entry is used. - - --- Bio::FastaFormat#entry - - Returns the stored one entry as a FASTA format. (same as to_s) - - --- Bio::FastaFormat#definition - - Returns the comment line of the FASTA formatted data. - - --- Bio::FastaFormat#seq - - Returns a joined sequence line as a String. - - --- Bio::FastaFormat#query(factory) - --- Bio::FastaFormat#fasta(factory) - --- Bio::FastaFormat#blast(factory) - - Executes FASTA/BLAST search by using a Bio::Fasta or a Bio::Blast - factory object. 
- - #!/usr/bin/env ruby - - require 'bio' - - factory = Bio::Fasta.local('fasta34', 'db/swissprot.f') - flatfile = Bio::FlatFile.open(Bio::FastaFormat, 'queries.f') - flatfile.each do |entry| - p entry.definition - result = entry.fasta(factory) - result.each do |hit| - print "#{hit.query_id} : #{hit.evalue}\t#{hit.target_id} at " - p hit.lap_at - end - end - - --- Bio::FastaFormat#length - - Returns sequence length. - - --- Bio::FastaFormat#naseq - --- Bio::FastaFormat#nalen - --- Bio::FastaFormat#aaseq - --- Bio::FastaFormat#aalen - - If you know whether the sequence is NA or AA, use these methods. - 'naseq' and 'aaseq' methods returen the Bio::Sequence::NA or - Bio::Sequence::AA object respectively. 'nalen' and 'aalen' methods - return the length of them. - - --- Bio::FastaFormat#identifiers - - Parsing FASTA Defline, and extract IDs. - IDs are NSIDs (NCBI standard FASTA sequence identifiers) - or ":"-separated IDs. - It returns a Bio::FastaDefline instance. - - --- Bio::FastaFormat#entry_id - - Parsing FASTA Defline (using #identifiers method), and - shows a possibly unique identifier. - It returns a string. - - --- Bio::FastaFormat#gi - --- Bio::FastaFormat#locus - --- Bio::FastaFormat#accession - --- Bio::FastaFormat#acc_version - - Parsing FASTA Defline (using #identifiers method), and - shows GI/locus/accession/accession with version number. - If a entry has more than two of such IDs, - only the first ID are shown. - It returns a string or nil. - - --- Bio::FastaFormat#accessions - - Parsing FASTA Defline (using #identifiers method), and - shows accession numbers. - It returns an array of strings. - - --- Bio::FastaFormat - - = Bio::FastaNumericFormat - - Treats a FASTA formatted numerical entry, such as: - - >id and/or some comments <== comment line - 24 15 23 29 20 13 20 21 21 23 22 25 13 <== numerical data - 22 17 15 25 27 32 26 32 29 29 25 - - The precedent '>' can be omitted and the trailing '>' will be removed - automatically. 
- - --- Bio::FastaNumericFormat.new(entry) - - Stores the comment and the list of the numerical data. - - --- Bio::FastaNumericFormat#definition - - The comment line of the FASTA formatted data. - - --- Bio::FastaNumericFormat#data - - Returns the list of the numerical data (typically the quality score - of its corresponding sequence) as an Array. - - --- Bio::FastaNumericFormat#length - - Returns the number of elements in the numerical data. - - --- Bio::FastaNumericFormat#each - - Yields on each elements of the numerical data. - - --- Bio::FastaNumericFormat#[](n) - - Returns the n-th element. - - --- Bio::FastaNumericFormat#identifiers - --- Bio::FastaNumericFormat#entry_id - --- Bio::FastaNumericFormat#gi - --- Bio::FastaNumericFormat#locus - --- Bio::FastaNumericFormat#accession - --- Bio::FastaNumericFormat#acc_version - --- Bio::FastaNumericFormat#accessions - - Same as Bio::FastaFormat. - - - = Bio::FastaDefline - - Parsing FASTA Defline, and extract IDs and other informations. - IDs are NSIDs (NCBI standard FASTA sequence identifiers) - or ":"-separated IDs. - - --- see also: - ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb - http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers - - --- Bio::FastaDefline.new(str) - - Parses given string. - - --- Bio::FastaFormat#entry_id - - Shows a possibly unique identifier. - Returns a string. - - --- Bio::FastaDefline#gi - --- Bio::FastaDefline#locus - --- Bio::FastaDefline#accession - --- Bio::FastaDefline#acc_version - - Shows GI/locus/accession/accession with version number. - If the entry has more than two of such IDs, - only the first ID are shown. - Returns a string or nil. - - --- Bio::FastaFormat#accessions - - Shows accession numbers. - Returns an array of strings. - - --- Bio::FastaDefline#add_defline(str) - - Parses given string and adds parsed data. - - --- Bio::FastaDefline#to_s - - Shows original string. 
- Note that the result of this method may be different from - original string which is given in FastaDefline.new method. - - --- Bio::FastaDefline#id_strings - - Shows ID-like strings. - Returns an array of strings. - - --- Bio::FastaDefline#list_ids - - Shows array that contains IDs (or ID-like strings). - Returns an array of arrays of strings. - - --- Bio::FastaDefline#description - --- Bio::FastaDefline#descriptions - - --- Bio::FastaDefline#words(case_sensitive = nil, - kill_words_regexp_array, kill_words_hash) - - --- Bio::FastaDefline#get(tag_of_id) - - --- Bio::FastaDefline#get_by_type(type_of_id) - - --- Bio::FastaDefline#get_all_by_type(type_of_id) - - --- examples: - rub = Bio::FastaDefline.new('>gi|671595|emb|CAA85678.1| rubisco large subunit [Perovskia abrotanoides]') - rub.entry_id ==> 'gi|671595' - rub.get('emb') ==> 'CAA85678.1' - rub.emb ==> 'CAA85678.1' - rub.gi ==> '671595' - rub.accession ==> 'CAA85678' - rub.accessions ==> [ 'CAA85678' ] - rub.acc_version ==> 'CAA85678.1' - rub.locus ==> nil - rub.list_ids ==> [["gi", "671595"], - ["emb", "CAA85678.1", nil], - ["Perovskia abrotanoides"]] - - ckr = Bio::FastaDefline.new(">gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)\001gi|2147182|pir||I51898 cholecystokinin A receptor - guinea pig\001gi|544724|gb|AAB29504.1| cholecystokinin A receptor; CCK-A receptor [Cavia]") - ckr.entry_id ==> "gi|2495000" - ckr.sp ==> "CCKR_CAVPO" - ckr.pir ==> "I51898" - ckr.gb ==> "AAB29504.1" - ckr.gi ==> "2495000" - ckr.accession ==> "AAB29504" - ckr.accessions ==> ["Q63931", "AAB29504"] - ckr.acc_version ==> "AAB29504.1" - ckr.locus ==> nil - ckr.description ==> - "CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)" - ckr.descriptions ==> - ["CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)", - "cholecystokinin A receptor - guinea pig", - "cholecystokinin A receptor; CCK-A receptor [Cavia]"] - ckr.words ==> - ["cavia", "cck-a", "cck-ar", "cholecystokinin", 
"guinea", "pig", - "receptor", "type"] - ckr.id_strings ==> - ["2495000", "Q63931", "CCKR_CAVPO", "2147182", "I51898", - "544724", "AAB29504.1", "Cavia"] - ckr.list_ids ==> - [["gi", "2495000"], ["sp", "Q63931", "CCKR_CAVPO"], - ["gi", "2147182"], ["pir", nil, "I51898"], ["gi", "544724"], - ["gb", "AAB29504.1", nil], ["Cavia"]] - - =end - --- 906,908 ---- From ngoto at pub.open-bio.org Sun Jan 29 01:48:41 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:48:41 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb pdb.rb,1.13,1.14 Message-ID: <200601290648.k0T6mfVL007883@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb In directory pub.open-bio.org:/tmp/cvs-serv7873 Modified Files: pdb.rb Log Message: changed "str" to "str.to_s" to improve tolerance to wrong or incomplete data Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** pdb.rb 20 Jan 2006 13:54:08 -0000 1.13 --- pdb.rb 29 Jan 2006 06:48:39 -0000 1.14 *************** *** 87,91 **** module Pdb_SList def self.new(str) ! str.strip.split(/\;\s*/) end end --- 87,91 ---- module Pdb_SList def self.new(str) ! str.to_s.strip.split(/\;\s*/) end end *************** *** 93,97 **** module Pdb_List def self.new(str) ! str.strip.split(/\,\s*/) end end --- 93,97 ---- module Pdb_List def self.new(str) ! str.to_s.strip.split(/\,\s*/) end end *************** *** 99,103 **** module Pdb_Specification_list def self.new(str) ! a = str.strip.split(/\;\s*/) a.collect! { |x| x.split(/\:\s*/, 2) } a --- 99,103 ---- module Pdb_Specification_list def self.new(str) ! a = str.to_s.strip.split(/\;\s*/) a.collect! { |x| x.split(/\:\s*/, 2) } a *************** *** 107,111 **** module Pdb_String def self.new(str) ! str.gsub(/\s+\z/, '') end --- 107,111 ---- module Pdb_String def self.new(str) ! 
str.to_s.gsub(/\s+\z/, '') end *************** *** 117,121 **** @@nn = nn def self.new(str) ! str.gsub(/\s+\z/, '').ljust(@@nn)[0, @@nn] end } --- 117,121 ---- @@nn = nn def self.new(str) ! str.to_s.gsub(/\s+\z/, '').ljust(@@nn)[0, @@nn] end } *************** *** 130,134 **** @@nn = nn def self.new(str) ! str.ljust(@@nn)[0, @@nn] end } --- 130,134 ---- @@nn = nn def self.new(str) ! str.to_s.ljust(@@nn)[0, @@nn] end } *************** *** 158,162 **** module Pdb_StringRJ def self.new(str) ! str.gsub(/\A\s+/, '') end end --- 158,162 ---- module Pdb_StringRJ def self.new(str) ! str.to_s.gsub(/\A\s+/, '') end end From ngoto at pub.open-bio.org Sun Jan 29 01:54:15 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:54:15 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db pdb.rb,1.5,1.6 Message-ID: <200601290654.k0T6sFVL007957@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db In directory pub.open-bio.org:/tmp/cvs-serv7939/db Modified Files: pdb.rb Log Message: * In lib/bio/db/pdb/chemicalcomponent.rb, added a new class Bio::PDB::ChemicalComponent to parse the PDB Chemical Component Dictionary (PDB style format). * Added file format autodetection for Bio::PDB::ChemicalComponent. * Added flatfile indexer for Bio::PDB::ChemicalComponent. 
Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb.rb,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** pdb.rb 16 Dec 2005 19:23:03 -0000 1.5 --- pdb.rb 29 Jan 2006 06:54:13 -0000 1.6 *************** *** 26,29 **** --- 26,32 ---- module Bio class PDB #< DB + + autoload :ChemicalComponent, 'bio/db/pdb/chemicalcomponent' + end #class PDB end #module Bio From ngoto at pub.open-bio.org Sun Jan 29 01:54:16 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:54:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io flatfile.rb,1.42,1.43 Message-ID: <200601290654.k0T6sGVL007965@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv7939/io Modified Files: flatfile.rb Log Message: * In lib/bio/db/pdb/chemicalcomponent.rb, added a new class Bio::PDB::ChemicalComponent to parse the PDB Chemical Component Dictionary (PDB style format). * Added file format autodetection for Bio::PDB::ChemicalComponent. * Added flatfile indexer for Bio::PDB::ChemicalComponent. 
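The "file format autodetection" item in the log message can be sketched on its own. The pattern below is the one this commit adds to flatfile.rb for Bio::PDB::ChemicalComponent. The constant name, the helper name and the sample strings are assumptions made for illustration; they are not BioRuby API, and the real autodetection code in Bio::FlatFile is structured differently.

```ruby
# Pattern taken from the flatfile.rb change in this commit.
# The constant and method names are hypothetical.
CHEMICAL_COMPONENT_HEAD = /^RESIDUE +.+ +\d+\s*$/

# Returns true when the first line of text looks like the RESIDUE
# record that opens a Chemical Component Dictionary entry.
def looks_like_chemical_component?(text)
  first_line = text.to_s.lines.first.to_s
  !!(first_line =~ CHEMICAL_COMPONENT_HEAD)
end

p looks_like_chemical_component?("RESIDUE   CMP     33\n")   # => true
p looks_like_chemical_component?("HEADER    HYDROLASE\n")    # => false
```

Calling to_s on the argument makes the helper tolerate nil, in the same spirit as the str.to_s change made to pdb.rb earlier the same day.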
Index: flatfile.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile.rb,v retrieving revision 1.42 retrieving revision 1.43 diff -C2 -d -r1.42 -r1.43 *** flatfile.rb 28 Jan 2006 04:23:41 -0000 1.42 --- flatfile.rb 29 Jan 2006 06:54:14 -0000 1.43 *************** *** 443,446 **** --- 443,449 ---- Bio::PDB + when /^RESIDUE +.+ +\d+\s*$/ + Bio::PDB::ChemicalComponent + when /^CLUSTAL .*\(.*\).*sequence +alignment/ Bio::ClustalW::Report From ngoto at pub.open-bio.org Sun Jan 29 01:54:16 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:54:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb chemicalcomponent.rb, NONE, 1.1 Message-ID: <200601290654.k0T6sGVL007961@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb In directory pub.open-bio.org:/tmp/cvs-serv7939/db/pdb Added Files: chemicalcomponent.rb Log Message: * In lib/bio/db/pdb/chemicalcomponent.rb, added a new class Bio::PDB::ChemicalComponent to parse the PDB Chemical Component Dictionary (PDB style format). * Added file format autodetection for Bio::PDB::ChemicalComponent. * Added flatfile indexer for Bio::PDB::ChemicalComponent. --- NEW FILE: chemicalcomponent.rb --- # # = bio/db/pdb/chemicalcomponent.rb - PDB Chemical Component Dictionary parser # # Copyright:: Copyright (C) 2006 # GOTO Naohisa # License:: LGPL # # $Id: chemicalcomponent.rb,v 1.1 2006/01/29 06:54:13 ngoto Exp $ # #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA #++ # # = About Bio::PDB::ChemicalComponent # # Please refer to Bio::PDB::ChemicalComponent. # # = References # # * (()) # * http://deposit.pdb.org/het_dictionary.txt # require 'bio/db/pdb/pdb' module Bio class PDB # Bio::PDB::ChemicalComponent is a parser for an entry of # the PDB Chemical Component Dictionary. # # The PDB Chemical Component Dictionary is available at # http://deposit.pdb.org/het_dictionary.txt class ChemicalComponent # delimiter for reading via Bio::FlatFile DELIMITER = RS = "\n\n" # Single field (normally single line) of an entry class Record < Bio::PDB::Record # fetches record name def fetch_record_name(str) str[0..6].strip end private :fetch_record_name # fetches record name def self.fetch_record_name(str) str[0..6].strip end private_class_method :fetch_record_name # RESIDUE field. # It may be wrong because the definition described in documents # seems ambiguous. RESIDUE = def_rec([ 11, 13, Pdb_LString[3], :hetID ], [ 16, 20, Pdb_Integer, :numHetAtoms ] ) # CONECT field # It may be wrong because the definition described in documents # seems ambiguous. CONECT = def_rec([ 12, 15, Pdb_Atom, :name ], [ 19, 20, Pdb_Integer, :num ], [ 21, 24, Pdb_Atom, :other_atoms ], [ 26, 29, Pdb_Atom, :other_atoms ], [ 31, 34, Pdb_Atom, :other_atoms ], [ 36, 39, Pdb_Atom, :other_atoms ], [ 41, 44, Pdb_Atom, :other_atoms ], [ 46, 49, Pdb_Atom, :other_atoms ], [ 51, 54, Pdb_Atom, :other_atoms ], [ 56, 59, Pdb_Atom, :other_atoms ], [ 61, 64, Pdb_Atom, :other_atoms ], [ 66, 69, Pdb_Atom, :other_atoms ], [ 71, 74, Pdb_Atom, :other_atoms ], [ 76, 79, Pdb_Atom, :other_atoms ] ) # HET field. # It is the same as Bio::PDB::Record::HET.
HET = Bio::PDB::Record::HET #-- #HETSYN = Bio::PDB::Record::HETSYN #++ # HETSYN field. # It is very similar to Bio::PDB::Record::HETSYN. HETSYN = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 12, 14, Pdb_LString(3), :hetID ], [ 16, 70, Pdb_String, :hetSynonyms ] ) # HETNAM field. # It is the same as Bio::PDB::Record::HETNAM. HETNAM = Bio::PDB::Record::HETNAM # FORMUL field. # It is the same as Bio::PDB::Record::FORMUL. FORMUL = Bio::PDB::Record::FORMUL # default definition for unknown fields. Default = Bio::PDB::Record::Default # Hash to store allowed definitions. Definition = create_definition_hash # END record class. # # Because END is a reserved word of Ruby, it is separately # added to the hash End = Bio::PDB::Record::End Definition['END'] = End # Look up the class in Definition hash def self.get_record_class(str) t = fetch_record_name(str) return Definition[t] end end #class Record # Creates a new object. def initialize(str) @data = str.split(/[\r\n]+/) @hash = {} #Flag to say whether the current line is part of a continuation cont = false #Goes through each line and replaces that line with a PDB::Record @data.collect! do |line| #Go to next if the previous line can be continued, and #add_continuation returns true. Line is added by add_continuation next if cont and cont = cont.add_continuation(line) #Make the new record f = Record.get_record_class(line).new.initialize_from_string(line) #p f #Set cont cont = f if f.continue? #Set the hash to point to this record either by adding to an #array, or on its own key = f.record_name if a = @hash[key] then a << f else @hash[key] = [ f ] end f end #each #At the end we need to add the final model @data.compact! end # all records in this entry as an array. attr_reader :data # all records in this entry as a hash accessed by record names. attr_reader :hash # Identifier written in the first line "RESIDUE" record. (e.g. CMP) def entry_id @data[0].hetID end # Synonyms for the chemical component. Returns an array of strings.
def hetsyn unless defined? @hetsyn if r = @hash["HETSYN"] @hetsyn = r[0].hetSynonyms.to_s.split(/\;\s*/) else return [] end end @hetsyn end # The name of the chemical component. # Returns a string (or nil, if the entry is something wrong). def hetnam @hash["HETNAM"][0].text end # The chemical formula of the chemical component. # Returns a string (or nil, if the entry is something wrong). def formul @hash["FORMUL"][0].text end # Returns an hash of bindings of atoms. # Note that each white spaces are stripped for atom symbols. def conect unless defined? @conect c = {} @hash["CONECT"].each do |e| key = e.name.to_s.strip unless key.empty? val = e.other_atoms.collect { |x| x.strip } #warn "Warning: #{key}: atom name conflict?" if c[key] c[key] = val end end @conect = c end @conect end # Gets all records whose record type is _name_. # Returns an array of Bio::PDB::Record::* objects. # # if _name_ is nil, returns hash storing all record data. # # Example: # p pdb.record('CONECT') # p pdb.record['CONECT'] # def record(name = nil) name ? @hash[name] : @hash end end #class ChemicalComponent end #class PDB end #module Bio From ngoto at pub.open-bio.org Sun Jan 29 01:54:16 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:54:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io/flatfile indexer.rb,1.21,1.22 Message-ID: <200601290654.k0T6sGVL007967@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io/flatfile In directory pub.open-bio.org:/tmp/cvs-serv7939/io/flatfile Modified Files: indexer.rb Log Message: * In lib/bio/db/pdb/chemicalcomponent.rb, added a new class Bio::PDB::ChemicalComponent to parse the PDB Chemical Component Dictionary (PDB style format). * Added file format autodetection for Bio::PDB::ChemicalComponent. * Added flatfile indexer for Bio::PDB::ChemicalComponent. 
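The parser, the autodetection rule and the indexer named in this log message all rely on two properties of the dictionary text: entries are separated by a blank line, and each entry starts with a RESIDUE line. A minimal standalone sketch of that logic follows. It does not load BioRuby; the sample text and the helper names (split_entries, residue_entry?, het_id) are made up for illustration. Only the delimiter, the regular expression and the hetID column range come from the commit, and reading the column range as 1-based is an assumption.

```ruby
# Standalone sketch, no BioRuby required. Helper names and sample data
# are hypothetical; they are not part of the BioRuby API.

# Splits dictionary text into entries on the blank-line delimiter
# (the same "\n\n" value as ChemicalComponent::DELIMITER).
def split_entries(text, delimiter = "\n\n")
  text.split(delimiter).map { |e| e.strip }.reject { |e| e.empty? }
end

# True if the entry contains a line matching the pattern that was
# added to the flatfile.rb autodetection.
def residue_entry?(entry)
  !!(entry =~ /^RESIDUE +.+ +\d+\s*$/)
end

# Reads hetID from columns 11-13 of the first line.
# Assumption: columns are 1-based, so the string indexes are 10..12.
def het_id(entry)
  entry.lines.first[10..12].to_s.strip
end

# Made-up sample with two entries.
sample = [
  "RESIDUE   AAA      2",
  "CONECT      C1     1 C2",
  "CONECT      C2     1 C1",
  "END",
  "",
  "RESIDUE   BBB      1",
  "CONECT      O1     0",
  "END"
].join("\n")

entries = split_entries(sample)
ids = entries.select { |e| residue_entry?(e) }.map { |e| het_id(e) }
p ids
```

In the real indexer the identifier comes from Bio::PDB::ChemicalComponent#entry_id instead of a fixed string slice; the slice here only stands in for the RESIDUE record definition.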
Index: indexer.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile/indexer.rb,v retrieving revision 1.21 retrieving revision 1.22 diff -C2 -d -r1.21 -r1.22 *** indexer.rb 26 Sep 2005 13:00:08 -0000 1.21 --- indexer.rb 29 Jan 2006 06:54:14 -0000 1.22 *************** *** 81,84 **** --- 81,86 ---- when 'Bio::Blast::WU::Report_TBlast' BlastDefaultParser.new(Bio::Blast::WU::Report_TBlast, *arg) + when 'Bio::PDB::ChemicalComponent' + PDBChemicalComponentParser.new(Bio::PDB::ChemicalComponent, *arg) else raise 'unknown or unsupported format' *************** *** 437,440 **** --- 439,471 ---- end end #class BlastDefaultReportParser + + class PDBChemicalComponentParser < TemplateParser + NAMESTYLE = NameSpaces.new( + NameSpace.new( 'UNIQUE', Proc.new { |x| x.entry_id } ) + ) + PRIMARY = 'UNIQUE' + def initialize(klass, pri_name = nil, sec_names = nil) + super() + self.format = 'raw' + self.dbclass = Bio::PDB::ChemicalComponent + self.set_primary_namespace((pri_name or PRIMARY)) + unless sec_names then + sec_names = [] + @namestyle.each_value do |x| + sec_names << x.name if x.name != self.primary.name + end + end + self.add_secondary_namespaces(*sec_names) + end + def open_flatfile(fileid, file) + super + @flatfile.pos = 0 + begin + pos = @flatfile.pos + line = @flatfile.gets + end until (!line or line =~ /^RESIDUE /) + @flatfile.pos = pos + end + end #class PDBChemicalComponentParser end #module Parser From nakao at pub.open-bio.org Sun Jan 29 02:39:34 2006 From: nakao at pub.open-bio.org (Mitsuteru C. Nakao) Date: Sun, 29 Jan 2006 07:39:34 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.18,1.19 Message-ID: <200601290739.k0T7dYVL008081@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory pub.open-bio.org:/tmp/cvs-serv8071/lib/bio Modified Files: reference.rb Log Message: * Added RDoc. 
Index: reference.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/reference.rb,v retrieving revision 1.18 retrieving revision 1.19 diff -C2 -d -r1.18 -r1.19 *** reference.rb 18 Dec 2005 16:58:58 -0000 1.18 --- reference.rb 29 Jan 2006 07:39:31 -0000 1.19 *************** *** 1,6 **** # ! # bio/reference.rb - journal reference class # ! # Copyright (C) 2001 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,22 ---- # ! # = bio/reference.rb - Journal reference classes # ! # Copyright:: Copyright (C) 2001 ! # KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Journal reference classes. ! # ! # == Examples ! # ! # == References ! # ! # ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,28 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # module Bio class Reference def initialize(hash) hash.default = '' --- 34,100 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # module Bio + # A class for journal reference information. + # + # === Examples + # + # hash = {'authors' => [ "Hoge, J.P.", "Fuga, F.B." ], 'title' => "Title of the study.", + # 'journal' => "Theor. J. Hoge", 'volume' => 12, 'issue' => 3, 'pages' => "123-145", + # 'year' => 2001, 'pubmed' => 12345678, 'medline' => 98765432, 'abstract' => "...", + # 'url' => "http://...", 'mesh' => [], 'affiliations' => []} + # ref = Bio::Reference.new(hash) + # + # # Formats in the BibTeX style. + # ref.format("bibtex") + # + # # Short-cut for Bio::Reference#format("bibtex") + # ref.bibtex + # class Reference + # Author names in an Array, [ "Hoge, J.P.", "Fuga, F.B." ]. + attr_reader :authors + + # "Title of the study." + attr_reader :title + + # "Theor. J.
Hoge" + attr_reader :journal + + # 12 + attr_reader :volume + + # 3 + attr_reader :issue + + # "123-145" + attr_reader :pages + + # 2001 + attr_reader :year + + # 12345678 + attr_reader :pubmed + + # 98765432 + attr_reader :medline + + # Abstract text as a String. + attr_reader :abstract + + # A URL String. + attr_reader :url + + # MeSH terms in an Array. + attr_reader :mesh + + # Affiliations in an Array. + attr_reader :affiliations + + # def initialize(hash) hash.default = '' *************** *** 44,50 **** @affiliations = [] if @affiliations.empty? end - attr_reader :authors, :title, :journal, :volume, :issue, :pages, :year, - :pubmed, :medline, :abstract, :url, :mesh, :affiliations def format(style = nil, option = nil) case style --- 116,136 ---- @affiliations = [] if @affiliations.empty? end + # Formats the reference in a given style. + # + # Styles: + # 0. nil - general + # 1. endnote - EndNote + # 2. bibitem - Bibitem (option acceptable) + # 3. bibtex - BibTeX (option acceptable) + # 4. rd - rd (option acceptable) + # 5. nature - Nature (option acceptable) + # 6. science - Science + # 7. genome_biol - Genome Biology + # 8. genome_res - Genome Research + # 9. nar - Nucleic Acids Research + # 10. current - Current Biology + # 11. trends - Trends in * + # 12. cell - Cell Press def format(style = nil, option = nil) case style *************** *** 78,81 **** --- 164,168 ---- end + # Formats in the EndNote style. def endnote lines = [] *************** *** 105,108 **** --- 192,196 ---- end + # Formats in the bibitem style. def bibitem(item = nil) item = "PMID:#{@pubmed}" unless item *************** *** 116,119 **** --- 204,208 ---- end + # Formats in the BibTeX style. def bibtex(section = nil) section = "article" unless section *************** *** 133,136 **** --- 222,226 ---- end + # Formats in a general style. def general authors = @authors.join(', ') *************** *** 138,141 **** --- 228,232 ---- end + # Formats in the RD style.
def rd(str = nil) @abstract ||= str *************** *** 148,151 **** --- 239,244 ---- end + # Formats in the Nature Publishing Group style. + # * http://www.nature.com def nature(short = false) if short *************** *** 164,167 **** --- 257,262 ---- end + # Formats in the Science style. + # * http://www.siencemag.com/ def science if @authors.size > 4 *************** *** 174,177 **** --- 269,274 ---- end + # Formats in the Genome Biology style. + # * http://genomebiology.com/ def genome_biol authors = @authors.collect {|name| strip_dots(name)}.join(', ') *************** *** 179,184 **** --- 276,285 ---- "#{authors}: #{@title} #{journal} #{@year}, #{@volume}:#{@pages}." end + # Formats in the Current Biology style. + # * http://www.current-biology.com/ alias current genome_biol + # Formats in the Genome Research style. + # * http://genome.org/ def genome_res authors = authors_join(' and ') *************** *** 186,189 **** --- 287,292 ---- end + # Formats in the Nucleic Acids Research style. + # * http://nar.oxfordjournals.org/ def nar authors = authors_join(' and ') *************** *** 191,199 **** end def cell authors = authors_join(' and ') "#{authors} (#{@year}). #{@title} #{@journal} #{@volume}, #{pages}." end ! def trends if @authors.size > 2 --- 294,306 ---- end + # Formats in the CELL Press style. + # * http://www.cell.com/ def cell authors = authors_join(' and ') "#{authors} (#{@year}). #{@title} #{@journal} #{@volume}, #{pages}." end ! ! # Formats in the TRENDS Journals style. ! # * http://www.trends.com/ def trends if @authors.size > 2 *************** *** 236,255 **** end ! class References def initialize(ary = []) @references = ary end - attr_accessor :references ! def append(a) ! @references.push(a) if a.is_a? Reference return self end def each ! @references.each do |x| ! yield x end end --- 343,377 ---- end ! # Set of Bio::Reference. ! # ! # === Examples ! # ! # refs = Bio::References.new ! # refs.append(Bio::Reference.new(hash)) ! # refs.each do |reference| !
# ... ! # end ! # class References + # Array of Bio::Reference. + attr_accessor :references + + # def initialize(ary = []) @references = ary end ! ! # Appends a Bio::Reference object. ! def append(reference) ! @references.push(reference) if reference.is_a? Reference return self end + # Iterates over each Bio::Reference object. def each ! @references.each do |reference| ! yield reference end end *************** *** 258,308 **** end - - - - =begin - - = Bio::Reference - - --- Bio::Reference.new(hash) - - --- Bio::Reference#authors -> Array - --- Bio::Reference#title -> String - --- Bio::Reference#journal -> String - --- Bio::Reference#volume -> Fixnum - --- Bio::Reference#issue -> Fixnum - --- Bio::Reference#pages -> String - --- Bio::Reference#year -> Fixnum - --- Bio::Reference#pubmed -> Fixnum - --- Bio::Reference#medline -> Fixnum - --- Bio::Reference#abstract -> String - --- Bio::Reference#url -> String - --- Bio::Reference#mesh -> Array - --- Bio::Reference#affiliations -> Array - - --- Bio::Reference#format(style = nil, option = nil) -> String - - --- Bio::Reference#endnote - --- Bio::Reference#bibitem(item = nil) -> String - --- Bio::Reference#bibtex(section = nil) -> String - --- Bio::Reference#rd(str = nil) -> String - --- Bio::Reference#nature(short = false) -> String - --- Bio::Reference#science -> String - --- Bio::Reference#genome_biol -> String - --- Bio::Reference#genome_res -> String - --- Bio::Reference#nar -> String - --- Bio::Reference#cell -> String - --- Bio::Reference#trends -> String - --- Bio::Reference#general -> String - - = Bio::References - - --- Bio::References.new(ary = []) - - --- Bio::References#references -> Array - --- Bio::References#append(a) -> Bio::References - --- Bio::References#each -> Array - - =end - --- 380,382 ---- From ngoto at pub.open-bio.org Sun Jan 29 05:06:45 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 10:06:45 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io/flatfile index.rb,1.15,1.16 Message-ID:
<200601291006.k0TA6jVL017433@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io/flatfile In directory pub.open-bio.org:/tmp/cvs-serv17423 Modified Files: index.rb Log Message: added RDoc (still incomplete) Index: index.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile/index.rb,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** index.rb 28 Nov 2005 05:08:26 -0000 1.15 --- index.rb 29 Jan 2006 10:06:43 -0000 1.16 *************** *** 1,7 **** # ! # bio/io/flatfile/index.rb - OBDA flatfile index ! # ! # Copyright (C) 2002 GOTO Naohisa # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public --- 1,12 ---- # ! # = bio/io/flatfile/index.rb - OBDA flatfile index # + # Copyright:: Copyright (C) 2002 + # GOTO Naohisa + # License:: LGPL + # + # $Id$ + # + #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public *************** *** 17,27 **** # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ ! # require 'bio/io/flatfile/indexer' module Bio class FlatFileIndex --- 22,83 ---- # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + #++ # ! # = About Bio::FlatFileIndex ! # ! # Please refer documents of following classes. ! # Classes/modules marked '#' are internal use only. ! # ! # == Classes/modules in index.rb ! # * class Bio::FlatFileIndex ! # * class Bio::FlatFileIndex::Results ! # * module Bio::FlatFileIndex::DEBUG ! # * #module Bio::FlatFileIndex::Template ! # * #class Bio::FlatFileIndex::Template::NameSpace ! # * #class Bio::FlatFileIndex::FileID ! # * #class Bio::FlatFileIndex::FileIDs ! 
# * #module Bio::FlatFileIndex::Flat_1 ! # * #class Bio::FlatFileIndex::Flat_1::Record ! # * #class Bio::FlatFileIndex::Flat_1::FlatMappingFile ! # * #class Bio::FlatFileIndex::Flat_1::PrimaryNameSpace ! # * #class Bio::FlatFileIndex::Flat_1::SecondaryNameSpace ! # * #class Bio::FlatFileIndex::NameSpaces ! # * #class Bio::FlatFileIndex::DataBank ! # ! # == Classes/modules in indexer.rb ! # * module Bio::FlatFileIndex::Indexer ! # * #class Bio::FlatFileIndex::Indexer::NameSpace ! # * #class Bio::FlatFileIndex::Indexer::NameSpaces ! # * #module Bio::FlatFileIndex::Indexer::Parser ! # * #class Bio::FlatFileIndex::Indexer::Parser::TemplateParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::GenBankParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::GenPeptParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::EMBLParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::SPTRParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::FastaFormatParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::MaXMLSequenceParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::MaXMLClusterParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::BlastDefaultParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::PDBChemicalComponentParser ! # ! # == Classes/modules in bdb.rb ! # * #module Bio::FlatFileIndex::BDBDefault ! # * #class Bio::FlatFileIndex::BDBWrapper ! # * #module Bio::FlatFileIndex::BDB_1 ! # * #class Bio::FlatFileIndex::BDB_1::BDBMappingFile ! # * #class Bio::FlatFileIndex::BDB_1::PrimaryNameSpace ! # * #class Bio::FlatFileIndex::BDB_1::SecondaryNameSpace ! # ! # = References ! # * (()) ! # * (()) ! # require 'bio/io/flatfile/indexer' module Bio + + + # Bio::FlatFileIndex is a class for OBDA flatfile index. 
class FlatFileIndex *************** *** 31,38 **** --- 87,105 ---- autoload :BDB_1, 'bio/io/flatfile/bdb' + # magic string for flat/1 index MAGIC_FLAT = 'flat/1' + + # magic string for BerkeleyDB/1 index MAGIC_BDB = 'BerkeleyDB/1' ######################################################### + + # Opens existing databank. Databank is a directory which contains + # indexed files and configuration files. The type of the databank + # (flat or BerkeleyDB) are determined automatically. + # + # If block is given, the databank object is passed to the block. + # The databank will be automatically closed when the block terminates. + # def self.open(name) if block_given? then *************** *** 54,57 **** --- 121,130 ---- end + # Opens existing databank. Databank is a directory which contains + # indexed files and configuration files. The type of the databank + # (flat or BerkeleyDB) are determined automatically. + # + # Unlike +FlatFileIndex.open+, block is not allowed. + # def initialize(name) @db = DataBank.open(name) *************** *** 59,67 **** --- 132,149 ---- # common interface defined in registry.rb + # Searching databank and returns entry (or entries) as a string. + # Multiple entries (contatinated to one string) may be returned. + # Returns empty string if not found. + # def get_by_id(key) search(key).to_s end + #-- # original methods + #++ + + # Closes the databank. + # Returns nil. def close check_closed? *************** *** 70,73 **** --- 152,156 ---- end + # Returns true if already closed. Otherwise, returns false. def closed? if @db then *************** *** 78,81 **** --- 161,177 ---- end + # Set default namespaces. + # default_namespaces = nil + # means all namespaces in the databank. + # + # default_namespaces= [ str1, str2, ... ] + # means set default namespeces to str1, str2, ... + # + # Default namespaces specified in this method only affect + # #get_by_id, #search, and #include? methods. 
+ # + # Default of default namespaces is nil (that is, all namespaces + # are search destinations by default). + # def default_namespaces=(names) if names then *************** *** 87,94 **** --- 183,194 ---- end + # Returns default namespaces. + # Returns an array of strings or nil. + # nil means all namespaces. def default_namespaces @names end + # Searching databank and returns a Bio::FlatFileIndex::Results object. def search(key) check_closed? *************** *** 100,103 **** --- 200,206 ---- end + # Searching only specified namespeces. + # Returns a Bio::FlatFileIndex::Results object. + # def search_namespaces(key, *names) check_closed? *************** *** 105,108 **** --- 208,214 ---- end + # Searching only primary namespece. + # Returns a Bio::FlatFileIndex::Results object. + # def search_primary(key) check_closed? *************** *** 110,113 **** --- 216,227 ---- end + # Searching databank. + # If some entries are found, returns an array of + # unique IDs (primary identifiers). + # If not found anything, returns nil. + # + # This method is useful when search result is very large and + # #search method is very slow. + # def include?(key) check_closed? *************** *** 124,127 **** --- 238,243 ---- end + # Same as #include?, but serching only specified namespaces. + # def include_in_namespaces?(key, *names) check_closed? *************** *** 134,137 **** --- 250,255 ---- end + # Same as #include?, but serching only primary namespace. + # def include_in_primary?(key) check_closed? *************** *** 144,147 **** --- 262,268 ---- end + # Returns names of namespaces defined in the databank. + # (example: [ 'LOCUS', 'ACCESSION', 'VERSION' ] ) + # def namespaces check_closed? *************** *** 151,154 **** --- 272,276 ---- end + # Returns name of primary namespace as a string. def primary_namespace check_closed? *************** *** 156,159 **** --- 278,282 ---- end + # Returns names of secondary namespaces as an array of strings. 
def secondary_namespaces check_closed? *************** *** 161,164 **** --- 284,295 ---- end + # Check consistency between the databank(index) and original flat files. + # + # If the original flat files are changed after creating + # the databank, raises RuntimeError. + # + # Note that this check only compares file sizes as + # described in the OBDA specification. + # def check_consistency check_closed? *************** *** 166,177 **** --- 297,323 ---- end + # If true is given, consistency checks will be performed every time + # accessing flatfiles. If nil/false, no checks are performed. + # + # By default, always_check_consistency is true. + # def always_check_consistency=(bool) @db.always_check=(bool) end + + # If true, consistency checks will be performed every time + # accessing flatfiles. If nil/false, no checks are performed. + # + # By default, always_check_consistency is true. + # def always_check_consistency(bool) @db.always_check end + #-- # private methods + #++ + + # If the databank is closed, raises IOError. def check_closed? @db or raise IOError, 'closed databank' *************** *** 179,186 **** --- 325,351 ---- private :check_closed? + #-- ######################################################### + #++ + # Results stores search results created by + # Bio::FlatFileIndex methods. + # + # Currently, this class inherits Hash, but internal + # structure of this class may be changed anytime. + # Only using methods described below are strongly recomended. + # class Results < Hash + # Add search results. + # "a + b" means "a OR b". + # * Example + # # I want to search 'ADH_IRON_1' OR 'ADH_IRON_2' + # db = Bio::FlatFIleIndex.new(location) + # a1 = db.search('ADH_IRON_1') + # a2 = db.search('ADH_IRON_2') + # # a1 and a2 are Bio::FlatFileIndex::Results objects. + # print a1 + a2 + # def +(a) raise 'argument must be Results class' unless a.is_a?(self.class) *************** *** 190,193 **** --- 355,368 ---- end + # Returns set intersection of results. 
+ # "a * b" means "a AND b". + # * Example + # # I want to search 'HIS_KIN' AND 'human' + # db = Bio::FlatFIleIndex.new(location) + # hk = db.search('HIS_KIN') + # hu = db.search('human') + # # hk and hu are Bio::FlatFileIndex::Results objects. + # print hk * hu + # def *(a) raise 'argument must be Results class' unless a.is_a?(self.class) *************** *** 197,216 **** --- 372,428 ---- end + # Returns a string. (concatinated if multiple results exists). + # Same as to_a.join(''). + # def to_s self.values.join end + #-- #alias each_orig each + #++ + + # alias for each_value. alias each each_value + + # Iterates over each result (string). + # Same as to_a.each. + def each(&x) #:yields: str + each_value(&x) + end if false #dummy for RDoc + + #-- #alias to_a_orig to_a + #++ + + # alias for to_a. alias to_a values + # Returns an array of strings. + # If no search results are exist, returns an empty array. + # + def to_a; values; end if false #dummy for RDoc + + # Returns number of results. + # Same as to_a.size. + def size; end if false #dummy for RDoc + end #class Results ######################################################### + # Module for output debug messages. + # Default setting: If $DEBUG or $VERBOSE is true, output debug + # messages to STDERR; Otherwise, don't output messages. + # module DEBUG @@out = STDERR @@flag = nil + + # Set debug messages output destination. + # If true is given, outputs to STDERR. + # If nil is given, outputs nothing. + # This method affects ALL of FlatFileIndex related objects/methods. + # def self.out=(io) if io then *************** *** 224,230 **** --- 436,446 ---- @@out end + + # get current debug messeages output destination def self.out @@out end + + # prints debug messages def self.print(*arg) @@flag = true if $DEBUG or $VERBOSE *************** *** 235,239 **** --- 451,462 ---- ######################################################### + # Templates + # + # Internal use only. 
module Template + + # templates of namespace + # + # Internal use only. class NameSpace def filename *************** *** 276,279 **** --- 499,505 ---- end #module Template + # FileID class. + # + # Internal use only. class FileID def self.new_from_string(str) *************** *** 356,359 **** --- 582,588 ---- end #class FileID + # FileIDs class. + # + # Internal use only. class FileIDs < Array def initialize(prefix, hash) *************** *** 472,476 **** --- 701,712 ---- end #class FileIDs + # module for flat/1 databank + # + # Internal use only. module Flat_1 + + # Record class. + # + # Internal use only. class Record def initialize(str, size = nil) *************** *** 501,504 **** --- 737,743 ---- end #class Record + # FlatMappingFile class. + # + # Internal use only. class FlatMappingFile @@recsize_width = 4 *************** *** 786,789 **** --- 1025,1031 ---- end #class FlatMappingFile + # primary name space + # + # Internal use only. class PrimaryNameSpace < Template::NameSpace def mapping(filename) *************** *** 795,798 **** --- 1037,1043 ---- end #class PrimaryNameSpace + # secondary name space + # + # Internal use only. class SecondaryNameSpace < Template::NameSpace def mapping(filename) *************** *** 811,815 **** end #module Flat_1 ! class NameSpaces < Hash def initialize(dbname, nsclass, arg) --- 1056,1062 ---- end #module Flat_1 ! # namespaces ! # ! # Internal use only. class NameSpaces < Hash def initialize(dbname, nsclass, arg) *************** *** 873,876 **** --- 1120,1126 ---- end #class NameSpaces + # databank + # + # Internal use only. class DataBank def self.file2hash(fileobj) *************** *** 1136,1308 **** end #module Bio - ###################################################################### - - =begin - - = Bio::FlatFileIndex - - --- Bio::FlatFileIndex.new(dbname) - --- Bio::FlatFileIndex.open(dbname) - - Opens existing databank. Databank is a directory which contains - indexed files and configuration files. 
The type of the databank - (flat or BerkeleyDB) are determined automatically. - - --- Bio::FlatFileIndex#close - - Closes opened databank. - - --- Bio::FlatFileIndex#closed? - - Returns true if already closed. Otherwise, returns false. - - --- Bio::FlatFileIndex#get_by_id(key) - - Common interface defined in registry.rb. - Searching databank and returns entry (or entries) as a string. - Multiple entries (contatinated to one string) may be returned. - Returns empty string If not found. - - --- Bio::FlatFileIndex#search(key) - - Searching databank and returns a Bio::FlatFileIndex::Results object. - - --- Bio::FlatFileIndex#include?(key) - - Searching databank. - If found, returns an array of unique IDs (primary identifiers). - If not found, returns nil. - - --- Bio::FlatFileIndex#search_primary(key) - - Searching only primary namespece. - Returns a Bio::FlatFileIndex::Results object. - - --- Bio::FlatFileIndex#search_namespaces(key, name1, name2, ...) - - Searching only specific namespeces. - Returns a Bio::FlatFileIndex::Results object. - - --- Bio::FlatFileIndex#include_in_primary?(key) - - Same as #include?, but serching only primary namespace. - - --- Bio::FlatFileIndex#include_in_namespaces?(key, name1, name2, ...) - - Same as #include?, but serching only specific namespaces. - - --- Bio::FlatFileIndex#namespaces - - Returns names of namespaces defined in the databank. - (example: [ 'LOCUS', 'ACCESSION', 'VERSION' ] ) - - --- Bio::FlatFileIndex#primary_namespace - - Returns name of primary namespace. - - --- Bio::FlatFileIndex#secondary_namespaces - - Returns names of secondary namespaces. - - --- Bio::FlatFileIndex#default_namespaces= [ str1, str2, ... ] - --- Bio::FlatFileIndex#default_namespaces= nil - - Set default namespaces. - nil means all namespaces in the databank. - Default namespaces specified in this method only affect - #get_by_id, #search, and #include? methods. 
- Default of default namespaces is nil (that is, all namespaces - are search destinations by default). - - --- Bio::FlatFileIndex#default_namespaces - - Returns default namespaces. - nil means all namespaces. - - --- Bio::FlatFileIndex#check_consistency - - Raise RuntimeError if flatfiles are changed after creating - the databank. (This check only compare file sizes as - described in the OBDA specification.) - - --- Bio::FlatFileIndex#always_check_consistency=(bool) - --- Bio::FlatFileIndex#always_check_consistency - - If true, consistency checks are performed every time - accessing flatfiles. If nil/false, no checks are performed. - Default of always_check_consistency is true. - - == Bio::FlatFileIndex::Results - - This object is made by Bio::FlatFileIndex methods. - Currently, this class inherits Hash, but internal - structure of this class may be changed anytime. - Only using methods described below are strongly recomended. - - --- Bio::FlatFileIndex::Results#to_a - - Returns an array of strings. - If no search results are exist, returns an empty array. - - --- Bio::FlatFileIndex::Results#each - - Iterates over each result(string). - Same as to_a.each. - - --- Bio::FlatFileIndex::Results#to_s - - Returns a string. (concatinated if multiple results exists). - Same as to_a.join(''). - - --- Bio::FlatFileIndex::Results#size - - Returns number of results. - Same as to_a.size. - - --- Bio::FlatFileIndex::Results#+(res) - - Add search results. - "a + b" means "a OR b". - * Example - # I want to search 'ADH_IRON_1' OR 'ADH_IRON_2' - db = Bio::FlatFIleIndex.new(location) - a1 = db.search('ADH_IRON_1') - a2 = db.search('ADH_IRON_2') - # a1 and a2 are Bio::FlatFileIndex::Results objects. - print a1 + a2 - - --- Bio::FlatFileIndex::Results#*(res) - - Returns set intersection of results. - "a * b" means "a AND b". 
- * Example - # I want to search 'HIS_KIN' AND 'human' - db = Bio::FlatFIleIndex.new(location) - hk = db.search('HIS_KIN') - hu = db.search('human') - # hk and hu are Bio::FlatFileIndex::Results objects. - print hk * hu - - == Bio::FlatFileIndex::DEBUG - - Module for output debug messages. - Default setting: If $DEBUG or $VERBOSE is true, output debug - messages to STDERR; Otherwise, don't output messages. - - --- Bio::FlatFileIndex::DEBUG.out=(io) - - Set debug messages output destination. - If true is given, outputs to STDERR. - If nil is given, outputs nothing. - This method affects ALL of FlatFileIndex related objects/methods. - - == Other classes/modules - - Classes/modules not described in this file are internal use only. - - == SEE ALSO - - * (()) - * (()) - - =end --- 1386,1387 ---- From pjotr at pub.open-bio.org Tue Jan 31 02:27:54 2006 From: pjotr at pub.open-bio.org (Pjotr Prins) Date: Tue, 31 Jan 2006 07:27:54 +0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.9,1.10 Message-ID: <200601310727.k0V7RsVL025386@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory pub.open-bio.org:/tmp/cvs-serv25376 Modified Files: Tutorial.rd Log Message: Better example Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** Tutorial.rd 1 Nov 2005 04:31:48 -0000 1.9 --- Tutorial.rd 31 Jan 2006 07:27:52 -0000 1.10 *************** *** 1,11 **** =begin ! $Id$ ! Copyright (C) 2001-2003 KATAYAMA Toshiaki Translated into English: Naohisa Goto ! Edited by: PjotrPrins NOTE: This page is a work in progress at this point --- 1,14 ---- =begin ! See the document in the CVS repository ./doc/(()) - for a potentially more up-to-date edition. This one was updated: ! $Id$ Translated into English: Naohisa Goto ! Editor: PjotrPrins ! ! Copyright (C) 2001-2003 KATAYAMA Toshiaki , 2005-2006 all ! 
others

  NOTE: This page is a work in progress at this point
***************
*** 115,121 ****
    s = 'abc'

!   puts s[0..0]

-   >a

  So when using String methods, you should subtract 1 from positions
--- 118,129 ----
    s = 'abc'

!   puts s[0].chr
!
!   >a
!
!   puts s[0..1]
!
!   >ab


  So when using String methods, you should subtract 1 from positions

From pjotr at pub.open-bio.org Tue Jan 31 02:45:24 2006
From: pjotr at pub.open-bio.org (Pjotr Prins)
Date: Tue, 31 Jan 2006 07:45:24 +0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.10,1.11
Message-ID: <200601310745.k0V7jOVL025523@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory pub.open-bio.org:/tmp/cvs-serv25513/doc

Modified Files:
    Tutorial.rd
Log Message:
tabs to spaces

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** Tutorial.rd 31 Jan 2006 07:27:52 -0000 1.10
--- Tutorial.rd 31 Jan 2006 07:45:22 -0000 1.11
***************
*** 118,124 ****
    s = 'abc'

!   puts s[0].chr

!   >a

    puts s[0..1]
--- 118,124 ----
    s = 'abc'

!   puts s[0].chr

!   >a

    puts s[0..1]

From ngoto at pub.open-bio.org Thu Jan 26 16:04:06 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Thu, 26 Jan 2006 16:04:06 +0000
Subject: [BioRuby-cvs] bioruby/test/unit/bio/db/pdb - New directory
Message-ID: <200601261604.k0QG46VL031055@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/test/unit/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv31045/pdb

Log Message:
Directory /home/repository/bioruby/bioruby/test/unit/bio/db/pdb added to the repository

From ngoto at pub.open-bio.org Thu Jan 26 16:06:05 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Thu, 26 Jan 2006 16:06:05 +0000
Subject: [BioRuby-cvs] bioruby/test/unit/bio/db/pdb test_pdb.rb,NONE,1.1
Message-ID: <200601261606.k0QG65VL031084@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/test/unit/bio/db/pdb
In directory pub.open-bio.org:/tmp/cvs-serv31072

Added Files:
    test_pdb.rb
Log Message:
Newly added unit test of Bio::PDB::* classes.
Under construction. It is still very poor.

--- NEW FILE: test_pdb.rb ---
#
# = test/unit/bio/db/pdb/test_pdb.rb - Unit test for Bio::PDB classes
#
# Copyright:: Copyright (C) 2006
#             Naohisa Goto
#
# License:: LGPL
#
# $Id: test_pdb.rb,v 1.1 2006/01/26 16:06:03 ngoto Exp $
#
#--
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#++
#
#

require 'pathname'
libpath = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 4, 'lib')).cleanpath.to_s
$:.unshift(libpath) unless $:.include?(libpath)

require 'test/unit'
require 'bio'

module Bio
  #class TestPDB < Test::Unit::TestCase
  #end #class TestPDB

  module TestPDBRecord

    # test of Bio::PDB::Record::ATOM
    class TestATOM < Test::Unit::TestCase
      def setup
        # the data is taken from
        # http://www.rcsb.org/pdb/file_formats/pdb/pdbguide2.2/part_62.html
        @str = 'ATOM 154 CG2BVAL A 25 29.909 16.996 55.922 0.72 13.25 A1 C '
        @atom = Bio::PDB::Record::ATOM.new.initialize_from_string(@str)
      end

      def test_record_name
        assert_equal('ATOM', @atom.record_name)
      end

      def test_serial
        assert_equal(154, @atom.serial)
      end

      def test_name
        assert_equal(' CG2', @atom.name)
      end

      def test_altLoc
        assert_equal('B', @atom.altLoc)
      end

      def test_resName
        assert_equal('VAL', @atom.resName)
      end

      def test_chainID
        assert_equal('A', @atom.chainID)
      end

      def test_resSeq
        assert_equal(25, @atom.resSeq)
      end

      def test_iCode
        assert_equal(' ', @atom.iCode)
      end

      def test_x
        assert_in_delta(29.909, @atom.x, Float::EPSILON)
      end

      def test_y
        assert_in_delta(16.996, @atom.y, Float::EPSILON)
      end

      def test_z
        assert_in_delta(55.922, @atom.z, Float::EPSILON)
      end

      def test_occupancy
        assert_in_delta(0.72, @atom.occupancy, Float::EPSILON)
      end

      def test_tempFactor
        assert_in_delta(13.25, @atom.tempFactor, Float::EPSILON)
      end

      def test_segID
        assert_equal('A1 ', @atom.segID)
      end

      def test_element
        assert_equal(' C', @atom.element)
      end

      def test_charge
        assert_equal(' ', @atom.charge)
      end

      def test_xyz
        assert_equal(Bio::PDB::Coordinate[
                       "29.909".to_f,
                       "16.996".to_f,
                       "55.922".to_f ], @atom.xyz)
      end

      def test_to_a
        assert_equal([ "29.909".to_f,
                       "16.996".to_f,
                       "55.922".to_f ], @atom.to_a)
      end

      def test_comparable
        a = Bio::PDB::Record::ATOM.new
        a.serial = 999
        assert_equal(-1, @atom <=> a)
        a.serial = 154
        assert_equal( 0, @atom <=> a)
        a.serial = 111
        assert_equal( 1, @atom <=> a)
      end

      def test_to_s
        assert_equal(@str + "\n", @atom.to_s)
      end

      def test_original_data
        assert_equal([ @str ], @atom.original_data)
      end

      def test_do_parse
        assert_equal(@atom, @atom.do_parse)
      end

      def test_residue
        assert_equal(nil, @atom.residue)
      end

      def test_sigatm
        assert_equal(nil, @atom.sigatm)
      end

      def test_anisou
        assert_equal(nil, @atom.anisou)
      end

      def test_ter
        assert_equal(nil, @atom.ter)
      end
    end #class TestATOM

  end #module TestPDBRecord

end #module Bio

From ngoto at pub.open-bio.org Sat Jan 28 04:23:44 2006
From: ngoto at pub.open-bio.org (Naohisa Goto)
Date: Sat, 28 Jan 2006 04:23:44 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/io flatfile.rb,1.41,1.42
Message-ID: <200601280423.k0S4NhVL004355@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/io
In directory pub.open-bio.org:/tmp/cvs-serv4345/io

Modified Files:
    flatfile.rb
Log Message:
changed format autodetection for KEGG data (format was changed)

Index: flatfile.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile.rb,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** flatfile.rb 1 Nov 2005 15:34:45 -0000 1.41
--- flatfile.rb 28 Jan 2006 04:23:41 -0000 1.42
***************
*** 413,428 ****
          Bio::KEGG::BRITE

!       when /^ENTRY .+ KO\s*$/
          Bio::KEGG::KO
!       when /^ENTRY .+ Glycan\s*$/
          Bio::KEGG::GLYCAN
!       when /^ENTRY .+ (CDS|gene|.*RNA) /
!         Bio::KEGG::GENES
!       when /^ENTRY EC [0-9\.]+$/
          Bio::KEGG::ENZYME
!       when /^ENTRY C[A-Za-z0-9\._]+$/
          Bio::KEGG::COMPOUND
!       when /^ENTRY R[A-Za-z0-9\._]+$/
          Bio::KEGG::REACTION
        when /^ENTRY [a-z]+$/
          Bio::KEGG::GENOME
--- 413,431 ----
          Bio::KEGG::BRITE

!       when /^ENTRY .+ KO\s*/
          Bio::KEGG::KO
!       when /^ENTRY .+ Glycan\s*/
          Bio::KEGG::GLYCAN
!       when /^ENTRY EC [0-9\.]+$/,
!            /^ENTRY .+ Enzyme\s*/
          Bio::KEGG::ENZYME
!       when /^ENTRY C[A-Za-z0-9\._]+$/,
!            /^ENTRY .+ Compound\s*/
          Bio::KEGG::COMPOUND
!       when /^ENTRY R[A-Za-z0-9\._]+$/,
!            /^ENTRY .+ Reaction\s*/
          Bio::KEGG::REACTION
+       when /^ENTRY .+ (CDS|gene|.*RNA) /
+         Bio::KEGG::GENES
        when /^ENTRY [a-z]+$/
          Bio::KEGG::GENOME

From nakao at pub.open-bio.org Sat Jan 28 06:40:41 2006
From: nakao at pub.open-bio.org (Mitsuteru C. Nakao)
Date: Sat, 28 Jan 2006 06:40:41 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/db/embl common.rb, 1.8, 1.9 embl.rb, 1.25, 1.26 sptr.rb, 1.29, 1.30 swissprot.rb, 1.3, 1.4 trembl.rb, 1.3, 1.4 uniprot.rb, 1.1, 1.2
Message-ID: <200601280640.k0S6efVL004736@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/db/embl
In directory pub.open-bio.org:/tmp/cvs-serv4726/lib/bio/db/embl

Modified Files:
    common.rb embl.rb sptr.rb swissprot.rb trembl.rb uniprot.rb
Log Message:
* Updated RDoc.

Index: sptr.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/sptr.rb,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** sptr.rb 2 Nov 2005 07:30:14 -0000 1.29
--- sptr.rb 28 Jan 2006 06:40:38 -0000 1.30
***************
*** 7,15 ****
  # $Id$
  #
! # == UniProtKB/SwissProt and TrEMBL
  #
! # See the SWISS-PROT dicument file SPECLIST.TXT.
  #
! # == Example
  #
  #--
--- 7,34 ----
  # $Id$
  #
! # == Description
! #
! # Shared methods for UniProtKB/SwissProt and TrEMBL classes.
  #
! # See the SWISS-PROT document file SPECLIST.TXT or UniProtKB/SwissProt
! # user manual.
! #
! # == Examples
  #
! #   str = File.read("p53_human.swiss")
! #   obj = Bio::SPTR.new(str)
! #   obj.entry_id #=> "P53_HUMAN"
! #
! # == References
! #
! # * Swiss-Prot Protein knowledgebase. TrEMBL Computer-annotated supplement
! #   to Swiss-Prot
! #   http://au.expasy.org/sprot/
! #
! # * UniProt
! #   http://uniprot.org/
! #
! # * The UniProtKB/SwissProt/TrEMBL User Manual
! #   http://www.expasy.org/sprot/userman.html
  #
  #--
***************
*** 37,41 ****
  module Bio
!
# Parser class for UniProtKB/SwissProt and TrEMBL database entry class SPTR < EMBLDB include Bio::EMBLDB::Common --- 56,60 ---- module Bio ! # Parser class for UniProtKB/SwissProt and TrEMBL database entry. class SPTR < EMBLDB include Bio::EMBLDB::Common *************** *** 46,60 **** # returns a Hash of the ID line. # returns a content (Int or String) of the ID line by a given key. # Hash keys: ['ENTRY_NAME', 'DATA_CLASS', 'MODECULE_TYPE', 'SEQUENCE_LENGTH'] # ! # ID Line ! # "ID #{ENTRY_NAME} #{DATA_CLASS}; #{MOLECULE_TYPE}; #{SEQUENCE_LENGTH}." # - # ENTRY_NAME := "#{X}_#{Y}" - # X =~ /[A-Z0-9]{1,5}/ # The protein name. - # Y =~ /[A-Z0-9]{1,5}/ # The biological source of the protein. - # MOLECULE_TYPE := 'PRT' =~ /\w{3}/ - # SEQUENCE_LENGTH =~ /\d+ AA/ def id_line(key = nil) unless @data['ID'] --- 65,81 ---- # returns a Hash of the ID line. + # # returns a content (Int or String) of the ID line by a given key. # Hash keys: ['ENTRY_NAME', 'DATA_CLASS', 'MODECULE_TYPE', 'SEQUENCE_LENGTH'] # ! # === ID Line ! # ID P53_HUMAN STANDARD; PRT; 393 AA. ! # #"ID #{ENTRY_NAME} #{DATA_CLASS}; #{MOLECULE_TYPE}; #{SEQUENCE_LENGTH}." ! # ! # === Examples ! # obj.id_line #=> {"ENTRY_NAME"=>"P53_HUMAN", "DATA_CLASS"=>"STANDARD", "SEQUENCE_LENGTH"=>393, "MOLECULE_TYPE"=>"PRT"} ! # ! # obj.id_line('ENTRY_NAME') #=> "P53_HUMAN" # def id_line(key = nil) unless @data['ID'] *************** *** 79,83 **** # returns a ENTRY_NAME in the ID line. # - # A short-cut for Bio::SPTR#id_line('ENTRY_NAME'). def entry_id id_line('ENTRY_NAME') --- 100,103 ---- *************** *** 120,127 **** # returns a String of information in the DT lines by a given key.. # ! # DT Line; date (3/entry) ! # DT DD-MMM-YYY (rel. NN, Created) ! # DT DD-MMM-YYY (rel. NN, Last sequence update) ! # DT DD-MMM-YYY (rel. NN, Last annotation update) def dt(key = nil) unless @data['DT'] --- 140,147 ---- # returns a String of information in the DT lines by a given key.. # ! # === DT Line; date (3/entry) ! 
# DT DD-MMM-YYY (rel. NN, Created) ! # DT DD-MMM-YYY (rel. NN, Last sequence update) ! # DT DD-MMM-YYY (rel. NN, Last annotation update) def dt(key = nil) unless @data['DT'] *************** *** 144,148 **** # returns the proposed official name of the protein. # ! # DE Line; description (>=1) # "DE #{OFFICIAL_NAME} (#{SYNONYM})" # "DE #{OFFICIAL_NAME} (#{SYNONYM}) [CONTEINS: #1; #2]." --- 164,168 ---- # returns the proposed official name of the protein. # ! # === DE Line; description (>=1) # "DE #{OFFICIAL_NAME} (#{SYNONYM})" # "DE #{OFFICIAL_NAME} (#{SYNONYM}) [CONTEINS: #1; #2]." *************** *** 193,197 **** # * Bio::SPTR#gn[0] -> Array # OR # ! # GN Line: Gene name(s) (>=0, optional) def gn return @data['GN'] if @data['GN'] --- 213,217 ---- # * Bio::SPTR#gn[0] -> Array # OR # ! # === GN Line: Gene name(s) (>=0, optional) def gn return @data['GN'] if @data['GN'] *************** *** 206,210 **** # returns contents in the old style GN line. ! # GN Line: Gene name(s) (>=0, optional) # GN HNS OR DRDX OR OSMZ OR BGLY. # GN CECA1 AND CECA2. --- 226,230 ---- # returns contents in the old style GN line. ! # === GN Line: Gene name(s) (>=0, optional) # GN HNS OR DRDX OR OSMZ OR BGLY. # GN CECA1 AND CECA2. *************** *** 293,297 **** # * Bio::EPTR#os(0) -> "Homo sapiens (Human)" # ! # OS Line; organism species (>=1) # OS Genus species (name). # OS Genus species (name0) (name1). --- 313,317 ---- # * Bio::EPTR#os(0) -> "Homo sapiens (Human)" # ! # === OS Line; organism species (>=1) # OS Genus species (name). # OS Genus species (name0) (name1). *************** *** 338,344 **** # returns a Hash of oraganism taxonomy cross-references. # * Bio::SPTR#ox -> Hash ! # {'NCBI_TaxID' => ['1234','2345','3456','4567'], ...} # ! # OX Line; organism taxonomy cross-reference (>=1 per entry) # OX NCBI_TaxID=1234; # OX NCBI_TaxID=1234, 2345, 3456, 4567; --- 358,364 ---- # returns a Hash of oraganism taxonomy cross-references. # * Bio::SPTR#ox -> Hash ! 
# {'NCBI_TaxID' => ['1234','2345','3456','4567'], ...} # ! # === OX Line; organism taxonomy cross-reference (>=1 per entry) # OX NCBI_TaxID=1234; # OX NCBI_TaxID=1234, 2345, 3456, 4567; *************** *** 369,409 **** # returns contents in the CC lines. # * Bio::SPTR#cc -> Hash ! ! # * Bio::SPTR#cc(Int) -> String ! # returns an Array of contents in the TOPIC string. # * Bio::SPTR#cc(TOPIC) -> Array w/in Hash, Hash # # returns contents of the "ALTERNATIVE PRODUCTS". # * Bio::SPTR#cc('ALTERNATIVE PRODUCTS') -> Hash ! # {'Event' => str, ! # 'Named isoforms' => int, ! # 'Comment' => str, ! # 'Variants'=>[{'Name' => str, 'Synonyms' => str, 'IsoId' => str, 'Sequence' => []}]} # ! # CC -!- ALTERNATIVE PRODUCTS: ! # CC Event=Alternative splicing; Named isoforms=15; ! # ... ! # CC placentae isoforms. All tissues differentially splice exon 13; ! # CC Name=A; Synonyms=no del; ! # CC IsoId=P15529-1; Sequence=Displayed; # # returns contents of the "DATABASE". # * Bio::SPTR#cc('DATABASE') -> Array ! # [{'NAME'=>str,'NOTE'=>str, 'WWW'=>URI,'FTP'=>URI}, ...] # ! # CC -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"]. # # returns contents of the "MASS SPECTROMETRY". # * Bio::SPTR#cc('MASS SPECTROMETRY') -> Array ! # [{'MW"=>float,'MW_ERR'=>float, 'METHOD'=>str,'RANGE'=>str}, ...] # ! # MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX]. # - # CC lines (>=0, optional) - # CC -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT - # CC IN LIVER, KIDNEY, LUNG AND BRAIN. - # - # CC -!- TOPIC: FIRST LINE OF A COMMENT BLOCK; - # CC SECOND AND SUBSEQUENT LINES OF A COMMENT BLOCK. def cc(tag = nil) unless @data['CC'] --- 389,429 ---- # returns contents in the CC lines. # * Bio::SPTR#cc -> Hash ! # ! # returns an object of contents in the TOPIC. # * Bio::SPTR#cc(TOPIC) -> Array w/in Hash, Hash # # returns contents of the "ALTERNATIVE PRODUCTS". # * Bio::SPTR#cc('ALTERNATIVE PRODUCTS') -> Hash ! # {'Event' => str, ! 
# 'Named isoforms' => int, ! # 'Comment' => str, ! # 'Variants'=>[{'Name' => str, 'Synonyms' => str, 'IsoId' => str, 'Sequence' => []}]} # ! # CC -!- ALTERNATIVE PRODUCTS: ! # CC Event=Alternative splicing; Named isoforms=15; ! # ... ! # CC placentae isoforms. All tissues differentially splice exon 13; ! # CC Name=A; Synonyms=no del; ! # CC IsoId=P15529-1; Sequence=Displayed; # # returns contents of the "DATABASE". # * Bio::SPTR#cc('DATABASE') -> Array ! # [{'NAME'=>str,'NOTE'=>str, 'WWW'=>URI,'FTP'=>URI}, ...] # ! # CC -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"]. # # returns contents of the "MASS SPECTROMETRY". # * Bio::SPTR#cc('MASS SPECTROMETRY') -> Array ! # [{'MW"=>float,'MW_ERR'=>float, 'METHOD'=>str,'RANGE'=>str}, ...] # ! # CC -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX]. ! # ! # === CC lines (>=0, optional) ! # CC -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT ! # CC IN LIVER, KIDNEY, LUNG AND BRAIN. ! # ! # CC -!- TOPIC: FIRST LINE OF A COMMENT BLOCK; ! # CC SECOND AND SUBSEQUENT LINES OF A COMMENT BLOCK. # def cc(tag = nil) unless @data['CC'] *************** *** 542,546 **** # returns conteins in a line of the CC INTERACTION section. # ! # CC P46527:CDKN1B; NbExp=1; IntAct=EBI-359815, EBI-519280; def cc_interaction_parse(str) it = str.scan(/(.+?); NbExp=(.+?); IntAct=(.+?);/) --- 562,566 ---- # returns conteins in a line of the CC INTERACTION section. # ! # CC P46527:CDKN1B; NbExp=1; IntAct=EBI-359815, EBI-519280; def cc_interaction_parse(str) it = str.scan(/(.+?); NbExp=(.+?); IntAct=(.+?);/) *************** *** 556,562 **** # * Bio::EMBLDB#dr -> Hash w/in Array # ! # DR Line; defabases cross-reference (>=0) ! # a cross_ref pre one line ! # DR database_identifier; primary_identifier; secondary_identifier. @@dr_database_identifier = ['EMBL','CARBBANK','DICTYDB','ECO2DBASE', 'ECOGENE', --- 576,582 ---- # * Bio::EMBLDB#dr -> Hash w/in Array # ! 
# === DR Line; defabases cross-reference (>=0) ! # DR database_identifier; primary_identifier; secondary_identifier. ! # a cross_ref pre one line @@dr_database_identifier = ['EMBL','CARBBANK','DICTYDB','ECO2DBASE', 'ECOGENE', *************** *** 575,595 **** # returns conteins in the feature table. # * Bio::SPTR#ft -> Hash ! # {'feature_name' => [{'From' => str, 'To' => str, ! # 'Description' => str, 'FTId' => str}],...} # # returns an Array of the information about the feature_name in the feature table. # * Bio::SPTR#ft(feature_name) -> Array of Hash ! # [{'From' => str, 'To' => str, 'Description' => str, 'FTId' => str},...] # ! # FT Line; feature table data (>=0, optional) # ! # Col Data item ! # ----- ----------------- ! # 1- 2 FT ! # 6-13 Feature name ! # 15-20 `FROM' endpoint ! # 22-27 `TO' endpoint ! # 35-75 Description (>=0 per key) ! # ----- ----------------- def ft(feature_name = nil) unless @data['FT'] --- 595,615 ---- # returns conteins in the feature table. # * Bio::SPTR#ft -> Hash ! # {'feature_name' => [{'From' => str, 'To' => str, ! # 'Description' => str, 'FTId' => str}],...} # # returns an Array of the information about the feature_name in the feature table. # * Bio::SPTR#ft(feature_name) -> Array of Hash ! # [{'From' => str, 'To' => str, 'Description' => str, 'FTId' => str},...] # ! # == FT Line; feature table data (>=0, optional) # ! # Col Data item ! # ----- ----------------- ! # 1- 2 FT ! # 6-13 Feature name ! # 15-20 `FROM' endpoint ! # 22-27 `TO' endpoint ! # 35-75 Description (>=0 per key) ! # ----- ----------------- def ft(feature_name = nil) unless @data['FT'] *************** *** 693,699 **** # * Keys: ['MW', 'mw', 'molecular', 'weight', 'aalen', 'len', 'length', 'CRC64'] # ! # SQ Line; sequence header (1/entry) ! # SQ SEQUENCE 233 AA; 25630 MW; 146A1B48A1475C86 CRC64; ! # SQ SEQUENCE \d+ AA; \d+ MW; [0-9A-Z]+ CRC64; # # MW, Dalton unit. --- 713,719 ---- # * Keys: ['MW', 'mw', 'molecular', 'weight', 'aalen', 'len', 'length', 'CRC64'] # ! 
# === SQ Line; sequence header (1/entry) ! # SQ SEQUENCE 233 AA; 25630 MW; 146A1B48A1475C86 CRC64; ! # SQ SEQUENCE \d+ AA; \d+ MW; [0-9A-Z]+ CRC64; # # MW, Dalton unit. Index: uniprot.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/uniprot.rb,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** uniprot.rb 10 Sep 2005 23:43:35 -0000 1.1 --- uniprot.rb 28 Jan 2006 06:40:39 -0000 1.2 *************** *** 1,6 **** # ! # bio/db/embl/uniprot.rb - UniProt database class # ! # Copyright (C) 2005 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,33 ---- # ! # = bio/db/embl/uniprot.rb - UniProt database class # ! # Copyright:: Copyright (C) 2005 KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Name space for UniProtKB/SwissProt specific methods. ! # ! # UniProtKB/SwissProt specific methods are defined in this class. ! # Shared methods for UniProtKB/SwissProt and TrEMBL classes are ! # defined in Bio::SPTR class. ! # ! # == Examples ! # ! # str = File.read("p53_human.swiss") ! # obj = Bio::UniProt.new(str) ! # obj.entry_id #=> "P53_HUMAN" ! # ! # == Referencees ! # ! # * UniProt ! # http://uniprot.org/ ! # ! # * The UniProtKB/SwissProt/TrEMBL User Manual ! # http://www.expasy.org/sprot/userman.html ! ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,22 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # --- 45,49 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 25,28 **** --- 52,57 ---- module Bio + # Parser class for SwissProt database entry. + # See also Bio::SPTR class. 
class UniProt < SPTR # Nothing to do (UniProt format is abstracted in SPTR) Index: swissprot.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/swissprot.rb,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** swissprot.rb 23 Aug 2004 23:40:35 -0000 1.3 --- swissprot.rb 28 Jan 2006 06:40:38 -0000 1.4 *************** *** 1,6 **** # ! # bio/db/embl/swissprot.rb - SwissProt database class # ! # Copyright (C) 2001, 2002 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,33 ---- # ! # = bio/db/embl/swissprot.rb - SwissProt database class # ! # Copyright:: Copyright (C) 2001, 2002 KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Name space for SwissProt specific methods. ! # ! # SwissProt (before UniProtKB/SwissProt) specific methods are defined in ! # this class. Shared methods for UniProtKB/SwissProt and TrEMBL classes ! # are defined in Bio::SPTR class. ! # ! # == Examples ! # ! # str = File.read("p53_human.swiss") ! # obj = Bio::SwissProt.new(str) ! # obj.entry_id #=> "P53_HUMAN" ! # ! # == Referencees ! # ! # * Swiss-Prot Protein knowledgebase ! # http://au.expasy.org/sprot/ ! # ! # * Swiss-Prot Protein Knowledgebase User Manual ! # http://au.expasy.org/sprot/userman.html ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,22 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # --- 45,49 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 25,28 **** --- 52,57 ---- module Bio + # Parser class for SwissProt database entry. + # See also Bio::SPTR class. 
class SwissProt < SPTR # Nothing to do (SwissProt format is abstracted in SPTR) Index: embl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/embl.rb,v retrieving revision 1.25 retrieving revision 1.26 diff -C2 -d -r1.25 -r1.26 *** embl.rb 2 Nov 2005 07:30:14 -0000 1.25 --- embl.rb 28 Jan 2006 06:40:38 -0000 1.26 *************** *** 8,23 **** # $Id$ # ! # == EMBL database entry ! # # # ! # == Example # ! # emb = Bio::EMBL.new($<.read) ! # emb.entry_id ! # emb.each_cds do |cds| ! # cds ! # end ! # emb.seq # #-- --- 8,31 ---- # $Id$ # ! # == Description # + # Parser class for EMBL database entry. # ! # == Examples # ! # emb = Bio::EMBL.new($<.read) ! # emb.entry_id ! # emb.each_cds do |cds| ! # cds # A CDS in feature table. ! # end ! # emb.seq #=> "ACGT..." ! # ! # == References ! # ! # * The EMBL Nucleotide Sequence Database ! # http://www.ebi.ac.uk/embl/ ! # ! # * The EMBL Nucleotide Sequence Database: Users Manual ! # http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html # #-- Index: common.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/common.rb,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** common.rb 2 Nov 2005 07:30:14 -0000 1.8 --- common.rb 28 Jan 2006 06:40:38 -0000 1.9 *************** *** 7,14 **** # $Id$ # ! # == EMBL style databases class # ! # This module defines a common framework among EMBL, SWISS-PROT, TrEMBL. ! # For more details, see the documentations in each embl/*.rb libraries. # # EMBL style format: --- 7,17 ---- # $Id$ # ! # == Description # ! # EMBL style databases class ! # ! # This module defines a common framework among EMBL, UniProtKB, SWISS-PROT, ! # TrEMBL. For more details, see the documentations in each embl/*.rb ! # libraries. 
# # EMBL style format: *************** *** 39,45 **** # // - termination line (ends each entry; 1 per entry) # ! # ! # == Example # # require 'bio/db/embl/common' # module Bio --- 42,48 ---- # // - termination line (ends each entry; 1 per entry) # ! # == Examples # + # # Make a new parser class for EMBL style database entry. # require 'bio/db/embl/common' # module Bio *************** *** 48,51 **** --- 51,72 ---- # end # end + # + # == References + # + # * The EMBL Nucleotide Sequence Database + # http://www.ebi.ac.uk/embl/ + # + # * The EMBL Nucleotide Sequence Database: Users Manual + # http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html + # + # * Swiss-Prot Protein knowledgebase. TrEMBL Computer-annotated supplement + # to Swiss-Prot + # http://au.expasy.org/sprot/ + # + # * UniProt + # http://uniprot.org/ + # + # * The UniProtKB/SwissProt/TrEMBL User Manual + # http://www.expasy.org/sprot/userman.html # #-- Index: trembl.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/embl/trembl.rb,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** trembl.rb 23 Aug 2004 23:40:35 -0000 1.3 --- trembl.rb 28 Jan 2006 06:40:38 -0000 1.4 *************** *** 1,6 **** # ! # bio/db/embl/trembl.rb - TrEMBL database class # ! # Copyright (C) 2001, 2002 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,33 ---- # ! # = bio/db/embl/trembl.rb - TrEMBL database class # ! # Copyright:: Copyright (C) 2001, 2002 KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Name space for TrEMBL specific methods. ! # ! # UniProtKB/SwissProt specific methods are defined in this class. ! # Shared methods for UniProtKB/SwissProt and TrEMBL classes are ! # defined in Bio::SPTR class. ! # ! # == Examples ! # ! # str = File.read("Q2UNG2_ASPOR.trembl") ! # obj = Bio::TrEMBL.new(str) ! # obj.entry_id #=> "Q2UNG2_ASPOR" ! # ! 
# == Referencees ! # ! # * TrEMBL Computer-annotated supplement to Swiss-Prot ! # http://au.expasy.org/sprot/ ! # ! # * TrEMBL Computer-annotated supplement to Swiss-Prot User Manual ! # http://au.expasy.org/sprot/userman.html ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,22 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # --- 45,49 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 25,28 **** --- 52,57 ---- module Bio + # Parser class for TrEMBL database entry. + # See also Bio::SPTR class. class TrEMBL < SPTR # Nothing to do (TrEMBL format is abstracted in SPTR) From k at pub.open-bio.org Sat Jan 28 06:46:45 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 06:46:45 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio shell.rb,1.11,1.12 Message-ID: <200601280646.k0S6kiVL004805@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory pub.open-bio.org:/tmp/cvs-serv4775/lib/bio Modified Files: shell.rb Log Message: * entret/seqret commands in EMBOSS are supported * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa) * bioruby> seqret(usa), entret(usa) * obj() method is added in addition to seq() and ent() methods. 
Index: shell.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/shell.rb,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** shell.rb 7 Dec 2005 05:12:06 -0000 1.11
--- shell.rb 28 Jan 2006 06:46:42 -0000 1.12
***************
*** 43,46 ****
--- 43,47 ----
    require 'bio/shell/plugin/obda'
    require 'bio/shell/plugin/keggapi'
+   require 'bio/shell/plugin/emboss'

    extend Ghost

From k at pub.open-bio.org Sat Jan 28 06:46:44 2006
From: k at pub.open-bio.org (Katayama Toshiaki)
Date: Sat, 28 Jan 2006 06:46:44 +0000
Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.59,1.60
Message-ID: <200601280646.k0S6kiVL004797@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib
In directory pub.open-bio.org:/tmp/cvs-serv4775/lib

Modified Files:
    bio.rb
Log Message:
* entret/seqret commands in EMBOSS are supported
  * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa)
  * bioruby> seqret(usa), entret(usa)
* obj() method is added in addition to seq() and ent() methods.

Index: bio.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v
retrieving revision 1.59
retrieving revision 1.60
diff -C2 -d -r1.59 -r1.60
*** bio.rb 20 Jan 2006 09:57:08 -0000 1.59
--- bio.rb 28 Jan 2006 06:46:42 -0000 1.60
***************
*** 29,33 ****
  module Bio

!   BIORUBY_VERSION = [0, 7, 1].extend(Comparable)

    ### Basic data types
--- 29,33 ----
  module Bio

!   BIORUBY_VERSION = [0, 7, 2].extend(Comparable)

    ### Basic data types
***************
*** 195,199 ****
    #end
! # autoload :EMBOSS, 'bio/appl/emboss' # use bio/command, improve
    autoload :PSORT, 'bio/appl/psort'
--- 195,199 ----
    #end
!   autoload :EMBOSS, 'bio/appl/emboss' # use bio/command, improve
    autoload :PSORT, 'bio/appl/psort'

From k at pub.open-bio.org Sat Jan 28 06:46:44 2006
From: k at pub.open-bio.org (Katayama Toshiaki)
Date: Sat, 28 Jan 2006 06:46:44 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/appl emboss.rb,1.2,1.3
Message-ID: <200601280646.k0S6kiVL004801@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/appl
In directory pub.open-bio.org:/tmp/cvs-serv4775/lib/bio/appl

Modified Files:
    emboss.rb
Log Message:
* entret/seqret commands in EMBOSS are supported
  * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa)
  * bioruby> seqret(usa), entret(usa)
* obj() method is added in addition to seq() and ent() methods.

Index: emboss.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/appl/emboss.rb,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** emboss.rb 8 Sep 2005 01:22:08 -0000 1.2
--- emboss.rb 28 Jan 2006 06:46:42 -0000 1.3
***************
*** 1,6 ****
  #
! # bio/appl/emboss.rb - EMBOSS wrapper
  #
! # Copyright (C) 2002 KATAYAMA Toshiaki
  #
  # This library is free software; you can redistribute it and/or
--- 1,16 ----
  #
! # = bio/appl/emboss.rb - EMBOSS wrapper
  #
! # Copyright:: Copyright (C) 2002, 2005
! #             KATAYAMA Toshiaki
! # License:: LGPL
! #
! # $Id$
! #
! # == References
! #
! # * http://www.emboss.org
! #
! #--
  #
  # This library is free software; you can redistribute it and/or
***************
*** 18,68 ****
  # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  #
! # $Id$
  #
  module Bio
!   class EMBOSS
!     def initialize(cmd_line)
!       @cmd_line = cmd_line + ' -stdout'
!     end
!     def exec
!       begin
!         @io = IO.popen(@cmd_line, "w+")
!         @result = @io.read
!         return @result
!       ensure
!         @io.close
!       end
!     end
!     attr_reader :io, :result
    end
! end
!
! =begin
!
! = Bio::EMBOSS
!
! EMBOSS wrapper.
!   #!/usr/bin/env ruby
!   require 'bio'
!   emboss = Bio::EMBOSS.new("getorf -sequence ~/xlrhodop -outseq stdout")
!   puts emboss.exec
! --- Bio::EMBOSS.new(command_line)
! --- Bio::EMBOSS#exec
! --- Bio::EMBOSS#io
! --- Bio::EMBOSS#result
! === SEE ALSO
! * http://www.emboss.org
- =end
--- 28,79 ----
  # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  #
! #++
  #
  module Bio
! autoload :Command, 'bio/command'
! class EMBOSS
!   extend Bio::Command::Tools
+   def self.seqret(arg)
+     str = self.retrieve('seqret', arg)
    end
!   def self.entret(arg)
!     str = self.retrieve('entret', arg)
!   end
!   def initialize(cmd_line)
!     @cmd_line = cmd_line + ' -stdout -auto'
!   end
!   def exec
!     begin
!       @io = IO.popen(@cmd_line, "w+")
!       @result = @io.read
!       return @result
!     ensure
!       @io.close
!     end
!   end
!   attr_reader :io, :result
!   private
!   def self.retrieve(cmd, arg)
!     cmd = [ cmd, arg, '-auto', '-stdout' ]
!     str = ''
!     call_command_local(cmd) do |inn, out|
!       inn.close_write
!       str = out.read
!     end
!     return str
!   end
! end # EMBOSS
! end # Bio

From k at pub.open-bio.org Sat Jan 28 06:46:45 2006
From: k at pub.open-bio.org (Katayama Toshiaki)
Date: Sat, 28 Jan 2006 06:46:45 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/shell/plugin entry.rb,1.4,1.5
Message-ID: <200601280646.k0S6kjVL004809@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/shell/plugin
In directory pub.open-bio.org:/tmp/cvs-serv4775/lib/bio/shell/plugin

Modified Files:
    entry.rb
Log Message:
* entret/seqret commands in EMBOSS are supported
  * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa)
  * bioruby> seqret(usa), entret(usa)
* obj() method is added in addition to seq() and ent() methods.

Index: entry.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/shell/plugin/entry.rb,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** entry.rb 7 Dec 2005 05:12:07 -0000 1.4
--- entry.rb 28 Jan 2006 06:46:43 -0000 1.5
***************
*** 67,84 ****
    # * IO -- IO object (first entry only)
    # * "filename" -- local file (first entry only)
!   # * "db:entry" -- local bioflat, OBDA, KEGG API
    def ent(arg)
      entry = ""
      db, entry_id = arg.to_s.strip.split(/:/)
      if arg.respond_to?(:gets) or File.exists?(arg)
        entry = flatfile(arg)
      elsif Bio::Shell.find_flat_dir(db)
        entry = flatsearch(db, entry_id)
      elsif obdadbs.include?(db)
        entry = obdaentry(db, entry_id)
      else
!       entry = bget(arg)
      end
      return entry
    end

--- 67,110 ----
    # * IO -- IO object (first entry only)
    # * "filename" -- local file (first entry only)
!   # * "db:entry" -- local BioFlat, OBDA, EMBOSS, KEGG API
    def ent(arg)
      entry = ""
      db, entry_id = arg.to_s.strip.split(/:/)
+
+     # local file
      if arg.respond_to?(:gets) or File.exists?(arg)
+       puts "Retrieving entry from file (#{arg})"
        entry = flatfile(arg)
+
+     # BioFlat in ./.bioruby/bioflat/ or ~/.bioinformatics/.bioruby/bioflat/
      elsif Bio::Shell.find_flat_dir(db)
+       puts "Retrieving entry from local BioFlat database (#{arg})"
        entry = flatsearch(db, entry_id)
+
+     # OBDA in ~/.bioinformatics/seqdatabase.ini
      elsif obdadbs.include?(db)
+       puts "Retrieving entry from OBDA (#{arg})"
        entry = obdaentry(db, entry_id)
+
      else
!       # EMBOSS USA in ~/.embossrc
!       str = entret(arg)
!       if $?.exitstatus == 0 and str.length != 0
!         puts "Retrieving entry from EMBOSS (#{arg})"
!         entry = str
!
!       # KEGG API at http://www.genome.jp/kegg/soap/
!       else
!         puts "Retrieving entry from KEGG API (#{arg})"
!         entry = bget(arg)
!       end
      end
+
      return entry
+   end
+
+   def obj(arg)
+     str = ent(arg)
+     flatparse(str)
    end


From k at pub.open-bio.org Sat Jan 28 07:22:16 2006
From: k at pub.open-bio.org (Katayama Toshiaki)
Date: Sat, 28 Jan 2006 07:22:16 +0000
Subject: [BioRuby-cvs] bioruby/lib/bio/shell/plugin emboss.rb,NONE,1.1
Message-ID: <200601280722.k0S7MGVL005006@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/lib/bio/shell/plugin
In directory pub.open-bio.org:/tmp/cvs-serv5002/lib/bio/shell/plugin

Added Files:
    emboss.rb
Log Message:
* entret/seqret commands in EMBOSS are supported
  * Bio::EMBOSS.seqret(usa), Bio::EMBOSS.entret(usa)
  * bioruby> seqret(usa), entret(usa)

--- NEW FILE: emboss.rb ---
#
# = bio/shell/plugin/emboss.rb - methods to use EMBOSS
#
# Copyright:: Copyright (C) 2005
#             Toshiaki Katayama
# License:: LGPL
#
# $Id: emboss.rb,v 1.1 2006/01/28 07:22:14 k Exp $
#
#--
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
#++
#

module Bio::Shell

  private

  def seqret(usa)
    Bio::EMBOSS.seqret(usa)
  end

  def entret(usa)
    Bio::EMBOSS.entret(usa)
  end

end

From nakao at pub.open-bio.org Sat Jan 28 07:42:01 2006
From: nakao at pub.open-bio.org (Mitsuteru C.
Nakao) Date: Sat, 28 Jan 2006 07:42:01 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io fastacmd.rb,1.8,1.9 Message-ID: <200601280742.k0S7g1VL005071@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv5061/lib/bio/io Modified Files: fastacmd.rb Log Message: * Added RDoc. Index: fastacmd.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/fastacmd.rb,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** fastacmd.rb 26 Sep 2005 13:00:08 -0000 1.8 --- fastacmd.rb 28 Jan 2006 07:41:59 -0000 1.9 *************** *** 1,7 **** # ! # bio/io/fastacmd.rb - NCBI fastacmd wrapper class # ! # Copyright (C) 2005 Shuji SHIGENOBU ! # Copyright (C) 2005 Toshiaki Katayama # # This library is free software; you can redistribute it and/or --- 1,42 ---- # ! # = bio/io/fastacmd.rb - NCBI fastacmd wrapper class # ! # Copyright:: Copyright (C) 2005 ! # Shuji SHIGENOBU , ! # Toshiaki Katayama ! # Lisence:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Retrives FASTA formatted sequences from a blast database using ! # NCBI fastacmd command. ! # ! # == Examples ! # ! # database = ARGV.shift || "/db/myblastdb" ! # entry_id = ARGV.shift || "sp:128U_DROME" ! # ent_list = ["sp:1433_SPIOL", "sp:1432_MAIZE"] ! # ! # fastacmd = Bio::Blast::Fastacmd.new(database) ! # ! # entry = fastacmd.get_by_id(entry_id) ! # fastacmd.fetch(entry_id) ! # fastacmd.fetch(ent_list) ! # ! # fastacmd.fetch(ent_list).each do |fasta| ! # puts fasta ! # end ! # ! # == References ! # ! # * NCBI tool ! # ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/ncbi.tar.gz ! # ! # * fastacmd.html ! # http://biowulf.nih.gov/apps/blast/doc/fastacmd.html ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 19,23 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! 
# $Id$ # --- 54,58 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 29,32 **** --- 64,69 ---- class Blast + # NCBI fastacmd wrapper class + # class Fastacmd *************** *** 34,49 **** include Bio::Command::Tools ! def initialize(db) ! @database = db @fastacmd = 'fastacmd' end - attr_accessor :database, :fastacmd, :errorlog ! # get an entry_id and returns a Bio::FastaFormat object def get_by_id(entry_id) fetch(entry_id).shift end ! # get one or more entry_id and returns an Array of Bio::FastaFormat objects def fetch(list) if list.respond_to?(:join) --- 71,113 ---- include Bio::Command::Tools ! # Database file path. ! attr_accessor :database ! ! # fastcmd command file path. ! attr_accessor :fastacmd ! ! # ! attr_accessor :errorlog ! ! # Initalize a fastacmd object. ! # ! # fastacmd = Bio::Blast::Fastacmd.new("/db/myblastdb") ! def initialize(blast_database_file_path) ! @database = blast_database_file_path @fastacmd = 'fastacmd' end ! ! # get an entry_id and returns a Bio::FastaFormat object. ! # ! # entry_id = "sp:128U_DROME" ! # entry = fastacmd.get_by_id(entry_id) def get_by_id(entry_id) fetch(entry_id).shift end ! # get one or more entry_id and returns an Array of Bio::FastaFormat objects. ! # ! # Fastacmd#fetch(entry_id) returns an Array of a Bio::FastaFormat ! # object even when the result is a single entry. ! # ! # p fastacmd.fetch(entry_id) ! # ! # Fastacmd#fetch method also accepts a list of entry_id and returns ! # an Array of Bio::FastaFormat objects. ! # ! # ent_list = ["sp:1433_SPIOL", "sp:1432_MAIZE"] ! # p fastacmd.fetch(ent_list) ! # def fetch(list) if list.respond_to?(:join) *************** *** 60,63 **** --- 124,134 ---- end + # Iterates each entry. + # + # You can also iterate on all sequences in the database! 
+ # fastacmd.each do |fasta| + # p [ fasta.definition[0..30], fasta.seq.size ] + # end + # def each_entry cmd = [ @fastacmd, '-d', @database, '-D', 'T' ] *************** *** 65,70 **** inn.close_write Bio::FlatFile.open(Bio::FastaFormat, out) do |f| ! f.each_entry do |e| ! yield e end end --- 136,141 ---- inn.close_write Bio::FlatFile.open(Bio::FastaFormat, out) do |f| ! f.each_entry do |entry| ! yield entry end end *************** *** 74,123 **** alias each each_entry ! end ! ! end ! end ! ! ! if __FILE__ == $0 ! ! database = ARGV.shift || "/db/myblastdb" ! entry_id = ARGV.shift || "sp:128U_DROME" ! ent_list = ["sp:1433_SPIOL", "sp:1432_MAIZE"] ! ! fastacmd = Bio::Blast::Fastacmd.new(database) ! ! ### Retrieve one sequence ! entry = fastacmd.get_by_id(entry_id) ! ! # Fastacmd#get_by_id(entry_id) returns a Bio::FastaFormat object. ! p entry ! ! # Bio::FastaFormat becomes a fasta format string when printed by puts. ! puts entry ! ! # Fastacmd#fetch(entry_id) returns an Array of a Bio::FastaFormat ! # object even when the result is a single entry. ! p fastacmd.fetch(entry_id) ! ! ### Retrieve more sequences ! ! # Fastacmd#fetch method also accepts a list of entry_id and returns ! # an Array of Bio::FastaFormat objects. ! p fastacmd.fetch(ent_list) ! ! # So, you can iterate on the results. ! fastacmd.fetch(ent_list).each do |fasta| ! puts fasta ! end ! ! ! ### Iterates on all entries ! # You can also iterate on all sequences in the database! ! fastacmd.each do |fasta| ! p [ fasta.definition[0..30], fasta.seq.size ] ! end - end --- 145,152 ---- alias each each_entry ! end # class Fastacmd ! end # class Blast ! end # module Bio From nakao at pub.open-bio.org Sat Jan 28 08:05:55 2006 From: nakao at pub.open-bio.org (Mitsuteru C. 
Nakao) Date: Sat, 28 Jan 2006 08:05:55 +0000 Subject: [BioRuby-cvs] bioruby/test/unit/bio/io test_fastacmd.rb,NONE,1.1 Message-ID: <200601280805.k0S85tVL005150@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/test/unit/bio/io In directory pub.open-bio.org:/tmp/cvs-serv5138/test/unit/bio/io Added Files: test_fastacmd.rb Log Message: * Newly added. --- NEW FILE: test_fastacmd.rb --- # # test/unit/bio/io/test_fastacmd.rb - Unit test for Bio::Blast::Fastacmd. # # Copyright (C) 2006 Mitsuteru Nakao # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # $Id: test_fastacmd.rb,v 1.1 2006/01/28 08:05:52 nakao Exp $ # require 'pathname' libpath = Pathname.new(File.join(File.dirname(__FILE__), ['..'] * 4, 'lib')).cleanpath.to_s $:.unshift(libpath) unless $:.include?(libpath) require 'test/unit' require 'bio/io/fastacmd' module Bio class TestFastacmd < Test::Unit::TestCase def setup @obj = Bio::Blast::Fastacmd.new("/tmp/test") end def test_database assert_equal("/tmp/test", @obj.database) end def test_fastacmd assert_equal("fastacmd", @obj.fastacmd) end def test_methods method_list = ['get_by_id', 'fetch', 'each_entry', 'each'] method_list.each do |method| assert(@obj.methods.include?(method)) end end end end From nakao at pub.open-bio.org Sat Jan 28 08:12:23 2006 From: nakao at pub.open-bio.org (Mitsuteru C. 
Nakao) Date: Sat, 28 Jan 2006 08:12:23 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io fastacmd.rb,1.9,1.10 Message-ID: <200601280812.k0S8CNVL005196@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv5186/lib/bio/io Modified Files: fastacmd.rb Log Message: * Updated RDoc. Index: fastacmd.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/fastacmd.rb,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** fastacmd.rb 28 Jan 2006 07:41:59 -0000 1.9 --- fastacmd.rb 28 Jan 2006 08:12:21 -0000 1.10 *************** *** 2,8 **** # = bio/io/fastacmd.rb - NCBI fastacmd wrapper class # ! # Copyright:: Copyright (C) 2005 # Shuji SHIGENOBU , ! # Toshiaki Katayama # Lisence:: LGPL # --- 2,9 ---- # = bio/io/fastacmd.rb - NCBI fastacmd wrapper class # ! # Copyright:: Copyright (C) 2005, 2006 # Shuji SHIGENOBU , ! # Toshiaki Katayama , ! # Mitsuteru C. Nakao # Lisence:: LGPL # *************** *** 14,17 **** --- 15,21 ---- # NCBI fastacmd command. # + # This class requires 'fastacmd' command and a blast database + # (formatted using the '-o' option of 'formatdb'). 
+ # # == Examples # From k at pub.open-bio.org Sat Jan 28 08:34:27 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 08:34:27 +0000 Subject: [BioRuby-cvs] bioruby/lib bio.rb,1.60,1.61 Message-ID: <200601280834.k0S8YRVL005291@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib In directory pub.open-bio.org:/tmp/cvs-serv5285/lib Modified Files: bio.rb Log Message: * Bio::BRDB is now removed Index: bio.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio.rb,v retrieving revision 1.60 retrieving revision 1.61 diff -C2 -d -r1.60 -r1.61 *** bio.rb 28 Jan 2006 06:46:42 -0000 1.60 --- bio.rb 28 Jan 2006 08:34:25 -0000 1.61 *************** *** 168,172 **** # autoload :ESOAP, 'bio/io/esoap' # NCBI::ESOAP ? - # autoload :BRDB, 'bio/io/brdb' # remove --- 168,171 ---- From k at pub.open-bio.org Sat Jan 28 08:34:27 2006 From: k at pub.open-bio.org (Katayama Toshiaki) Date: Sat, 28 Jan 2006 08:34:27 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io brdb.rb,1.4,NONE Message-ID: <200601280834.k0S8YRVL005295@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv5285/lib/bio/io Removed Files: brdb.rb Log Message: * Bio::BRDB is now removed --- brdb.rb DELETED --- From nakao at pub.open-bio.org Sat Jan 28 10:49:01 2006 From: nakao at pub.open-bio.org (Mitsuteru C. Nakao) Date: Sat, 28 Jan 2006 10:49:01 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db fasta.rb,1.21,1.22 Message-ID: <200601281049.k0SAn1VL005893@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db In directory pub.open-bio.org:/tmp/cvs-serv5883/lib/bio/db Modified Files: fasta.rb Log Message: * Added RDoc. 
Index: fasta.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/fasta.rb,v retrieving revision 1.21 retrieving revision 1.22 diff -C2 -d -r1.21 -r1.22 *** fasta.rb 26 Sep 2005 13:00:06 -0000 1.21 --- fasta.rb 28 Jan 2006 10:48:59 -0000 1.22 *************** *** 1,7 **** # ! # bio/db/fasta.rb - FASTA format class # ! # Copyright (C) 2001 GOTO Naohisa ! # Copyright (C) 2001, 2002 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,67 ---- # ! # = bio/db/fasta.rb - FASTA format class # ! # Copyright:: Copyright (C) 2001, 2002 ! # GOTO Naohisa , ! # KATAYAMA Toshiaki ! # Lisence:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # FASTA format class. ! # ! # == Examples ! # ! # rub = Bio::FastaDefline.new('>gi|671595|emb|CAA85678.1| rubisco large subunit [Perovskia abrotanoides]') ! # rub.entry_id ==> 'gi|671595' ! # rub.get('emb') ==> 'CAA85678.1' ! # rub.emb ==> 'CAA85678.1' ! # rub.gi ==> '671595' ! # rub.accession ==> 'CAA85678' ! # rub.accessions ==> [ 'CAA85678' ] ! # rub.acc_version ==> 'CAA85678.1' ! # rub.locus ==> nil ! # rub.list_ids ==> [["gi", "671595"], ! # ["emb", "CAA85678.1", nil], ! # ["Perovskia abrotanoides"]] ! # ! # ckr = Bio::FastaDefline.new(">gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)\001gi|2147182|pir||I51898 cholecystokinin A receptor - guinea pig\001gi|544724|gb|AAB29504.1| cholecystokinin A receptor; CCK-A receptor [Cavia]") ! # ckr.entry_id ==> "gi|2495000" ! # ckr.sp ==> "CCKR_CAVPO" ! # ckr.pir ==> "I51898" ! # ckr.gb ==> "AAB29504.1" ! # ckr.gi ==> "2495000" ! # ckr.accession ==> "AAB29504" ! # ckr.accessions ==> ["Q63931", "AAB29504"] ! # ckr.acc_version ==> "AAB29504.1" ! # ckr.locus ==> nil ! # ckr.description ==> ! # "CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)" ! # ckr.descriptions ==> ! # ["CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)", ! 
# "cholecystokinin A receptor - guinea pig", ! # "cholecystokinin A receptor; CCK-A receptor [Cavia]"] ! # ckr.words ==> ! # ["cavia", "cck-a", "cck-ar", "cholecystokinin", "guinea", "pig", ! # "receptor", "type"] ! # ckr.id_strings ==> ! # ["2495000", "Q63931", "CCKR_CAVPO", "2147182", "I51898", ! # "544724", "AAB29504.1", "Cavia"] ! # ckr.list_ids ==> ! # [["gi", "2495000"], ["sp", "Q63931", "CCKR_CAVPO"], ! # ["gi", "2147182"], ["pir", nil, "I51898"], ["gi", "544724"], ! # ["gb", "AAB29504.1", nil], ["Cavia"]] ! # ! # == References ! # ! # * FASTA format (WikiPedia) ! # http://en.wikipedia.org/wiki/FASTA_format ! # ! # * Fasta format description (NCBI) ! # http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 19,23 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # --- 79,83 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # *************** *** 27,34 **** --- 87,171 ---- module Bio + + # Treats a FASTA formatted entry, such as: + # + # >id and/or some comments <== comment line + # ATGCATGCATGCATGCATGCATGCATGCATGCATGC <== sequence lines + # ATGCATGCATGCATGCATGCATGCATGCATGCATGC + # ATGCATGCATGC + # + # The precedent '>' can be omitted and the trailing '>' will be removed + # automatically. 
+ # + # === Examples + # + # f_str = <<END + # >sce:YBR160W CDC28, SRM5; cyclin-dependent protein kinase catalytic subunit [EC:2.7.1.-] [SP:CC28_YEAST] + # MSGELANYKRLEKVGEGTYGVVYKALDLRPGQGQRVVALKKIRLESEDEG + # VPSTAIREISLLKELKDDNIVRLYDIVHSDAHKLYLVFEFLDLDLKRYME + # GIPKDQPLGADIVKKFMMQLCKGIAYCHSHRILHRDLKPQNLLINKDGNL + # KLGDFGLARAFGVPLRAYTHEIVTLWYRAPEVLLGGKQYSTGVDTWSIGC + # IFAEMCNRKPIFSGDSEIDQIFKIFRVLGTPNEAIWPDIVYLPDFKPSFP + # QWRRKDLSQVVPSLDPRGIDLLDKLLAYDPINRISARRAAIHPYFQES + # >sce:YBR274W CHK1; probable serine/threonine-protein kinase [EC:2.7.1.-] [SP:KB9S_YEAST] + # MSLSQVSPLPHIKDVVLGDTVGQGAFACVKNAHLQMDPSIILAVKFIHVP + # TCKKMGLSDKDITKEVVLQSKCSKHPNVLRLIDCNVSKEYMWIILEMADG + # GDLFDKIEPDVGVDSDVAQFYFQQLVSAINYLHVECGVAHRDIKPENILL + # DKNGNLKLADFGLASQFRRKDGTLRVSMDQRGSPPYMAPEVLYSEEGYYA + # DRTDIWSIGILLFVLLTGQTPWELPSLENEDFVFFIENDGNLNWGPWSKI + # EFTHLNLLRKILQPDPNKRVTLKALKLHPWVLRRASFSGDDGLCNDPELL + # AKKLFSHLKVSLSNENYLKFTQDTNSNNRYISTQPIGNELAELEHDSMHF + # QTVSNTQRAFTSYDSNTNYNSGTGMTQEAKWTQFISYDIAALQFHSDEND + # CNELVKRHLQFNPNKLTKFYTLQPMDVLLPILEKALNLSQIRVKPDLFAN + # FERLCELLGYDNVFPLIINIKTKSNGGYQLCGSISIIKIEEELKSVGFER + # KTGDPLEWRRLFKKISTICRDIILIPN + # END + # + # f = Bio::FastaFormat.new(f_str) + # puts "### FastaFormat" + # puts "# entry" + # puts f.entry + # puts "# entry_id" + # p f.entry_id + # puts "# definition" + # p f.definition + # puts "# data" + # p f.data + # puts "# seq" + # p f.seq + # puts "# seq.type" + # p f.seq.type + # puts "# length" + # p f.length + # puts "# aaseq" + # p f.aaseq + # puts "# aaseq.type" + # p f.aaseq.type + # puts "# aaseq.composition" + # p f.aaseq.composition + # puts "# aalen" + # p f.aalen + # + # === References + # + # * FASTA format (WikiPedia) + # http://en.wikipedia.org/wiki/FASTA_format + # class FastaFormat < DB + # Entry delimiter in flatfile text. DELIMITER = RS = "\n>" + # The comment line of the FASTA formatted data. + attr_accessor :definition + + # The seuqnce lines in text. 
+ attr_accessor :data + + attr_reader :entry_overrun + + # Stores the comment and sequence information from one entry of the + # FASTA format string. If the argument contains more than one + # entry, only the first entry is used. def initialize(str) @definition = str[/.*/].sub(/^>/, '').strip # 1st line *************** *** 37,43 **** @entry_overrun = $& end - attr_accessor :definition, :data - attr_reader :entry_overrun def entry @entry = ">#{@definition}\n#{@data.strip}\n" --- 174,179 ---- @entry_overrun = $& end + # Returns the stored one entry as a FASTA format. (same as to_s) def entry @entry = ">#{@definition}\n#{@data.strip}\n" *************** *** 45,48 **** --- 181,202 ---- alias to_s entry + + # Executes FASTA/BLAST search by using a Bio::Fasta or a Bio::Blast + # factory object. + # + # #!/usr/bin/env ruby + # require 'bio' + # + # factory = Bio::Fasta.local('fasta34', 'db/swissprot.f') + # flatfile = Bio::FlatFile.open(Bio::FastaFormat, 'queries.f') + # flatfile.each do |entry| + # p entry.definition + # result = entry.fasta(factory) + # result.each do |hit| + # print "#{hit.query_id} : #{hit.evalue}\t#{hit.target_id} at " + # p hit.lap_at + # end + # end + # def query(factory) factory.query(@entry) *************** *** 51,54 **** --- 205,209 ---- alias blast query + # Returns a joined sequence line as a String. def seq unless defined?(@seq) *************** *** 76,79 **** --- 231,235 ---- end + # Returns comments. def comment seq *************** *** 81,104 **** --- 237,269 ---- end + # Returns sequence length. def length seq.length end + # Returens the Bio::Sequence::NA. def naseq Sequence::NA.new(seq) end + # Returens the length of Bio::Sequence::NA. def nalen self.naseq.length end + # Returens the Bio::Sequence::AA. def aaseq Sequence::AA.new(seq) end + # Returens the length of Bio::Sequence::AA. def aalen self.aaseq.length end + # Parsing FASTA Defline, and extract IDs. 
+ # IDs are NSIDs (NCBI standard FASTA sequence identifiers) + # or ":"-separated IDs. + # It returns a Bio::FastaDefline instance. def identifiers unless defined?(@ids) then *************** *** 108,131 **** --- 273,310 ---- end + # Parsing FASTA Defline (using #identifiers method), and + # shows a possibly unique identifier. + # It returns a string. def entry_id identifiers.entry_id end + # Parsing FASTA Defline (using #identifiers method), and + # shows GI/locus/accession/accession with version number. + # If a entry has more than two of such IDs, + # only the first ID are shown. + # It returns a string or nil. def gi identifiers.gi end + # Returns an accession number. def accession identifiers.accession end + # Parsing FASTA Defline (using #identifiers method), and + # shows accession numbers. + # It returns an array of strings. def accessions identifiers.accessions end + # Returns accession number with version. def acc_version identifiers.acc_version end + # Returns locus. def locus identifiers.locus *************** *** 134,139 **** --- 313,339 ---- end #class FastaFormat + # Treats a FASTA formatted numerical entry, such as: + # + # >id and/or some comments <== comment line + # 24 15 23 29 20 13 20 21 21 23 22 25 13 <== numerical data + # 22 17 15 25 27 32 26 32 29 29 25 + # + # The precedent '>' can be omitted and the trailing '>' will be removed + # automatically. + # + # --- Bio::FastaNumericFormat.new(entry) + # + # Stores the comment and the list of the numerical data. + # + # --- Bio::FastaNumericFormat#definition + # + # The comment line of the FASTA formatted data. + # + # * FASTA format (Wikipedia) + # http://en.wikipedia.org/wiki/FASTA_format class FastaNumericFormat < FastaFormat + # Returns the list of the numerical data (typically the quality score + # of its corresponding sequence) as an Array. def data unless @list *************** *** 143,150 **** --- 343,352 ---- end + # Returns the number of elements in the numerical data. 
def length data.length end + # Yields on each elements of the numerical data. def each data.each do |x| *************** *** 153,156 **** --- 355,359 ---- end + # Returns the n-th element. def [](n) data[n] *************** *** 161,169 **** end #class FastaNumericFormat - class FastaDefline ! # specs are described in: ! # ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb ! # http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers NSIDs = { --- 364,430 ---- end #class FastaNumericFormat ! # Parsing FASTA Defline, and extract IDs and other informations. ! # IDs are NSIDs (NCBI standard FASTA sequence identifiers) ! # or ":"-separated IDs. ! # ! # specs are described in: ! # ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb ! # http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers ! # ! # === Examples ! # ! # rub = Bio::FastaDefline.new('>gi|671595|emb|CAA85678.1| rubisco large subunit [Perovskia abrotanoides]') ! # rub.entry_id ==> 'gi|671595' ! # rub.get('emb') ==> 'CAA85678.1' ! # rub.emb ==> 'CAA85678.1' ! # rub.gi ==> '671595' ! # rub.accession ==> 'CAA85678' ! # rub.accessions ==> [ 'CAA85678' ] ! # rub.acc_version ==> 'CAA85678.1' ! # rub.locus ==> nil ! # rub.list_ids ==> [["gi", "671595"], ! # ["emb", "CAA85678.1", nil], ! # ["Perovskia abrotanoides"]] ! # ! # ckr = Bio::FastaDefline.new(">gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)\001gi|2147182|pir||I51898 cholecystokinin A receptor - guinea pig\001gi|544724|gb|AAB29504.1| cholecystokinin A receptor; CCK-A receptor [Cavia]") ! # ckr.entry_id ==> "gi|2495000" ! # ckr.sp ==> "CCKR_CAVPO" ! # ckr.pir ==> "I51898" ! # ckr.gb ==> "AAB29504.1" ! # ckr.gi ==> "2495000" ! # ckr.accession ==> "AAB29504" ! # ckr.accessions ==> ["Q63931", "AAB29504"] ! # ckr.acc_version ==> "AAB29504.1" ! # ckr.locus ==> nil ! # ckr.description ==> ! # "CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)" ! # ckr.descriptions ==> ! 
# ["CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)", ! # "cholecystokinin A receptor - guinea pig", ! # "cholecystokinin A receptor; CCK-A receptor [Cavia]"] ! # ckr.words ==> ! # ["cavia", "cck-a", "cck-ar", "cholecystokinin", "guinea", "pig", ! # "receptor", "type"] ! # ckr.id_strings ==> ! # ["2495000", "Q63931", "CCKR_CAVPO", "2147182", "I51898", ! # "544724", "AAB29504.1", "Cavia"] ! # ckr.list_ids ==> ! # [["gi", "2495000"], ["sp", "Q63931", "CCKR_CAVPO"], ! # ["gi", "2147182"], ["pir", nil, "I51898"], ["gi", "544724"], ! # ["gb", "AAB29504.1", nil], ["Cavia"]] ! # ! # === Refereneces ! # ! # * Fasta format description (NCBI) ! # http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml ! # ! # * Frequently Asked Questions: Indexing of Sequence Identifiers (by Warren R. Gish.) ! # http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers ! # ! # * README.formatdb ! # ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb ! # ! class FastaDefline NSIDs = { *************** *** 198,201 **** --- 459,471 ---- } + # Shows array that contains IDs (or ID-like strings). + # Returns an array of arrays of strings. + attr_reader :list_ids + + # Shows a possibly unique identifier. + # Returns a string. + attr_reader :entry_id + + # Parses given string. def initialize(str) @deflines = [] *************** *** 211,217 **** end #def initialize ! attr_reader :list_ids ! attr_reader :entry_id ! def add_defline(str) case str --- 481,485 ---- end #def initialize ! # Parses given string and adds parsed data. def add_defline(str) case str *************** *** 344,347 **** --- 612,619 ---- private :parse_NSIDs + + # Shows original string. + # Note that the result of this method may be different from + # original string which is given in FastaDefline.new method. def to_s @deflines.collect { |a| *************** *** 351,358 **** --- 623,632 ---- end + # Shows description. def description @deflines[0].to_a[-1] end + # Returns descriptions. 
def descriptions @deflines.collect do |a| *************** *** 361,364 **** --- 635,640 ---- end + # Shows ID-like strings. + # Returns an array of strings. def id_strings r = [] *************** *** 402,405 **** --- 678,682 ---- ] + # Shows words used in the defline. Returns an Array. def words(case_sensitive = nil, kill_regexp = self.class::KillRegexpArray, kwhash = self.class::KillWordsHash) *************** *** 427,432 **** end ! def get(db) ! db =db.to_s r = nil unless r = @info[db] then --- 704,710 ---- end ! # Returns identifires by a database name. ! def get(dbname) ! db = dbname.to_s r = nil unless r = @info[db] then *************** *** 450,457 **** end ! def get_by_type(tstr) @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then ! if i = labels.index(tstr) then return x[i+1] end --- 728,736 ---- end ! # Returns an identifier by given type. ! def get_by_type(type_str) @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then ! if i = labels.index(type_str) then return x[i+1] end *************** *** 461,469 **** end ! def get_all_by_type(*tstrarg) d = [] @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then ! tstrarg.each do |y| if i = labels.index(y) then d << x[i+1] if x[i+1] --- 740,749 ---- end ! # Returns identifiers by given type. ! def get_all_by_type(*type_strarg) d = [] @list_ids.each do |x| if labels = self.class::NSIDs[x[0]] then ! type_strarg.each do |y| if i = labels.index(y) then d << x[i+1] if x[i+1] *************** *** 475,478 **** --- 755,762 ---- end + # Shows locus. + # If the entry has more than two of such IDs, + # only the first ID are shown. + # Returns a string or nil. def locus unless defined?(@locus) *************** *** 482,485 **** --- 766,773 ---- end + # Shows GI. + # If the entry has more than two of such IDs, + # only the first ID are shown. + # Returns a string or nil. def gi unless defined?(@gi) then *************** *** 489,492 **** --- 777,784 ---- end + # Shows accession with version number. 
+ # If the entry has more than two of such IDs, + # only the first ID are shown. + # Returns a string or nil. def acc_version unless defined?(@acc_version) then *************** *** 496,499 **** --- 788,793 ---- end + # Shows accession numbers. + # Returns an array of strings. def accessions unless defined?(@accessions) then *************** *** 504,507 **** --- 798,802 ---- end + # Shows an accession number. def accession unless defined?(@accession) then *************** *** 524,527 **** --- 819,823 ---- r end + end #class FastaDefline *************** *** 610,869 **** end - - =begin - - = Bio::FastaFormat - - Treats a FASTA formatted entry, such as: - - >id and/or some comments <== comment line - ATGCATGCATGCATGCATGCATGCATGCATGCATGC <== sequence lines - ATGCATGCATGCATGCATGCATGCATGCATGCATGC - ATGCATGCATGC - - The precedent '>' can be omitted and the trailing '>' will be removed - automatically. - - --- Bio::FastaFormat.new(entry) - - Stores the comment and sequence information from one entry of the - FASTA format string. If the argument contains more than one - entry, only the first entry is used. - - --- Bio::FastaFormat#entry - - Returns the stored one entry as a FASTA format. (same as to_s) - - --- Bio::FastaFormat#definition - - Returns the comment line of the FASTA formatted data. - - --- Bio::FastaFormat#seq - - Returns a joined sequence line as a String. - - --- Bio::FastaFormat#query(factory) - --- Bio::FastaFormat#fasta(factory) - --- Bio::FastaFormat#blast(factory) - - Executes FASTA/BLAST search by using a Bio::Fasta or a Bio::Blast - factory object. 
- - #!/usr/bin/env ruby - - require 'bio' - - factory = Bio::Fasta.local('fasta34', 'db/swissprot.f') - flatfile = Bio::FlatFile.open(Bio::FastaFormat, 'queries.f') - flatfile.each do |entry| - p entry.definition - result = entry.fasta(factory) - result.each do |hit| - print "#{hit.query_id} : #{hit.evalue}\t#{hit.target_id} at " - p hit.lap_at - end - end - - --- Bio::FastaFormat#length - - Returns sequence length. - - --- Bio::FastaFormat#naseq - --- Bio::FastaFormat#nalen - --- Bio::FastaFormat#aaseq - --- Bio::FastaFormat#aalen - - If you know whether the sequence is NA or AA, use these methods. - 'naseq' and 'aaseq' methods returen the Bio::Sequence::NA or - Bio::Sequence::AA object respectively. 'nalen' and 'aalen' methods - return the length of them. - - --- Bio::FastaFormat#identifiers - - Parsing FASTA Defline, and extract IDs. - IDs are NSIDs (NCBI standard FASTA sequence identifiers) - or ":"-separated IDs. - It returns a Bio::FastaDefline instance. - - --- Bio::FastaFormat#entry_id - - Parsing FASTA Defline (using #identifiers method), and - shows a possibly unique identifier. - It returns a string. - - --- Bio::FastaFormat#gi - --- Bio::FastaFormat#locus - --- Bio::FastaFormat#accession - --- Bio::FastaFormat#acc_version - - Parsing FASTA Defline (using #identifiers method), and - shows GI/locus/accession/accession with version number. - If a entry has more than two of such IDs, - only the first ID are shown. - It returns a string or nil. - - --- Bio::FastaFormat#accessions - - Parsing FASTA Defline (using #identifiers method), and - shows accession numbers. - It returns an array of strings. - - --- Bio::FastaFormat - - = Bio::FastaNumericFormat - - Treats a FASTA formatted numerical entry, such as: - - >id and/or some comments <== comment line - 24 15 23 29 20 13 20 21 21 23 22 25 13 <== numerical data - 22 17 15 25 27 32 26 32 29 29 25 - - The precedent '>' can be omitted and the trailing '>' will be removed - automatically. 
- - --- Bio::FastaNumericFormat.new(entry) - - Stores the comment and the list of the numerical data. - - --- Bio::FastaNumericFormat#definition - - The comment line of the FASTA formatted data. - - --- Bio::FastaNumericFormat#data - - Returns the list of the numerical data (typically the quality score - of its corresponding sequence) as an Array. - - --- Bio::FastaNumericFormat#length - - Returns the number of elements in the numerical data. - - --- Bio::FastaNumericFormat#each - - Yields on each elements of the numerical data. - - --- Bio::FastaNumericFormat#[](n) - - Returns the n-th element. - - --- Bio::FastaNumericFormat#identifiers - --- Bio::FastaNumericFormat#entry_id - --- Bio::FastaNumericFormat#gi - --- Bio::FastaNumericFormat#locus - --- Bio::FastaNumericFormat#accession - --- Bio::FastaNumericFormat#acc_version - --- Bio::FastaNumericFormat#accessions - - Same as Bio::FastaFormat. - - - = Bio::FastaDefline - - Parsing FASTA Defline, and extract IDs and other informations. - IDs are NSIDs (NCBI standard FASTA sequence identifiers) - or ":"-separated IDs. - - --- see also: - ftp://ftp.ncbi.nih.gov/blast/documents/README.formatdb - http://blast.wustl.edu/doc/FAQ-Indexing.html#Identifiers - - --- Bio::FastaDefline.new(str) - - Parses given string. - - --- Bio::FastaFormat#entry_id - - Shows a possibly unique identifier. - Returns a string. - - --- Bio::FastaDefline#gi - --- Bio::FastaDefline#locus - --- Bio::FastaDefline#accession - --- Bio::FastaDefline#acc_version - - Shows GI/locus/accession/accession with version number. - If the entry has more than two of such IDs, - only the first ID are shown. - Returns a string or nil. - - --- Bio::FastaFormat#accessions - - Shows accession numbers. - Returns an array of strings. - - --- Bio::FastaDefline#add_defline(str) - - Parses given string and adds parsed data. - - --- Bio::FastaDefline#to_s - - Shows original string. 
- Note that the result of this method may be different from - original string which is given in FastaDefline.new method. - - --- Bio::FastaDefline#id_strings - - Shows ID-like strings. - Returns an array of strings. - - --- Bio::FastaDefline#list_ids - - Shows array that contains IDs (or ID-like strings). - Returns an array of arrays of strings. - - --- Bio::FastaDefline#description - --- Bio::FastaDefline#descriptions - - --- Bio::FastaDefline#words(case_sensitive = nil, - kill_words_regexp_array, kill_words_hash) - - --- Bio::FastaDefline#get(tag_of_id) - - --- Bio::FastaDefline#get_by_type(type_of_id) - - --- Bio::FastaDefline#get_all_by_type(type_of_id) - - --- examples: - rub = Bio::FastaDefline.new('>gi|671595|emb|CAA85678.1| rubisco large subunit [Perovskia abrotanoides]') - rub.entry_id ==> 'gi|671595' - rub.get('emb') ==> 'CAA85678.1' - rub.emb ==> 'CAA85678.1' - rub.gi ==> '671595' - rub.accession ==> 'CAA85678' - rub.accessions ==> [ 'CAA85678' ] - rub.acc_version ==> 'CAA85678.1' - rub.locus ==> nil - rub.list_ids ==> [["gi", "671595"], - ["emb", "CAA85678.1", nil], - ["Perovskia abrotanoides"]] - - ckr = Bio::FastaDefline.new(">gi|2495000|sp|Q63931|CCKR_CAVPO CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)\001gi|2147182|pir||I51898 cholecystokinin A receptor - guinea pig\001gi|544724|gb|AAB29504.1| cholecystokinin A receptor; CCK-A receptor [Cavia]") - ckr.entry_id ==> "gi|2495000" - ckr.sp ==> "CCKR_CAVPO" - ckr.pir ==> "I51898" - ckr.gb ==> "AAB29504.1" - ckr.gi ==> "2495000" - ckr.accession ==> "AAB29504" - ckr.accessions ==> ["Q63931", "AAB29504"] - ckr.acc_version ==> "AAB29504.1" - ckr.locus ==> nil - ckr.description ==> - "CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)" - ckr.descriptions ==> - ["CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A RECEPTOR) (CCK-AR)", - "cholecystokinin A receptor - guinea pig", - "cholecystokinin A receptor; CCK-A receptor [Cavia]"] - ckr.words ==> - ["cavia", "cck-a", "cck-ar", "cholecystokinin", 
"guinea", "pig", - "receptor", "type"] - ckr.id_strings ==> - ["2495000", "Q63931", "CCKR_CAVPO", "2147182", "I51898", - "544724", "AAB29504.1", "Cavia"] - ckr.list_ids ==> - [["gi", "2495000"], ["sp", "Q63931", "CCKR_CAVPO"], - ["gi", "2147182"], ["pir", nil, "I51898"], ["gi", "544724"], - ["gb", "AAB29504.1", nil], ["Cavia"]] - - =end - --- 906,908 ---- From ngoto at pub.open-bio.org Sun Jan 29 06:48:41 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:48:41 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb pdb.rb,1.13,1.14 Message-ID: <200601290648.k0T6mfVL007883@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb In directory pub.open-bio.org:/tmp/cvs-serv7873 Modified Files: pdb.rb Log Message: changed "str" to "str.to_s" to improve tolerance to wrong or incomplete data Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb/pdb.rb,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** pdb.rb 20 Jan 2006 13:54:08 -0000 1.13 --- pdb.rb 29 Jan 2006 06:48:39 -0000 1.14 *************** *** 87,91 **** module Pdb_SList def self.new(str) ! str.strip.split(/\;\s*/) end end --- 87,91 ---- module Pdb_SList def self.new(str) ! str.to_s.strip.split(/\;\s*/) end end *************** *** 93,97 **** module Pdb_List def self.new(str) ! str.strip.split(/\,\s*/) end end --- 93,97 ---- module Pdb_List def self.new(str) ! str.to_s.strip.split(/\,\s*/) end end *************** *** 99,103 **** module Pdb_Specification_list def self.new(str) ! a = str.strip.split(/\;\s*/) a.collect! { |x| x.split(/\:\s*/, 2) } a --- 99,103 ---- module Pdb_Specification_list def self.new(str) ! a = str.to_s.strip.split(/\;\s*/) a.collect! { |x| x.split(/\:\s*/, 2) } a *************** *** 107,111 **** module Pdb_String def self.new(str) ! str.gsub(/\s+\z/, '') end --- 107,111 ---- module Pdb_String def self.new(str) ! 
str.to_s.gsub(/\s+\z/, '') end *************** *** 117,121 **** @@nn = nn def self.new(str) ! str.gsub(/\s+\z/, '').ljust(@@nn)[0, @@nn] end } --- 117,121 ---- @@nn = nn def self.new(str) ! str.to_s.gsub(/\s+\z/, '').ljust(@@nn)[0, @@nn] end } *************** *** 130,134 **** @@nn = nn def self.new(str) ! str.ljust(@@nn)[0, @@nn] end } --- 130,134 ---- @@nn = nn def self.new(str) ! str.to_s.ljust(@@nn)[0, @@nn] end } *************** *** 158,162 **** module Pdb_StringRJ def self.new(str) ! str.gsub(/\A\s+/, '') end end --- 158,162 ---- module Pdb_StringRJ def self.new(str) ! str.to_s.gsub(/\A\s+/, '') end end From ngoto at pub.open-bio.org Sun Jan 29 06:54:15 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:54:15 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db pdb.rb,1.5,1.6 Message-ID: <200601290654.k0T6sFVL007957@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db In directory pub.open-bio.org:/tmp/cvs-serv7939/db Modified Files: pdb.rb Log Message: * In lib/bio/db/pdb/chemicalcomponent.rb, added a new class Bio::PDB::ChemicalComponent to parse the PDB Chemical Component Dictionary (PDB style format). * Added file format autodetection for Bio::PDB::ChemicalComponent. * Added flatfile indexer for Bio::PDB::ChemicalComponent. 
Index: pdb.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/db/pdb.rb,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** pdb.rb 16 Dec 2005 19:23:03 -0000 1.5 --- pdb.rb 29 Jan 2006 06:54:13 -0000 1.6 *************** *** 26,29 **** --- 26,32 ---- module Bio class PDB #< DB + + autoload :ChemicalComponent, 'bio/db/pdb/chemicalcomponent' + end #class PDB end #module Bio From ngoto at pub.open-bio.org Sun Jan 29 06:54:16 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:54:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io flatfile.rb,1.42,1.43 Message-ID: <200601290654.k0T6sGVL007965@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io In directory pub.open-bio.org:/tmp/cvs-serv7939/io Modified Files: flatfile.rb Log Message: * In lib/bio/db/pdb/chemicalcomponent.rb, added a new class Bio::PDB::ChemicalComponent to parse the PDB Chemical Component Dictionary (PDB style format). * Added file format autodetection for Bio::PDB::ChemicalComponent. * Added flatfile indexer for Bio::PDB::ChemicalComponent. 
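Note on the autodetection rule: the flatfile.rb change in this commit recognizes a Chemical Component Dictionary entry by its leading RESIDUE line. The following is only a minimal standalone sketch of that check, not the BioRuby implementation. The regular expression is taken from the diff; the constant name RESIDUE_LINE, the helper chemical_component_entry? and the sample lines are hypothetical and exist only for illustration.

```ruby
# Regular expression copied from the flatfile.rb diff in this commit.
# The constant name RESIDUE_LINE is hypothetical (not part of BioRuby).
RESIDUE_LINE = /^RESIDUE +.+ +\d+\s*$/

# Hypothetical helper: returns true if the text contains a line that looks
# like the RESIDUE header of a Chemical Component Dictionary entry.
def chemical_component_entry?(text)
  !!(text =~ RESIDUE_LINE)
end

# Sample lines invented for illustration.
puts chemical_component_entry?("RESIDUE   CMP     33\n")
puts chemical_component_entry?("HEADER    HYDROLASE\n")
```

The pattern requires the literal word RESIDUE at the start of a line and a trailing integer, which is presumably the atom count stored in the numHetAtoms field of the RESIDUE record.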
Index: flatfile.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile.rb,v retrieving revision 1.42 retrieving revision 1.43 diff -C2 -d -r1.42 -r1.43 *** flatfile.rb 28 Jan 2006 04:23:41 -0000 1.42 --- flatfile.rb 29 Jan 2006 06:54:14 -0000 1.43 *************** *** 443,446 **** --- 443,449 ---- Bio::PDB + when /^RESIDUE +.+ +\d+\s*$/ + Bio::PDB::ChemicalComponent + when /^CLUSTAL .*\(.*\).*sequence +alignment/ Bio::ClustalW::Report From ngoto at pub.open-bio.org Sun Jan 29 06:54:16 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:54:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/db/pdb chemicalcomponent.rb, NONE, 1.1 Message-ID: <200601290654.k0T6sGVL007961@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/db/pdb In directory pub.open-bio.org:/tmp/cvs-serv7939/db/pdb Added Files: chemicalcomponent.rb Log Message: * In lib/bio/db/pdb/chemicalcomponent.rb, added a new class Bio::PDB::ChemicalComponent to parse the PDB Chemical Component Dictionary (PDB style format). * Added file format autodetection for Bio::PDB::ChemicalComponent. * Added flatfile indexer for Bio::PDB::ChemicalComponent. --- NEW FILE: chemicalcomponent.rb --- # # = bio/db/pdb/chemicalcomponent.rb - PDB Chemical Component Dictionary parser # # Copyright:: Copyright (C) 2006 # GOTO Naohisa # License:: LGPL # # $Id: chemicalcomponent.rb,v 1.1 2006/01/29 06:54:13 ngoto Exp $ # #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA #++ # # = About Bio::PDB::ChemicalComponent # # Please refer Bio::PDB::ChemicalComponent. # # = References # # * (()) # * http://deposit.pdb.org/het_dictionary.txt # require 'bio/db/pdb/pdb' module Bio class PDB # Bio::PDB::ChemicalComponet is a parser for a entry of # the PDB Chemical Component Dictionary. # # The PDB Chemical Component Dictionary is available in # http://deposit.pdb.org/het_dictionary.txt class ChemicalComponent # delimiter for reading via Bio::FlatFile DELIMITER = RS = "\n\n" # Single field (normally single line) of a entry class Record < Bio::PDB::Record # fetches record name def fetch_record_name(str) str[0..6].strip end private :fetch_record_name # fetches record name def self.fetch_record_name(str) str[0..6].strip end private_class_method :fetch_record_name # RESIDUE field. # It would be wrong because the definition described in documents # seems ambiguous. RESIDUE = def_rec([ 11, 13, Pdb_LString[3], :hetID ], [ 16, 20, Pdb_Integer, :numHetAtoms ] ) # CONECT field # It would be wrong because the definition described in documents # seems ambiguous. CONECT = def_rec([ 12, 15, Pdb_Atom, :name ], [ 19, 20, Pdb_Integer, :num ], [ 21, 24, Pdb_Atom, :other_atoms ], [ 26, 29, Pdb_Atom, :other_atoms ], [ 31, 34, Pdb_Atom, :other_atoms ], [ 36, 39, Pdb_Atom, :other_atoms ], [ 41, 44, Pdb_Atom, :other_atoms ], [ 46, 49, Pdb_Atom, :other_atoms ], [ 51, 54, Pdb_Atom, :other_atoms ], [ 56, 59, Pdb_Atom, :other_atoms ], [ 61, 64, Pdb_Atom, :other_atoms ], [ 66, 69, Pdb_Atom, :other_atoms ], [ 71, 74, Pdb_Atom, :other_atoms ], [ 76, 79, Pdb_Atom, :other_atoms ] ) # HET field. # It is the same as Bio::PDB::Record::HET. 
HET = Bio::PDB::Record::HET #-- #HETSYN = Bio::PDB::Record::HETSYN #++ # HETSYN field. # It is very similar to Bio::PDB::Record::HETSYN. HETSYN = def_rec([ 9, 10, Pdb_Continuation, nil ], [ 12, 14, Pdb_LString(3), :hetID ], [ 16, 70, Pdb_String, :hetSynonyms ] ) # HETNAM field. # It is the same as Bio::PDB::Record::HETNAM. HETNAM = Bio::PDB::Record::HETNAM # FORMUL field. # It is the same as Bio::PDB::Record::FORMUL. FORMUL = Bio::PDB::Record::FORMUL # default definition for unknown fields. Default = Bio::PDB::Record::Default # Hash to store allowed definitions. Definition = create_definition_hash # END record class. # # Because END is a reserved word of Ruby, it is separately # added to the hash End = Bio::PDB::Record::End Definition['END'] = End # Look up the class in Definition hash def self.get_record_class(str) t = fetch_record_name(str) return Definition[t] end end #class Record # Creates a new object. def initialize(str) @data = str.split(/[\r\n]+/) @hash = {} #Flag to say whether the current line is part of a continuation cont = false #Goes through each line and replace that line with a PDB::Record @data.collect! do |line| #Go to next if the previous line was continuable, and #add_continuation returns true. Line is added by add_continuation next if cont and cont = cont.add_continuation(line) #Make the new record f = Record.get_record_class(line).new.initialize_from_string(line) #p f #Set cont cont = f if f.continue? #Set the hash to point to this record either by adding to an #array, or on its own key = f.record_name if a = @hash[key] then a << f else @hash[key] = [ f ] end f end #each #At the end we need to add the final model @data.compact! end # all records in this entry as an array. attr_reader :data # all records in this entry as a hash accessed by record names. attr_reader :hash # Identifier written in the first line "RESIDUE" record. (e.g. CMP) def entry_id @data[0].hetID end # Synonyms for the chemical component. Returns an array of strings.
def hetsyn unless defined? @hetsyn if r = @hash["HETSYN"] @hetsyn = r[0].hetSynonyms.to_s.split(/\;\s*/) else return [] end end @hetsyn end # The name of the chemical component. # Returns a string (or nil, if the entry is something wrong). def hetnam @hash["HETNAM"][0].text end # The chemical formula of the chemical component. # Returns a string (or nil, if the entry is something wrong). def formul @hash["FORMUL"][0].text end # Returns an hash of bindings of atoms. # Note that each white spaces are stripped for atom symbols. def conect unless defined? @conect c = {} @hash["CONECT"].each do |e| key = e.name.to_s.strip unless key.empty? val = e.other_atoms.collect { |x| x.strip } #warn "Warning: #{key}: atom name conflict?" if c[key] c[key] = val end end @conect = c end @conect end # Gets all records whose record type is _name_. # Returns an array of Bio::PDB::Record::* objects. # # if _name_ is nil, returns hash storing all record data. # # Example: # p pdb.record('CONECT') # p pdb.record['CONECT'] # def record(name = nil) name ? @hash[name] : @hash end end #class ChemicalComponent end #class PDB end #module Bio From ngoto at pub.open-bio.org Sun Jan 29 06:54:16 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 06:54:16 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io/flatfile indexer.rb,1.21,1.22 Message-ID: <200601290654.k0T6sGVL007967@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io/flatfile In directory pub.open-bio.org:/tmp/cvs-serv7939/io/flatfile Modified Files: indexer.rb Log Message: * In lib/bio/db/pdb/chemicalcomponent.rb, added a new class Bio::PDB::ChemicalComponent to parse the PDB Chemical Component Dictionary (PDB style format). * Added file format autodetection for Bio::PDB::ChemicalComponent. * Added flatfile indexer for Bio::PDB::ChemicalComponent. 
Index: indexer.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile/indexer.rb,v retrieving revision 1.21 retrieving revision 1.22 diff -C2 -d -r1.21 -r1.22 *** indexer.rb 26 Sep 2005 13:00:08 -0000 1.21 --- indexer.rb 29 Jan 2006 06:54:14 -0000 1.22 *************** *** 81,84 **** --- 81,86 ---- when 'Bio::Blast::WU::Report_TBlast' BlastDefaultParser.new(Bio::Blast::WU::Report_TBlast, *arg) + when 'Bio::PDB::ChemicalComponent' + PDBChemicalComponentParser.new(Bio::PDB::ChemicalComponent, *arg) else raise 'unknown or unsupported format' *************** *** 437,440 **** --- 439,471 ---- end end #class BlastDefaultReportParser + + class PDBChemicalComponentParser < TemplateParser + NAMESTYLE = NameSpaces.new( + NameSpace.new( 'UNIQUE', Proc.new { |x| x.entry_id } ) + ) + PRIMARY = 'UNIQUE' + def initialize(klass, pri_name = nil, sec_names = nil) + super() + self.format = 'raw' + self.dbclass = Bio::PDB::ChemicalComponent + self.set_primary_namespace((pri_name or PRIMARY)) + unless sec_names then + sec_names = [] + @namestyle.each_value do |x| + sec_names << x.name if x.name != self.primary.name + end + end + self.add_secondary_namespaces(*sec_names) + end + def open_flatfile(fileid, file) + super + @flatfile.pos = 0 + begin + pos = @flatfile.pos + line = @flatfile.gets + end until (!line or line =~ /^RESIDUE /) + @flatfile.pos = pos + end + end #class PDBChemicalComponentParser end #module Parser From nakao at pub.open-bio.org Sun Jan 29 07:39:34 2006 From: nakao at pub.open-bio.org (Mitsuteru C. Nakao) Date: Sun, 29 Jan 2006 07:39:34 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.18,1.19 Message-ID: <200601290739.k0T7dYVL008081@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio In directory pub.open-bio.org:/tmp/cvs-serv8071/lib/bio Modified Files: reference.rb Log Message: * Added RDoc. 
Index: reference.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/reference.rb,v retrieving revision 1.18 retrieving revision 1.19 diff -C2 -d -r1.18 -r1.19 *** reference.rb 18 Dec 2005 16:58:58 -0000 1.18 --- reference.rb 29 Jan 2006 07:39:31 -0000 1.19 *************** *** 1,6 **** # ! # bio/reference.rb - journal reference class # ! # Copyright (C) 2001 KATAYAMA Toshiaki # # This library is free software; you can redistribute it and/or --- 1,22 ---- # ! # = bio/reference.rb - Journal reference classes # ! # Copyright:: Copyright (C) 2001 ! # KATAYAMA Toshiaki ! # License:: LGPL ! # ! # $Id$ ! # ! # == Description ! # ! # Journal reference classes. ! # ! # == Examples ! # ! # == References ! # ! # ! # ! #-- # # This library is free software; you can redistribute it and/or *************** *** 18,28 **** # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ # module Bio class Reference def initialize(hash) hash.default = '' --- 34,100 ---- # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! #++ # module Bio + # A class for journal reference information. + # + # === Examples + # + # hash = {'authors' => [ "Hoge, J.P.", "Fuga, F.B." ], 'title' => "Title of the study.", + # 'journal' => "Theor. J. Hoge", 'volume' => 12, 'issue' => 3, 'pages' => "123-145", + # 'year' => 2001, 'pubmed' => 12345678, 'medline' => 98765432, 'abstract' => "...", + # 'url' => "http://...", 'mesh' => [], 'affiliations' => []} + # ref = Bio::Reference.new(hash) + # + # # Formats in the BiBTeX style. + # ref.format("bibtex") + # + # # Short-cut for Bio::Reference#format("bibtex") + # ref.bibtex + # class Reference + # Author names in an Array, [ "Hoge, J.P.", "Fuga, F.B." ]. + attr_reader :authors + + # "Title of the study." + attr_reader :title + + # "Theor. J.
Hoge" + attr_reader :journal + + # 12 + attr_reader :volume + + # 3 + attr_reader :issue + + # "123-145" + attr_reader :pages + + # 2001 + attr_reader :year + + # 12345678 + attr_reader :pubmed + + # 98765432 + attr_reader :medline + + # Abstract test in String. + attr_reader :abstract + + # A URL String. + attr_reader :url + + # MeSH terms in an Array. + attr_reader :mesh + + # Affiliations in an Array. + attr_reader :affiliations + + # def initialize(hash) hash.default = '' *************** *** 44,50 **** @affiliations = [] if @affiliations.empty? end - attr_reader :authors, :title, :journal, :volume, :issue, :pages, :year, - :pubmed, :medline, :abstract, :url, :mesh, :affiliations def format(style = nil, option = nil) case style --- 116,136 ---- @affiliations = [] if @affiliations.empty? end + # Formats the reference in a given style. + # + # Styles: + # 0. nil - general + # 1. endnote - Endnote + # 2. bibitem - Bibitem (option acceptable) + # 3. bibtex - BiBTeX (option acceptable) + # 4. rd - rd (option acceptable) + # 5. nature - Nature (option acceptable) + # 6. science - Science + # 7. genome_biol - Genome Biology + # 8. genome_res - Genome Research + # 9. nar - Nucleic Acids Research + # 10. current - Current Biology + # 11. trends - Trends in * + # 12. cell - Cell Press def format(style = nil, option = nil) case style *************** *** 78,81 **** --- 164,168 ---- end + # Formats in the Endonote style. def endnote lines = [] *************** *** 105,108 **** --- 192,196 ---- end + # Formats in the bibitem. def bibitem(item = nil) item = "PMID:#{@pubmed}" unless item *************** *** 116,119 **** --- 204,208 ---- end + # Formats in the BiBTeX style. def bibtex(section = nil) section = "article" unless section *************** *** 133,136 **** --- 222,226 ---- end + # Formats in a general style. def general authors = @authors.join(', ') *************** *** 138,141 **** --- 228,232 ---- end + # Formats in the RD style. 
def rd(str = nil) @abstract ||= str *************** *** 148,151 **** --- 239,244 ---- end + # Formats in the Nature Publish Group style. + # * http://www.nature.com def nature(short = false) if short *************** *** 164,167 **** --- 257,262 ---- end + # Formats in the Science style. + # * http://www.siencemag.com/ def science if @authors.size > 4 *************** *** 174,177 **** --- 269,274 ---- end + # Formats in the Genome Biology style. + # * http://genomebiology.com/ def genome_biol authors = @authors.collect {|name| strip_dots(name)}.join(', ') *************** *** 179,184 **** --- 276,285 ---- "#{authors}: #{@title} #{journal} #{@year}, #{@volume}:#{@pages}." end + # Formats in the Current Biology style. + # * http://www.current-biology.com/ alias current genome_biol + # Formats in the Genome Research style. + # * http://genome.org/ def genome_res authors = authors_join(' and ') *************** *** 186,189 **** --- 287,292 ---- end + # Formats in the Nucleic Acids Reseach style. + # * http://nar.oxfordjournals.org/ def nar authors = authors_join(' and ') *************** *** 191,199 **** end def cell authors = authors_join(' and ') "#{authors} (#{@year}). #{@title} #{@journal} #{@volume}, #{pages}." end ! def trends if @authors.size > 2 --- 294,306 ---- end + # Formats in the CELL Press style. + # http://www.cell.com/ def cell authors = authors_join(' and ') "#{authors} (#{@year}). #{@title} #{@journal} #{@volume}, #{pages}." end ! ! # Formats in the TRENDS Journals. ! # * http://www.trends.com/ def trends if @authors.size > 2 *************** *** 236,255 **** end ! class References def initialize(ary = []) @references = ary end - attr_accessor :references ! def append(a) ! @references.push(a) if a.is_a? Reference return self end def each ! @references.each do |x| ! yield x end end --- 343,377 ---- end ! # Set of Bio::Reference. ! # ! # === Examples ! # ! # refs = Bio::References.new ! # refs.append(Bio::Reference.new(hash)) ! # refs.each do |reference| ! 
# ... ! # end ! # class References + # Array of Bio::Reference. + attr_accessor :references + + # def initialize(ary = []) @references = ary end ! ! # Append a Bio::Reference object. ! def append(reference) ! @references.push(reference) if reference.is_a? Reference return self end + # Iterates each Bio::Reference object. def each ! @references.each do |reference| ! yield reference end end *************** *** 258,308 **** end - - - - =begin - - = Bio::Reference - - --- Bio::Reference.new(hash) - - --- Bio::Reference#authors -> Array - --- Bio::Reference#title -> String - --- Bio::Reference#journal -> String - --- Bio::Reference#volume -> Fixnum - --- Bio::Reference#issue -> Fixnum - --- Bio::Reference#pages -> String - --- Bio::Reference#year -> Fixnum - --- Bio::Reference#pubmed -> Fixnum - --- Bio::Reference#medline -> Fixnum - --- Bio::Reference#abstract -> String - --- Bio::Reference#url -> String - --- Bio::Reference#mesh -> Array - --- Bio::Reference#affiliations -> Array - - --- Bio::Reference#format(style = nil, option = nil) -> String - - --- Bio::Reference#endnote - --- Bio::Reference#bibitem(item = nil) -> String - --- Bio::Reference#bibtex(section = nil) -> String - --- Bio::Reference#rd(str = nil) -> String - --- Bio::Reference#nature(short = false) -> String - --- Bio::Reference#science -> String - --- Bio::Reference#genome_biol -> String - --- Bio::Reference#genome_res -> String - --- Bio::Reference#nar -> String - --- Bio::Reference#cell -> String - --- Bio::Reference#trends -> String - --- Bio::Reference#general -> String - - = Bio::References - - --- Bio::References.new(ary = []) - - --- Bio::References#references -> Array - --- Bio::References#append(a) -> Bio::References - --- Bio::References#each -> Array - - =end - --- 380,382 ---- From ngoto at pub.open-bio.org Sun Jan 29 10:06:45 2006 From: ngoto at pub.open-bio.org (Naohisa Goto) Date: Sun, 29 Jan 2006 10:06:45 +0000 Subject: [BioRuby-cvs] bioruby/lib/bio/io/flatfile index.rb,1.15,1.16 Message-ID:
<200601291006.k0TA6jVL017433@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/lib/bio/io/flatfile In directory pub.open-bio.org:/tmp/cvs-serv17423 Modified Files: index.rb Log Message: added RDoc (still incomplete) Index: index.rb =================================================================== RCS file: /home/repository/bioruby/bioruby/lib/bio/io/flatfile/index.rb,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** index.rb 28 Nov 2005 05:08:26 -0000 1.15 --- index.rb 29 Jan 2006 10:06:43 -0000 1.16 *************** *** 1,7 **** # ! # bio/io/flatfile/index.rb - OBDA flatfile index ! # ! # Copyright (C) 2002 GOTO Naohisa # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public --- 1,12 ---- # ! # = bio/io/flatfile/index.rb - OBDA flatfile index # + # Copyright:: Copyright (C) 2002 + # GOTO Naohisa + # License:: LGPL + # + # $Id$ + # + #-- # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public *************** *** 17,27 **** # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # ! # $Id$ ! # require 'bio/io/flatfile/indexer' module Bio class FlatFileIndex --- 22,83 ---- # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + #++ # ! # = About Bio::FlatFileIndex ! # ! # Please refer documents of following classes. ! # Classes/modules marked '#' are internal use only. ! # ! # == Classes/modules in index.rb ! # * class Bio::FlatFileIndex ! # * class Bio::FlatFileIndex::Results ! # * module Bio::FlatFileIndex::DEBUG ! # * #module Bio::FlatFileIndex::Template ! # * #class Bio::FlatFileIndex::Template::NameSpace ! # * #class Bio::FlatFileIndex::FileID ! # * #class Bio::FlatFileIndex::FileIDs ! 
# * #module Bio::FlatFileIndex::Flat_1 ! # * #class Bio::FlatFileIndex::Flat_1::Record ! # * #class Bio::FlatFileIndex::Flat_1::FlatMappingFile ! # * #class Bio::FlatFileIndex::Flat_1::PrimaryNameSpace ! # * #class Bio::FlatFileIndex::Flat_1::SecondaryNameSpace ! # * #class Bio::FlatFileIndex::NameSpaces ! # * #class Bio::FlatFileIndex::DataBank ! # ! # == Classes/modules in indexer.rb ! # * module Bio::FlatFileIndex::Indexer ! # * #class Bio::FlatFileIndex::Indexer::NameSpace ! # * #class Bio::FlatFileIndex::Indexer::NameSpaces ! # * #module Bio::FlatFileIndex::Indexer::Parser ! # * #class Bio::FlatFileIndex::Indexer::Parser::TemplateParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::GenBankParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::GenPeptParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::EMBLParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::SPTRParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::FastaFormatParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::MaXMLSequenceParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::MaXMLClusterParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::BlastDefaultParser ! # * #class Bio::FlatFileIndex::Indexer::Parser::PDBChemicalComponentParser ! # ! # == Classes/modules in bdb.rb ! # * #module Bio::FlatFileIndex::BDBDefault ! # * #class Bio::FlatFileIndex::BDBWrapper ! # * #module Bio::FlatFileIndex::BDB_1 ! # * #class Bio::FlatFileIndex::BDB_1::BDBMappingFile ! # * #class Bio::FlatFileIndex::BDB_1::PrimaryNameSpace ! # * #class Bio::FlatFileIndex::BDB_1::SecondaryNameSpace ! # ! # = References ! # * (()) ! # * (()) ! # require 'bio/io/flatfile/indexer' module Bio + + + # Bio::FlatFileIndex is a class for OBDA flatfile index. 
class FlatFileIndex *************** *** 31,38 **** --- 87,105 ---- autoload :BDB_1, 'bio/io/flatfile/bdb' + # magic string for flat/1 index MAGIC_FLAT = 'flat/1' + + # magic string for BerkeleyDB/1 index MAGIC_BDB = 'BerkeleyDB/1' ######################################################### + + # Opens existing databank. Databank is a directory which contains + # indexed files and configuration files. The type of the databank + # (flat or BerkeleyDB) are determined automatically. + # + # If block is given, the databank object is passed to the block. + # The databank will be automatically closed when the block terminates. + # def self.open(name) if block_given? then *************** *** 54,57 **** --- 121,130 ---- end + # Opens existing databank. Databank is a directory which contains + # indexed files and configuration files. The type of the databank + # (flat or BerkeleyDB) are determined automatically. + # + # Unlike +FlatFileIndex.open+, block is not allowed. + # def initialize(name) @db = DataBank.open(name) *************** *** 59,67 **** --- 132,149 ---- # common interface defined in registry.rb + # Searching databank and returns entry (or entries) as a string. + # Multiple entries (contatinated to one string) may be returned. + # Returns empty string if not found. + # def get_by_id(key) search(key).to_s end + #-- # original methods + #++ + + # Closes the databank. + # Returns nil. def close check_closed? *************** *** 70,73 **** --- 152,156 ---- end + # Returns true if already closed. Otherwise, returns false. def closed? if @db then *************** *** 78,81 **** --- 161,177 ---- end + # Set default namespaces. + # default_namespaces = nil + # means all namespaces in the databank. + # + # default_namespaces= [ str1, str2, ... ] + # means set default namespeces to str1, str2, ... + # + # Default namespaces specified in this method only affect + # #get_by_id, #search, and #include? methods. 
+ # + # Default of default namespaces is nil (that is, all namespaces + # are search destinations by default). + # def default_namespaces=(names) if names then *************** *** 87,94 **** --- 183,194 ---- end + # Returns default namespaces. + # Returns an array of strings or nil. + # nil means all namespaces. def default_namespaces @names end + # Searching databank and returns a Bio::FlatFileIndex::Results object. def search(key) check_closed? *************** *** 100,103 **** --- 200,206 ---- end + # Searching only specified namespeces. + # Returns a Bio::FlatFileIndex::Results object. + # def search_namespaces(key, *names) check_closed? *************** *** 105,108 **** --- 208,214 ---- end + # Searching only primary namespece. + # Returns a Bio::FlatFileIndex::Results object. + # def search_primary(key) check_closed? *************** *** 110,113 **** --- 216,227 ---- end + # Searching databank. + # If some entries are found, returns an array of + # unique IDs (primary identifiers). + # If not found anything, returns nil. + # + # This method is useful when search result is very large and + # #search method is very slow. + # def include?(key) check_closed? *************** *** 124,127 **** --- 238,243 ---- end + # Same as #include?, but serching only specified namespaces. + # def include_in_namespaces?(key, *names) check_closed? *************** *** 134,137 **** --- 250,255 ---- end + # Same as #include?, but serching only primary namespace. + # def include_in_primary?(key) check_closed? *************** *** 144,147 **** --- 262,268 ---- end + # Returns names of namespaces defined in the databank. + # (example: [ 'LOCUS', 'ACCESSION', 'VERSION' ] ) + # def namespaces check_closed? *************** *** 151,154 **** --- 272,276 ---- end + # Returns name of primary namespace as a string. def primary_namespace check_closed? *************** *** 156,159 **** --- 278,282 ---- end + # Returns names of secondary namespaces as an array of strings. 
def secondary_namespaces check_closed? *************** *** 161,164 **** --- 284,295 ---- end + # Check consistency between the databank(index) and original flat files. + # + # If the original flat files are changed after creating + # the databank, raises RuntimeError. + # + # Note that this check only compares file sizes as + # described in the OBDA specification. + # def check_consistency check_closed? *************** *** 166,177 **** --- 297,323 ---- end + # If true is given, consistency checks will be performed every time + # accessing flatfiles. If nil/false, no checks are performed. + # + # By default, always_check_consistency is true. + # def always_check_consistency=(bool) @db.always_check=(bool) end + + # If true, consistency checks will be performed every time + # accessing flatfiles. If nil/false, no checks are performed. + # + # By default, always_check_consistency is true. + # def always_check_consistency(bool) @db.always_check end + #-- # private methods + #++ + + # If the databank is closed, raises IOError. def check_closed? @db or raise IOError, 'closed databank' *************** *** 179,186 **** --- 325,351 ---- private :check_closed? + #-- ######################################################### + #++ + # Results stores search results created by + # Bio::FlatFileIndex methods. + # + # Currently, this class inherits Hash, but internal + # structure of this class may be changed anytime. + # Only using methods described below are strongly recomended. + # class Results < Hash + # Add search results. + # "a + b" means "a OR b". + # * Example + # # I want to search 'ADH_IRON_1' OR 'ADH_IRON_2' + # db = Bio::FlatFIleIndex.new(location) + # a1 = db.search('ADH_IRON_1') + # a2 = db.search('ADH_IRON_2') + # # a1 and a2 are Bio::FlatFileIndex::Results objects. + # print a1 + a2 + # def +(a) raise 'argument must be Results class' unless a.is_a?(self.class) *************** *** 190,193 **** --- 355,368 ---- end + # Returns set intersection of results. 
+ # "a * b" means "a AND b". + # * Example + # # I want to search 'HIS_KIN' AND 'human' + # db = Bio::FlatFIleIndex.new(location) + # hk = db.search('HIS_KIN') + # hu = db.search('human') + # # hk and hu are Bio::FlatFileIndex::Results objects. + # print hk * hu + # def *(a) raise 'argument must be Results class' unless a.is_a?(self.class) *************** *** 197,216 **** --- 372,428 ---- end + # Returns a string. (concatinated if multiple results exists). + # Same as to_a.join(''). + # def to_s self.values.join end + #-- #alias each_orig each + #++ + + # alias for each_value. alias each each_value + + # Iterates over each result (string). + # Same as to_a.each. + def each(&x) #:yields: str + each_value(&x) + end if false #dummy for RDoc + + #-- #alias to_a_orig to_a + #++ + + # alias for to_a. alias to_a values + # Returns an array of strings. + # If no search results are exist, returns an empty array. + # + def to_a; values; end if false #dummy for RDoc + + # Returns number of results. + # Same as to_a.size. + def size; end if false #dummy for RDoc + end #class Results ######################################################### + # Module for output debug messages. + # Default setting: If $DEBUG or $VERBOSE is true, output debug + # messages to STDERR; Otherwise, don't output messages. + # module DEBUG @@out = STDERR @@flag = nil + + # Set debug messages output destination. + # If true is given, outputs to STDERR. + # If nil is given, outputs nothing. + # This method affects ALL of FlatFileIndex related objects/methods. + # def self.out=(io) if io then *************** *** 224,230 **** --- 436,446 ---- @@out end + + # get current debug messeages output destination def self.out @@out end + + # prints debug messages def self.print(*arg) @@flag = true if $DEBUG or $VERBOSE *************** *** 235,239 **** --- 451,462 ---- ######################################################### + # Templates + # + # Internal use only. 
module Template + + # templates of namespace + # + # Internal use only. class NameSpace def filename *************** *** 276,279 **** --- 499,505 ---- end #module Template + # FileID class. + # + # Internal use only. class FileID def self.new_from_string(str) *************** *** 356,359 **** --- 582,588 ---- end #class FileID + # FileIDs class. + # + # Internal use only. class FileIDs < Array def initialize(prefix, hash) *************** *** 472,476 **** --- 701,712 ---- end #class FileIDs + # module for flat/1 databank + # + # Internal use only. module Flat_1 + + # Record class. + # + # Internal use only. class Record def initialize(str, size = nil) *************** *** 501,504 **** --- 737,743 ---- end #class Record + # FlatMappingFile class. + # + # Internal use only. class FlatMappingFile @@recsize_width = 4 *************** *** 786,789 **** --- 1025,1031 ---- end #class FlatMappingFile + # primary name space + # + # Internal use only. class PrimaryNameSpace < Template::NameSpace def mapping(filename) *************** *** 795,798 **** --- 1037,1043 ---- end #class PrimaryNameSpace + # secondary name space + # + # Internal use only. class SecondaryNameSpace < Template::NameSpace def mapping(filename) *************** *** 811,815 **** end #module Flat_1 ! class NameSpaces < Hash def initialize(dbname, nsclass, arg) --- 1056,1062 ---- end #module Flat_1 ! # namespaces ! # ! # Internal use only. class NameSpaces < Hash def initialize(dbname, nsclass, arg) *************** *** 873,876 **** --- 1120,1126 ---- end #class NameSpaces + # databank + # + # Internal use only. class DataBank def self.file2hash(fileobj) *************** *** 1136,1308 **** end #module Bio - ###################################################################### - - =begin - - = Bio::FlatFileIndex - - --- Bio::FlatFileIndex.new(dbname) - --- Bio::FlatFileIndex.open(dbname) - - Opens existing databank. Databank is a directory which contains - indexed files and configuration files. 
The type of the databank - (flat or BerkeleyDB) are determined automatically. - - --- Bio::FlatFileIndex#close - - Closes opened databank. - - --- Bio::FlatFileIndex#closed? - - Returns true if already closed. Otherwise, returns false. - - --- Bio::FlatFileIndex#get_by_id(key) - - Common interface defined in registry.rb. - Searching databank and returns entry (or entries) as a string. - Multiple entries (contatinated to one string) may be returned. - Returns empty string If not found. - - --- Bio::FlatFileIndex#search(key) - - Searching databank and returns a Bio::FlatFileIndex::Results object. - - --- Bio::FlatFileIndex#include?(key) - - Searching databank. - If found, returns an array of unique IDs (primary identifiers). - If not found, returns nil. - - --- Bio::FlatFileIndex#search_primary(key) - - Searching only primary namespece. - Returns a Bio::FlatFileIndex::Results object. - - --- Bio::FlatFileIndex#search_namespaces(key, name1, name2, ...) - - Searching only specific namespeces. - Returns a Bio::FlatFileIndex::Results object. - - --- Bio::FlatFileIndex#include_in_primary?(key) - - Same as #include?, but serching only primary namespace. - - --- Bio::FlatFileIndex#include_in_namespaces?(key, name1, name2, ...) - - Same as #include?, but serching only specific namespaces. - - --- Bio::FlatFileIndex#namespaces - - Returns names of namespaces defined in the databank. - (example: [ 'LOCUS', 'ACCESSION', 'VERSION' ] ) - - --- Bio::FlatFileIndex#primary_namespace - - Returns name of primary namespace. - - --- Bio::FlatFileIndex#secondary_namespaces - - Returns names of secondary namespaces. - - --- Bio::FlatFileIndex#default_namespaces= [ str1, str2, ... ] - --- Bio::FlatFileIndex#default_namespaces= nil - - Set default namespaces. - nil means all namespaces in the databank. - Default namespaces specified in this method only affect - #get_by_id, #search, and #include? methods. 
- Default of default namespaces is nil (that is, all namespaces - are search destinations by default). - - --- Bio::FlatFileIndex#default_namespaces - - Returns default namespaces. - nil means all namespaces. - - --- Bio::FlatFileIndex#check_consistency - - Raise RuntimeError if flatfiles are changed after creating - the databank. (This check only compare file sizes as - described in the OBDA specification.) - - --- Bio::FlatFileIndex#always_check_consistency=(bool) - --- Bio::FlatFileIndex#always_check_consistency - - If true, consistency checks are performed every time - accessing flatfiles. If nil/false, no checks are performed. - Default of always_check_consistency is true. - - == Bio::FlatFileIndex::Results - - This object is made by Bio::FlatFileIndex methods. - Currently, this class inherits Hash, but internal - structure of this class may be changed anytime. - Only using methods described below are strongly recomended. - - --- Bio::FlatFileIndex::Results#to_a - - Returns an array of strings. - If no search results are exist, returns an empty array. - - --- Bio::FlatFileIndex::Results#each - - Iterates over each result(string). - Same as to_a.each. - - --- Bio::FlatFileIndex::Results#to_s - - Returns a string. (concatinated if multiple results exists). - Same as to_a.join(''). - - --- Bio::FlatFileIndex::Results#size - - Returns number of results. - Same as to_a.size. - - --- Bio::FlatFileIndex::Results#+(res) - - Add search results. - "a + b" means "a OR b". - * Example - # I want to search 'ADH_IRON_1' OR 'ADH_IRON_2' - db = Bio::FlatFIleIndex.new(location) - a1 = db.search('ADH_IRON_1') - a2 = db.search('ADH_IRON_2') - # a1 and a2 are Bio::FlatFileIndex::Results objects. - print a1 + a2 - - --- Bio::FlatFileIndex::Results#*(res) - - Returns set intersection of results. - "a * b" means "a AND b". 
- * Example - # I want to search 'HIS_KIN' AND 'human' - db = Bio::FlatFIleIndex.new(location) - hk = db.search('HIS_KIN') - hu = db.search('human') - # hk and hu are Bio::FlatFileIndex::Results objects. - print hk * hu - - == Bio::FlatFileIndex::DEBUG - - Module for output debug messages. - Default setting: If $DEBUG or $VERBOSE is true, output debug - messages to STDERR; Otherwise, don't output messages. - - --- Bio::FlatFileIndex::DEBUG.out=(io) - - Set debug messages output destination. - If true is given, outputs to STDERR. - If nil is given, outputs nothing. - This method affects ALL of FlatFileIndex related objects/methods. - - == Other classes/modules - - Classes/modules not described in this file are internal use only. - - == SEE ALSO - - * (()) - * (()) - - =end --- 1386,1387 ---- From pjotr at pub.open-bio.org Tue Jan 31 07:27:54 2006 From: pjotr at pub.open-bio.org (Pjotr Prins) Date: Tue, 31 Jan 2006 07:27:54 +0000 Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.9,1.10 Message-ID: <200601310727.k0V7RsVL025386@pub.open-bio.org> Update of /home/repository/bioruby/bioruby/doc In directory pub.open-bio.org:/tmp/cvs-serv25376 Modified Files: Tutorial.rd Log Message: Better example Index: Tutorial.rd =================================================================== RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** Tutorial.rd 1 Nov 2005 04:31:48 -0000 1.9 --- Tutorial.rd 31 Jan 2006 07:27:52 -0000 1.10 *************** *** 1,11 **** =begin ! $Id$ ! Copyright (C) 2001-2003 KATAYAMA Toshiaki Translated into English: Naohisa Goto ! Edited by: PjotrPrins NOTE: This page is a work in progress at this point --- 1,14 ---- =begin ! See the document in the CVS repository ./doc/(()) - for a potentially more up-to-date edition. This one was updated: ! $Id$ Translated into English: Naohisa Goto ! Editor: PjotrPrins ! ! Copyright (C) 2001-2003 KATAYAMA Toshiaki , 2005-2006 all ! 
others
  NOTE: This page is a work in progress at this point
***************
*** 115,121 ****
  s = 'abc'
! puts s[0..0]
- >a
  So when using String methods, you should subtract 1 from positions
--- 118,129 ----
  s = 'abc'
! puts s[0].chr
!
! >a
!
! puts s[0..1]
!
! >ab
  So when using String methods, you should subtract 1 from positions

From pjotr at pub.open-bio.org Tue Jan 31 07:45:24 2006
From: pjotr at pub.open-bio.org (Pjotr Prins)
Date: Tue, 31 Jan 2006 07:45:24 +0000
Subject: [BioRuby-cvs] bioruby/doc Tutorial.rd,1.10,1.11
Message-ID: <200601310745.k0V7jOVL025523@pub.open-bio.org>

Update of /home/repository/bioruby/bioruby/doc
In directory pub.open-bio.org:/tmp/cvs-serv25513/doc

Modified Files:
  Tutorial.rd
Log Message:
tabs to spaces

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** Tutorial.rd 31 Jan 2006 07:27:52 -0000 1.10
--- Tutorial.rd 31 Jan 2006 07:45:22 -0000 1.11
***************
*** 118,124 ****
  s = 'abc'
! puts s[0].chr
! >a
  puts s[0..1]
--- 118,124 ----
  s = 'abc'
! puts s[0].chr
! >a
  puts s[0..1]
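
The Tutorial.rd hunks above document zero-based String indexing. What follows is a minimal sketch of the same idea, written so that it behaves identically on old and current Ruby. In Ruby 1.8, which was current when these commits were made, s[0] returned the integer character code, which is why the tutorial calls .chr on it. From Ruby 1.9 on, s[0] already returns a one-character String. The two-argument form s[0, 1] returns "a" in both. The helper char_at_position is hypothetical, introduced here only for illustration, and is not part of BioRuby.

```ruby
# Zero-based String indexing, as in the Tutorial.rd example.
s = 'abc'

first = s[0, 1]   # start index 0, length 1: one-character String
head  = s[0..1]   # inclusive range 0..1: the first two characters

puts first
puts head

# Hypothetical helper (an assumption for illustration, not a BioRuby API):
# maps a 1-based position, as used for biological sequences, to the
# 0-based index that String methods expect.
def char_at_position(str, pos)
  str[pos - 1, 1]
end

puts char_at_position(s, 1)
```

The helper only restates the tutorial's advice to subtract 1 from positions when using String methods.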