[BioRuby-cvs] bioruby/doc Tutorial.rd,1.5,1.6

Fri Sep 23 04:35:46 EDT 2005

Update of /home/repository/bioruby/bioruby/doc
In directory pub.open-bio.org:/tmp/cvs-serv20985

Modified Files:
	Tutorial.rd 
Log Message:
Added assignment info to tutorial

Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** Tutorial.rd	16 Sep 2005 06:03:27 -0000	1.5
--- Tutorial.rd	23 Sep 2005 07:43:20 -0000	1.6
***************
*** 52,82 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      seq = Bio::Sequence::NA.new("atgcatgcaaaa")
!     
      puts seq                            # original sequence
      puts seq.complement                 # complemental sequence (Bio::Sequence::NA object)
      puts seq.subseq(3,8)                # gets subsequence of positions 3 to 8
!     
      p seq.gc_percent                    # GC percent (Float)
      p seq.composition                   # nucleic acid compositions (Hash)
!     
      puts seq.translate                  # translation (Bio::Sequence::AA object)
      puts seq.translate(2)               # translation from frame 2 (default is frame 1)
      puts seq.translate(1,11)            # using codon table No.11 (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
!     
      p seq.translate.codes               # shows three-letter codes (Array)
      p seq.translate.names               # shows amino acid names (Array)
      p seq.translate.composition         # amino acid compositions (Hash)
      p seq.translate.molecular_weight    # calculating molecular weight (Float)
!     
      puts seq.complement.translate       # translation of complemental strand

  The p, print and puts methods are standard Ruby ways of outputting to
  the screen. If you want to know more about standard Ruby commands you
! can use the 'ri' command on the command line (or the help command in 
! Windows). For example 

    % ri p
--- 52,82 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      seq = Bio::Sequence::NA.new("atgcatgcaaaa")
! 
      puts seq                            # original sequence
      puts seq.complement                 # complemental sequence (Bio::Sequence::NA object)
      puts seq.subseq(3,8)                # gets subsequence of positions 3 to 8
! 
      p seq.gc_percent                    # GC percent (Float)
      p seq.composition                   # nucleic acid compositions (Hash)
! 
      puts seq.translate                  # translation (Bio::Sequence::AA object)
      puts seq.translate(2)               # translation from frame 2 (default is frame 1)
      puts seq.translate(1,11)            # using codon table No.11 (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
! 
      p seq.translate.codes               # shows three-letter codes (Array)
      p seq.translate.names               # shows amino acid names (Array)
      p seq.translate.composition         # amino acid compositions (Hash)
      p seq.translate.molecular_weight    # calculating molecular weight (Float)
! 
      puts seq.complement.translate       # translation of complemental strand

  The p, print and puts methods are standard Ruby ways of outputting to
  the screen. If you want to know more about standard Ruby commands you
! can use the 'ri' command on the command line (or the help command in
! Windows). For example

    % ri p
***************
*** 97,109 ****

    >a
!  
  So when using String methods, you should subtract 1 from positions
  conventionally used in biology.  (subseq method returns nil if you
  specify positions smaller than or equal to 0 for either one of the
! "from" or "to".)  

  (TRANSLATOR'S NOTE: the text in Japanese is something wrong?)
  (EDITOR'S NOTE: should 'subseq' not throw an exception instead?)
!  
  The window_search(window_size, step_size) method shows a typical Ruby
  way of writing concise and clear code using 'closures'. Each sliding
--- 97,109 ----

    >a
! 
  So when using String methods, you should subtract 1 from positions
  conventionally used in biology.  (subseq method returns nil if you
  specify positions smaller than or equal to 0 for either one of the
! "from" or "to".)

  (TRANSLATOR'S NOTE: the text in Japanese is something wrong?)
  (EDITOR'S NOTE: should 'subseq' not throw an exception instead?)
! 
  The window_search(window_size, step_size) method shows a typical Ruby
  way of writing concise and clear code using 'closures'. Each sliding
***************
*** 163,174 ****
  In most cases, sequences are read from files or retrieved from databases.
  For example:
!     
      require 'bio'
!     
      input_seq = ARGF.read       # reads all files in arguments
!     
      my_naseq = Bio::Sequence::NA.new(input_seq)
      my_aaseq = my_naseq.translate
!     
      puts my_aaseq

--- 163,174 ----
  In most cases, sequences are read from files or retrieved from databases.
  For example:
! 
      require 'bio'
! 
      input_seq = ARGF.read       # reads all files in arguments
! 
      my_naseq = Bio::Sequence::NA.new(input_seq)
      my_aaseq = my_naseq.translate
! 
      puts my_aaseq

***************
*** 185,189 ****
      % ruby na2aa.rb my_naseq.txt

! Outputs 

      VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*
--- 185,189 ----
      % ruby na2aa.rb my_naseq.txt

! Outputs

      VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*
***************
*** 209,219 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
! 		# Read all lines from STDIN split by the GenBank delimiter 
      while entry = gets(Bio::GenBank::DELIMITER)
        gb = Bio::GenBank.new(entry)      # creates GenBank object
!     
        print ">#{gb.accession} "         # Accession
        puts gb.definition                # Definition
--- 209,219 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
! 		# Read all lines from STDIN split by the GenBank delimiter
      while entry = gets(Bio::GenBank::DELIMITER)
        gb = Bio::GenBank.new(entry)      # creates GenBank object
! 
        print ">#{gb.accession} "         # Accession
        puts gb.definition                # Definition
***************
*** 226,236 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
      ff.each_entry do |gb|
        definition = "#{gb.accession} #{gb.definition}"
!       puts gb.naseq.to_fasta(definition, 60)    
      end

--- 226,236 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
      ff.each_entry do |gb|
        definition = "#{gb.accession} #{gb.definition}"
!       puts gb.naseq.to_fasta(definition, 60)
      end

***************
*** 238,244 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
      ff.each_entry do |f|
--- 238,244 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
      ff.each_entry do |f|
***************
*** 254,264 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      ff = Bio::GenBank.open("gbvrl1.seq")
      ff.each_entry do |gb|
        definition = "#{gb.accession} #{gb.definition}"
!       puts gb.naseq.to_fasta(definition, 60)    
      end

--- 254,264 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      ff = Bio::GenBank.open("gbvrl1.seq")
      ff.each_entry do |gb|
        definition = "#{gb.accession} #{gb.definition}"
!       puts gb.naseq.to_fasta(definition, 60)
      end

***************
*** 270,276 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      ff = Bio::FlatFile.new(Bio::GenBank, ARGF)

--- 270,276 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      ff = Bio::FlatFile.new(Bio::GenBank, ARGF)

***************
*** 280,288 ****
        # shows accession and organism
        puts "# #{gb.accession} - #{gb.organism}"
!     
  			# iterates over each element in 'features'
!       gb.features.each do |feature|     
          position = feature.position
!         hash = feature.assoc            # put into Hash 

          # skips the entry if "/translation=" is not found
--- 280,288 ----
        # shows accession and organism
        puts "# #{gb.accession} - #{gb.organism}"
! 
  			# iterates over each element in 'features'
!       gb.features.each do |feature|
          position = feature.position
!         hash = feature.assoc            # put into Hash

          # skips the entry if "/translation=" is not found
***************
*** 317,325 ****
  Bio::Sequence#splicing splices subsequence from nucleic acid sequence
  according to location information used in GenBank, EMBL and DDBJ.
! (TRANSLATOR'S NOTE: EMBL and DDBJ should be added in Japanese document.) 

  When the specified translation table is different from the default
  (universal), or when the first codon is not "atg" or the protein
! contains selenocysteine, the two amino acid sequences will differ. 

  (TRANSLATOR'S NOTE: Some cases are added when two amino acid sequences
--- 317,325 ----
  Bio::Sequence#splicing splices subsequence from nucleic acid sequence
  according to location information used in GenBank, EMBL and DDBJ.
! (TRANSLATOR'S NOTE: EMBL and DDBJ should be added in Japanese document.)

  When the specified translation table is different from the default
  (universal), or when the first codon is not "atg" or the protein
! contains selenocysteine, the two amino acid sequences will differ.

  (TRANSLATOR'S NOTE: Some cases are added when two amino acid sequences
***************
*** 349,352 ****
--- 349,392 ----
  (EDITOR's NOTE: why use STRINGs here?)

+ === Alignments (Bio::Alignment)
+ 
+ Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash,
+ Array and BioPerl's Bio::SimpleAlign.  A very simple example is:
+ 
+   require 'bio'
+ 
+   seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
+   seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
+ 
+   # creates alignment object
+   a = Bio::Alignment.new(seqs)
+ 
+   # shows consensus sequence
+   p a.consensus             # ==> "a?gc?"
+ 
+   # shows IUPAC consensus
+   p a.consensus_iupac       # ==> "ahgcr"
+ 
+   # iterates over each seq
+   a.each { |x| p x }
+     # ==>
+     #    "atgca"
+     #    "aagca"
+     #    "acgca"
+     #    "acgcg"
+   # iterates over each site
+   a.each_site { |x| p x }
+     # ==>
+     #    ["a", "a", "a", "a"]
+     #    ["t", "a", "c", "c"]
+     #    ["g", "g", "g", "g"]
+     #    ["c", "c", "c", "c"]
+     #    ["a", "a", "a", "g"]
+ 
+   # doing alignment by using CLUSTAL W.
+   # clustalw command must be installed.
+   factory = Bio::ClustalW.new
+   a2 = a.do_align(factory)
+ 
  === More databases

***************
*** 356,360 ****

  In many cases the Bio::DatabaseClass acts as a factory pattern
! and recognises the database type automatically - returning a 
  parsed object. For example using Bio::FlatFile

--- 396,400 ----

  In many cases the Bio::DatabaseClass acts as a factory pattern
! and recognises the database type automatically - returning a
  parsed object. For example using Bio::FlatFile

***************
*** 366,375 ****

  Isn't it wonderful that Bio::FlatFile automagically recognizes each
! database class? 

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      ff = Bio::FlatFile.auto(ARGF)
      ff.each_entry do |entry|
--- 406,415 ----

  Isn't it wonderful that Bio::FlatFile automagically recognizes each
! database class?

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      ff = Bio::FlatFile.auto(ARGF)
      ff.each_entry do |entry|
***************
*** 416,437 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      # Creates FASTA factory object ("ssearch" instead of "fasta34" can also work)
      factory = Bio::Fasta.local('fasta34', ARGV.pop)
  		(EDITOR's NOTE: not consistent pop command)
!     
      # Reads FASTA-formatted files (TRANSLATOR'S NOTE: something wrong in Japanese text)
      ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
!     
      # Iterates over each entry. the variable "entry" is a Bio::FastaFormat object.
      ff.each do |entry|
        # shows definition line (begins with '>') to the standard error output
        $stderr.puts "Searching ... " + entry.definition
!    
        # executes homology search. Returns Bio::Fasta::Report object.
        report = factory.query(entry)
!     
        # Iterates over each hit
        report.each do |hit|
--- 456,477 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      # Creates FASTA factory object ("ssearch" instead of "fasta34" can also work)
      factory = Bio::Fasta.local('fasta34', ARGV.pop)
  		(EDITOR's NOTE: not consistent pop command)
! 
      # Reads FASTA-formatted files (TRANSLATOR'S NOTE: something wrong in Japanese text)
      ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
! 
      # Iterates over each entry. the variable "entry" is a Bio::FastaFormat object.
      ff.each do |entry|
        # shows definition line (begins with '>') to the standard error output
        $stderr.puts "Searching ... " + entry.definition
! 
        # executes homology search. Returns Bio::Fasta::Report object.
        report = factory.query(entry)
! 
        # Iterates over each hit
        report.each do |hit|
***************
*** 534,538 ****
      program = 'fasta'
      database = 'genes'
!     
      factory = Bio::Fasta.remote(program, database)

--- 574,578 ----
      program = 'fasta'
      database = 'genes'
! 
      factory = Bio::Fasta.remote(program, database)

***************
*** 548,552 ****

      # create BLAST factory object
!     factory = Bio::Blast.local('blastp', ARGV.pop) 

  For remote execution of BLAST in GenomeNet, Bio::Blast.remote is used.
--- 588,592 ----

      # create BLAST factory object
!     factory = Bio::Blast.local('blastp', ARGV.pop)

  For remote execution of BLAST in GenomeNet, Bio::Blast.remote is used.
***************
*** 581,585 ****
        puts hit.midline          # middle line string of alignment of homologous region (*)
        puts hit.target_seq       # hit sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence)
!       
        puts hit.evalue           # E-value
        puts hit.identity         # % identity
--- 621,625 ----
        puts hit.midline          # middle line string of alignment of homologous region (*)
        puts hit.target_seq       # hit sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence)
! 
        puts hit.evalue           # E-value
        puts hit.identity         # % identity
***************
*** 622,626 ****

      #!/usr/bin/env ruby
!     
      require 'bio'

--- 662,666 ----

      #!/usr/bin/env ruby
! 
      require 'bio'

***************
*** 670,676 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      ARGV.each do |id|
        entry = Bio::PubMed.query(id)     # searches PubMed and get entry
--- 710,716 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      ARGV.each do |id|
        entry = Bio::PubMed.query(id)     # searches PubMed and get entry
***************
*** 691,703 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      # Concatenates argument keyword list to a string
      keywords = ARGV.join(' ')
!     
      # PubMed keyword search
      entries = Bio::PubMed.search(keywords)
!     
      entries.each do |entry|
        medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from text
--- 731,743 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      # Concatenates argument keyword list to a string
      keywords = ARGV.join(' ')
! 
      # PubMed keyword search
      entries = Bio::PubMed.search(keywords)
! 
      entries.each do |entry|
        medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from text
***************
*** 724,739 ****

      #!/usr/bin/env ruby
!     
      require 'bio'
!     
      keywords = ARGV.join(' ')
!     
      options = {
        'maxdate' => '2003/05/31',
        'retmax' => 1000,
      }
!     
      entries = Bio::PubMed.esearch(keywords, options)
!     
      Bio::PubMed.efetch(entries).each do |entry|
        medline = Bio::MEDLINE.new(entry)
--- 764,779 ----

      #!/usr/bin/env ruby
! 
      require 'bio'
! 
      keywords = ARGV.join(' ')
! 
      options = {
        'maxdate' => '2003/05/31',
        'retmax' => 1000,
      }
! 
      entries = Bio::PubMed.esearch(keywords, options)
! 
      Bio::PubMed.efetch(entries).each do |entry|
        medline = Bio::MEDLINE.new(entry)
***************
*** 761,765 ****

  The BibTeX can be used with Tex or LaTeX to form bibliography
! information with your journal article. For more information 
  on BibTex see (EDITORS NOTE: insert URL). A quick example:

--- 801,805 ----

  The BibTeX can be used with Tex or LaTeX to form bibliography
! information with your journal article. For more information
  on BibTex see (EDITORS NOTE: insert URL). A quick example:

***************
*** 782,786 ****
  Now, you get hoge.dvi and hoge.ps - the latter you can view with any
  PostScript viewer.
!     
  === Bio::Reference#bibitem

--- 822,826 ----
  Now, you get hoge.dvi and hoge.ps - the latter you can view with any
  PostScript viewer.
! 
  === Bio::Reference#bibitem

***************
*** 854,859 ****
    * http://www.open-bio.org/registry/seqdatabase.ini

! Note that the last location refers to www.open-bio.org and is only used 
! when all local configuration files are not available. 

  In the current BioRuby implementation all local configuration files
--- 894,899 ----
    * http://www.open-bio.org/registry/seqdatabase.ini

! Note that the last location refers to www.open-bio.org and is only used
! when all local configuration files are not available.

  In the current BioRuby implementation all local configuration files
***************
*** 905,909 ****
      # connects to the database "genbank"
      serv = reg.get_database('genbank')
!     
      # gets entry of the ID
      entry = serv.get_by_id('AA2CG')
--- 945,949 ----
      # connects to the database "genbank"
      serv = reg.get_database('genbank')
! 
      # gets entry of the ID
      entry = serv.get_by_id('AA2CG')
***************
*** 911,915 ****

  The variable "serv" is a server object corresponding to the setting
! written in configuration files. The class of the object is one of 
  Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
  returns nil if no database is found.
--- 951,955 ----

  The variable "serv" is a server object corresponding to the setting
! written in configuration files. The class of the object is one of
  Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
  returns nil if no database is found.
***************
*** 923,927 ****
  these entries fast. There are two index types. index-flat is a simple index
  performing binary search without using an external library of Ruby. index-berkeleydb
! uses Berkeley DB for indexing - but requires installing bdb on your computer, 
  as well as the BDB Ruby package. For creating the index itself, you can use
  br_bioflat.rb command bundled with BioRuby.
--- 963,967 ----
  these entries fast. There are two index types. index-flat is a simple index
  performing binary search without using an external library of Ruby. index-berkeleydb
! uses Berkeley DB for indexing - but requires installing bdb on your computer,
  as well as the BDB Ruby package. For creating the index itself, you can use
  br_bioflat.rb command bundled with BioRuby.
***************
*** 1008,1023 ****

      #!/usr/bin/env ruby
!     
      require 'bio'

      entry = Bio::Fetch.query('hal', 'VNG1467G')
      aaseq = Bio::KEGG::GENES.new(entry).aaseq
!     
      entry = Bio::Fetch.query('aax1', 'BURA740101')
      helix = Bio::AAindex1.new(entry).index
!     
      position = 1
      win_size = 15
!     
      aaseq.window_search(win_size) do |subseq|
        score = subseq.total(helix)
--- 1048,1063 ----

      #!/usr/bin/env ruby
! 
      require 'bio'

      entry = Bio::Fetch.query('hal', 'VNG1467G')
      aaseq = Bio::KEGG::GENES.new(entry).aaseq
! 
      entry = Bio::Fetch.query('aax1', 'BURA740101')
      helix = Bio::AAindex1.new(entry).index
! 
      position = 1
      win_size = 15
! 
      aaseq.window_search(win_size) do |subseq|
        score = subseq.total(helix)
***************
*** 1076,1080 ****

  At this point for using BioRuby no additional libraries are needed.
! This may change, so keep an eye on the Bioruby website. Also when 
  a package is missing BioRuby should show an informative message.

--- 1116,1120 ----

  At this point for using BioRuby no additional libraries are needed.
! This may change, so keep an eye on the Bioruby website. Also when
  a package is missing BioRuby should show an informative message.
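
Note on the Bio::Alignment example added in revision 1.6: the consensus values shown there ("a?gc?" and "ahgcr") can be checked by hand with a few lines of plain Ruby. The sketch below is only an illustration, not BioRuby's implementation. It does not load the bio library, and the names IUPAC_CODES, sites, strict_consensus and iupac_consensus are hypothetical helpers made up for this note. It assumes ungapped, lower-case sequences of equal length.

```ruby
# Plain-Ruby sketch of a column-wise consensus.
# Assumption: this mimics the *results* shown in the tutorial example,
# not the code inside Bio::Alignment.

# IUPAC nucleotide codes, keyed by the sorted set of bases seen at a site.
IUPAC_CODES = {
  'a' => 'a', 'c' => 'c', 'g' => 'g', 't' => 't',
  'ag' => 'r', 'ct' => 'y', 'cg' => 's', 'at' => 'w',
  'gt' => 'k', 'ac' => 'm',
  'cgt' => 'b', 'agt' => 'd', 'act' => 'h', 'acg' => 'v',
  'acgt' => 'n'
}.freeze

# Returns the alignment as an array of columns (one array per site).
# Array#transpose raises IndexError if the sequences differ in length.
def sites(seqs)
  seqs.map(&:chars).transpose
end

# A site contributes its base only if every sequence agrees, else '?'.
def strict_consensus(seqs)
  sites(seqs).map { |col| col.uniq.size == 1 ? col.first : '?' }.join
end

# A site contributes the IUPAC code covering all bases observed there.
def iupac_consensus(seqs)
  sites(seqs).map { |col| IUPAC_CODES.fetch(col.uniq.sort.join, '?') }.join
end

seqs = %w[atgca aagca acgca acgcg]
puts strict_consensus(seqs)   # prints a?gc?
puts iupac_consensus(seqs)    # prints ahgcr
```

Working on columns rather than rows is the same idea as the each_site iterator in the example: each site is inspected independently, so a stricter or looser rule only changes the block passed to map.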