[BioRuby-cvs] bioruby/doc Tutorial.rd,1.5,1.6
Pjotr Prins
pjotr at pub.open-bio.org
Fri Sep 23 04:35:46 EDT 2005
Update of /home/repository/bioruby/bioruby/doc
In directory pub.open-bio.org:/tmp/cvs-serv20985
Modified Files:
Tutorial.rd
Log Message:
Added assignment info to tutorial
Index: Tutorial.rd
===================================================================
RCS file: /home/repository/bioruby/bioruby/doc/Tutorial.rd,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** Tutorial.rd 16 Sep 2005 06:03:27 -0000 1.5
--- Tutorial.rd 23 Sep 2005 07:43:20 -0000 1.6
***************
*** 52,82 ****
#!/usr/bin/env ruby
!
require 'bio'
!
seq = Bio::Sequence::NA.new("atgcatgcaaaa")
!
puts seq # original sequence
puts seq.complement # complementary sequence (Bio::Sequence::NA object)
puts seq.subseq(3,8) # gets subsequence of positions 3 to 8
!
p seq.gc_percent # GC percent (Float)
p seq.composition # nucleic acid compositions (Hash)
!
puts seq.translate # translation (Bio::Sequence::AA object)
puts seq.translate(2) # translation from frame 2 (default is frame 1)
puts seq.translate(1,11) # using codon table No.11 (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
!
p seq.translate.codes # shows three-letter codes (Array)
p seq.translate.names # shows amino acid names (Array)
p seq.translate.composition # amino acid compositions (Hash)
p seq.translate.molecular_weight # calculating molecular weight (Float)
!
puts seq.complement.translate # translation of the complementary strand
The p, print and puts methods are the standard Ruby ways of printing to
the screen. To learn more about standard Ruby methods, you
! can use the 'ri' command on the command line (or the help command on
! Windows). For example:
% ri p
--- 52,82 ----
#!/usr/bin/env ruby
!
require 'bio'
!
seq = Bio::Sequence::NA.new("atgcatgcaaaa")
!
puts seq # original sequence
puts seq.complement # complementary sequence (Bio::Sequence::NA object)
puts seq.subseq(3,8) # gets subsequence of positions 3 to 8
!
p seq.gc_percent # GC percent (Float)
p seq.composition # nucleic acid compositions (Hash)
!
puts seq.translate # translation (Bio::Sequence::AA object)
puts seq.translate(2) # translation from frame 2 (default is frame 1)
puts seq.translate(1,11) # using codon table No.11 (see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
!
p seq.translate.codes # shows three-letter codes (Array)
p seq.translate.names # shows amino acid names (Array)
p seq.translate.composition # amino acid compositions (Hash)
p seq.translate.molecular_weight # calculating molecular weight (Float)
!
puts seq.complement.translate # translation of the complementary strand
The p, print and puts methods are the standard Ruby ways of printing to
the screen. To learn more about standard Ruby methods, you
! can use the 'ri' command on the command line (or the help command on
! Windows). For example:
% ri p
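The sequence methods used in the script above can be illustrated without BioRuby. The following is a minimal pure-Ruby sketch of what complement, gc_percent and composition compute. The helper names are hypothetical, and it assumes that Bio::Sequence::NA#complement returns the reverse complement; BioRuby's own code may round differently or treat ambiguous bases differently.

```ruby
# Hypothetical pure-Ruby helpers (not the BioRuby API) illustrating
# what complement, gc_percent and composition compute.

# Reverse complement: reverse the strand, then swap a<->t and g<->c.
# (Assumption: BioRuby's complement also reverses the strand.)
def reverse_complement(na)
  na.downcase.reverse.tr('atgc', 'tacg')
end

# Count of each base, as a Hash with a default of 0.
def composition(na)
  counts = Hash.new(0)
  na.downcase.each_char { |base| counts[base] += 1 }
  counts
end

# Percentage of g and c among the a, t, g and c bases.
def gc_percent(na)
  counts = composition(na)
  gc = counts['g'] + counts['c']
  total = gc + counts['a'] + counts['t']
  total.zero? ? 0.0 : 100.0 * gc / total
end

seq = 'atgcatgcaaaa'
puts reverse_complement(seq)   # ttttgcatgcat
p gc_percent(seq).round(1)     # 33.3
p composition(seq)             # counts: a 6, t 2, g 2, c 2
```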
***************
*** 97,109 ****
>a
!
So when using String methods, you should subtract 1 from the positions
conventionally used in biology. (The subseq method returns nil if you
specify a position of 0 or less for either
! "from" or "to".)
(TRANSLATOR'S NOTE: is something wrong in the Japanese text?)
(EDITOR'S NOTE: should 'subseq' not throw an exception instead?)
!
The window_search(window_size, step_size) method shows a typical Ruby
way of writing concise and clear code using 'closures'. Each sliding
--- 97,109 ----
>a
!
So when using String methods, you should subtract 1 from the positions
conventionally used in biology. (The subseq method returns nil if you
specify a position of 0 or less for either
! "from" or "to".)
(TRANSLATOR'S NOTE: is something wrong in the Japanese text?)
(EDITOR'S NOTE: should 'subseq' not throw an exception instead?)
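The off-by-one relationship between biological positions and Ruby String indexes can be shown with a small hypothetical helper, `subseq1`, which mimics the behaviour described above: it takes 1-based, inclusive "from" and "to" positions and returns nil when either is 0 or less. It is an illustration only, not BioRuby's implementation.

```ruby
# Hypothetical helper mimicking subseq: 1-based, inclusive positions.
# Returns nil if either position is 0 or less, as described above.
def subseq1(str, from, to)
  return nil if from <= 0 || to <= 0
  # Ruby Strings are 0-based, so subtract 1 from both positions.
  str[(from - 1)..(to - 1)]
end

seq = 'atgcatgcaaaa'
puts subseq1(seq, 3, 8)   # gcatgc
puts seq[2..7]            # gcatgc (same bases via 0-based String indexing)
p subseq1(seq, 0, 8)      # nil
```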
!
The window_search(window_size, step_size) method shows a typical Ruby
way of writing concise and clear code using 'closures'. Each sliding
***************
*** 163,174 ****
In most cases, sequences are read from files or retrieved from databases.
For example:
!
require 'bio'
!
input_seq = ARGF.read # reads all files given as arguments
!
my_naseq = Bio::Sequence::NA.new(input_seq)
my_aaseq = my_naseq.translate
!
puts my_aaseq
--- 163,174 ----
In most cases, sequences are read from files or retrieved from databases.
For example:
!
require 'bio'
!
input_seq = ARGF.read # reads all files given as arguments
!
my_naseq = Bio::Sequence::NA.new(input_seq)
my_aaseq = my_naseq.translate
!
puts my_aaseq
***************
*** 185,189 ****
% ruby na2aa.rb my_naseq.txt
! Outputs
VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*
--- 185,189 ----
% ruby na2aa.rb my_naseq.txt
! Outputs
VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*
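As a sketch of what translate does with its default settings (frame 1, codon table 1), the following pure-Ruby code looks up each codon in the standard genetic code (NCBI table 1), stored as a 64-character string in T, C, A, G order. The `translate1` helper and its constants are hypothetical. Unlike BioRuby, the sketch supports only table 1 and frame 1, drops an incomplete trailing codon, and returns 'X' for any codon containing a base other than a, c, g, t or u.

```ruby
# Standard genetic code (NCBI table 1). Index = 16*first + 4*second + third,
# with bases numbered t/u=0, c=1, a=2, g=3.
CODE_TABLE_1 = 'FFLLSSSSYY**CC*W' + 'LLLLPPPPHHQQRRRR' +
               'IIIMTTTTNNKKSSRR' + 'VVVVAAAADDEEGGGG'
BASE_INDEX = { 't' => 0, 'u' => 0, 'c' => 1, 'a' => 2, 'g' => 3 }

# Hypothetical helper: translate in frame 1 with table 1 only.
def translate1(na)
  na.downcase.scan(/.../).map { |codon|
    idx = codon.chars.map { |b| BASE_INDEX[b] }
    next 'X' if idx.include?(nil)   # unknown or ambiguous base
    CODE_TABLE_1[idx[0] * 16 + idx[1] * 4 + idx[2]]
  }.join
end

puts translate1('atgcatgcaaaa')   # MHAK
puts translate1('atgtaa')         # M*
```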
***************
*** 209,219 ****
#!/usr/bin/env ruby
!
require 'bio'
!
! # Read all lines from STDIN split by the GenBank delimiter
while entry = gets(Bio::GenBank::DELIMITER)
gb = Bio::GenBank.new(entry) # creates GenBank object
!
print ">#{gb.accession} " # Accession
puts gb.definition # Definition
--- 209,219 ----
#!/usr/bin/env ruby
!
require 'bio'
!
! # Read all lines from STDIN split by the GenBank delimiter
while entry = gets(Bio::GenBank::DELIMITER)
gb = Bio::GenBank.new(entry) # creates GenBank object
!
print ">#{gb.accession} " # Accession
puts gb.definition # Definition
***************
*** 226,236 ****
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
ff.each_entry do |gb|
definition = "#{gb.accession} #{gb.definition}"
! puts gb.naseq.to_fasta(definition, 60)
end
--- 226,236 ----
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
ff.each_entry do |gb|
definition = "#{gb.accession} #{gb.definition}"
! puts gb.naseq.to_fasta(definition, 60)
end
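The `to_fasta(definition, 60)` call writes a FASTA record whose sequence lines are wrapped at 60 characters. Below is a minimal pure-Ruby sketch of that output format; the `fasta_record` helper and the example definition line are hypothetical, and BioRuby's own formatting details (such as trailing newlines) may differ.

```ruby
# Hypothetical helper: build a FASTA record with a '>' definition line
# and the sequence wrapped into lines of at most `width` characters.
def fasta_record(definition, seq, width = 60)
  lines = seq.scan(/.{1,#{width}}/)
  ">#{definition}\n" + lines.map { |l| l + "\n" }.join
end

print fasta_record('example_seq test sequence', 'atgcatgcaaaa', 5)
# >example_seq test sequence
# atgca
# tgcaa
# aa
```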
***************
*** 238,244 ****
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
ff.each_entry do |f|
--- 238,244 ----
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
ff.each_entry do |f|
***************
*** 254,264 ****
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::GenBank.open("gbvrl1.seq")
ff.each_entry do |gb|
definition = "#{gb.accession} #{gb.definition}"
! puts gb.naseq.to_fasta(definition, 60)
end
--- 254,264 ----
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::GenBank.open("gbvrl1.seq")
ff.each_entry do |gb|
definition = "#{gb.accession} #{gb.definition}"
! puts gb.naseq.to_fasta(definition, 60)
end
***************
*** 270,276 ****
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
--- 270,276 ----
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
***************
*** 280,288 ****
# shows accession and organism
puts "# #{gb.accession} - #{gb.organism}"
!
# iterates over each element in 'features'
! gb.features.each do |feature|
position = feature.position
! hash = feature.assoc # converts the qualifiers into a Hash
# skips the entry if "/translation=" is not found
--- 280,288 ----
# shows accession and organism
puts "# #{gb.accession} - #{gb.organism}"
!
# iterates over each element in 'features'
! gb.features.each do |feature|
position = feature.position
! hash = feature.assoc # converts the qualifiers into a Hash
# skips the entry if "/translation=" is not found
***************
*** 317,325 ****
Bio::Sequence#splicing splices a subsequence out of a nucleic acid sequence
according to the location information used in GenBank, EMBL and DDBJ.
! (TRANSLATOR'S NOTE: EMBL and DDBJ should be added to the Japanese document.)
When the specified translation table is different from the default
(universal), or when the first codon is not "atg" or the protein
! contains selenocysteine, the two amino acid sequences will differ.
(TRANSLATOR'S NOTE: Some cases are added when two amino acid sequences
--- 317,325 ----
Bio::Sequence#splicing splices a subsequence out of a nucleic acid sequence
according to the location information used in GenBank, EMBL and DDBJ.
! (TRANSLATOR'S NOTE: EMBL and DDBJ should be added to the Japanese document.)
When the specified translation table is different from the default
(universal), or when the first codon is not "atg" or the protein
! contains selenocysteine, the two amino acid sequences will differ.
(TRANSLATOR'S NOTE: Some cases are added when two amino acid sequences
***************
*** 349,352 ****
--- 349,392 ----
(EDITOR's NOTE: why use STRINGs here?)
+ === Alignments (Bio::Alignment)
+
+ The Bio::Alignment class in bio/alignment.rb is a container class, similar to Ruby's Hash
+ and Array and to BioPerl's Bio::SimpleAlign. A very simple example is:
+
+ require 'bio'
+
+ seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
+ seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
+
+ # creates alignment object
+ a = Bio::Alignment.new(seqs)
+
+ # shows consensus sequence
+ p a.consensus # ==> "a?gc?"
+
+ # shows IUPAC consensus
+ p a.consensus_iupac # ==> "ahgcr"
+
+ # iterates over each seq
+ a.each { |x| p x }
+ # ==>
+ # "atgca"
+ # "aagca"
+ # "acgca"
+ # "acgcg"
+ # iterates over each site
+ a.each_site { |x| p x }
+ # ==>
+ # ["a", "a", "a", "a"]
+ # ["t", "a", "c", "c"]
+ # ["g", "g", "g", "g"]
+ # ["c", "c", "c", "c"]
+ # ["a", "a", "a", "g"]
+
+ # performs an alignment using CLUSTAL W
+ # (the clustalw command must be installed)
+ factory = Bio::ClustalW.new
+ a2 = a.do_align(factory)
+
=== More databases
***************
*** 356,360 ****
In many cases the Bio::DatabaseClass acts as a factory: it
! recognises the database type automatically and returns a
parsed object. For example, using Bio::FlatFile
--- 396,400 ----
In many cases the Bio::DatabaseClass acts as a factory: it
! recognises the database type automatically and returns a
parsed object. For example, using Bio::FlatFile
***************
*** 366,375 ****
Isn't it wonderful that Bio::FlatFile automagically recognizes each
! database class?
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::FlatFile.auto(ARGF)
ff.each_entry do |entry|
--- 406,415 ----
Isn't it wonderful that Bio::FlatFile automagically recognizes each
! database class?
#!/usr/bin/env ruby
!
require 'bio'
!
ff = Bio::FlatFile.auto(ARGF)
ff.each_entry do |entry|
***************
*** 416,437 ****
#!/usr/bin/env ruby
!
require 'bio'
!
# Creates FASTA factory object ("ssearch" instead of "fasta34" can also work)
factory = Bio::Fasta.local('fasta34', ARGV.pop)
(EDITOR's NOTE: not consistent pop command)
!
# Reads FASTA-formatted files (TRANSLATOR'S NOTE: the Japanese text may be wrong here)
ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
!
# Iterates over each entry. The variable "entry" is a Bio::FastaFormat object.
ff.each do |entry|
# prints the definition line (beginning with '>') to standard error
$stderr.puts "Searching ... " + entry.definition
!
# executes homology search. Returns Bio::Fasta::Report object.
report = factory.query(entry)
!
# Iterates over each hit
report.each do |hit|
--- 456,477 ----
#!/usr/bin/env ruby
!
require 'bio'
!
# Creates FASTA factory object ("ssearch" instead of "fasta34" can also work)
factory = Bio::Fasta.local('fasta34', ARGV.pop)
(EDITOR's NOTE: not consistent pop command)
!
# Reads FASTA-formatted files (TRANSLATOR'S NOTE: the Japanese text may be wrong here)
ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
!
# Iterates over each entry. The variable "entry" is a Bio::FastaFormat object.
ff.each do |entry|
# prints the definition line (beginning with '>') to standard error
$stderr.puts "Searching ... " + entry.definition
!
# executes homology search. Returns Bio::Fasta::Report object.
report = factory.query(entry)
!
# Iterates over each hit
report.each do |hit|
***************
*** 534,538 ****
program = 'fasta'
database = 'genes'
!
factory = Bio::Fasta.remote(program, database)
--- 574,578 ----
program = 'fasta'
database = 'genes'
!
factory = Bio::Fasta.remote(program, database)
***************
*** 548,552 ****
# create BLAST factory object
! factory = Bio::Blast.local('blastp', ARGV.pop)
For remote execution of BLAST in GenomeNet, Bio::Blast.remote is used.
--- 588,592 ----
# create BLAST factory object
! factory = Bio::Blast.local('blastp', ARGV.pop)
For remote execution of BLAST in GenomeNet, Bio::Blast.remote is used.
***************
*** 581,585 ****
puts hit.midline # middle line string of alignment of homologous region (*)
puts hit.target_seq # hit sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence)
!
puts hit.evalue # E-value
puts hit.identity # % identity
--- 621,625 ----
puts hit.midline # middle line string of alignment of homologous region (*)
puts hit.target_seq # hit sequence (TRANSLATOR'S NOTE: sequence of homologous region of query sequence)
!
puts hit.evalue # E-value
puts hit.identity # % identity
***************
*** 622,626 ****
#!/usr/bin/env ruby
!
require 'bio'
--- 662,666 ----
#!/usr/bin/env ruby
!
require 'bio'
***************
*** 670,676 ****
#!/usr/bin/env ruby
!
require 'bio'
!
ARGV.each do |id|
entry = Bio::PubMed.query(id) # searches PubMed and gets the entry
--- 710,716 ----
#!/usr/bin/env ruby
!
require 'bio'
!
ARGV.each do |id|
entry = Bio::PubMed.query(id) # searches PubMed and gets the entry
***************
*** 691,703 ****
#!/usr/bin/env ruby
!
require 'bio'
!
# Concatenates the argument keywords into a single string
keywords = ARGV.join(' ')
!
# PubMed keyword search
entries = Bio::PubMed.search(keywords)
!
entries.each do |entry|
medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from text
--- 731,743 ----
#!/usr/bin/env ruby
!
require 'bio'
!
# Concatenates the argument keywords into a single string
keywords = ARGV.join(' ')
!
# PubMed keyword search
entries = Bio::PubMed.search(keywords)
!
entries.each do |entry|
medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from text
***************
*** 724,739 ****
#!/usr/bin/env ruby
!
require 'bio'
!
keywords = ARGV.join(' ')
!
options = {
'maxdate' => '2003/05/31',
'retmax' => 1000,
}
!
entries = Bio::PubMed.esearch(keywords, options)
!
Bio::PubMed.efetch(entries).each do |entry|
medline = Bio::MEDLINE.new(entry)
--- 764,779 ----
#!/usr/bin/env ruby
!
require 'bio'
!
keywords = ARGV.join(' ')
!
options = {
'maxdate' => '2003/05/31',
'retmax' => 1000,
}
!
entries = Bio::PubMed.esearch(keywords, options)
!
Bio::PubMed.efetch(entries).each do |entry|
medline = Bio::MEDLINE.new(entry)
***************
*** 761,765 ****
The BibTeX output can be used with TeX or LaTeX to build the bibliography
! for your journal article. For more information
on BibTeX see (EDITORS NOTE: insert URL). A quick example:
--- 801,805 ----
The BibTeX output can be used with TeX or LaTeX to build the bibliography
! for your journal article. For more information
on BibTeX see (EDITORS NOTE: insert URL). A quick example:
***************
*** 782,786 ****
Now you get hoge.dvi and hoge.ps; the latter can be viewed with any
PostScript viewer.
!
=== Bio::Reference#bibitem
--- 822,826 ----
Now you get hoge.dvi and hoge.ps; the latter can be viewed with any
PostScript viewer.
!
=== Bio::Reference#bibitem
***************
*** 854,859 ****
* http://www.open-bio.org/registry/seqdatabase.ini
! Note that the last location refers to www.open-bio.org and is only used
! when none of the local configuration files are available.
In the current BioRuby implementation all local configuration files
--- 894,899 ----
* http://www.open-bio.org/registry/seqdatabase.ini
! Note that the last location refers to www.open-bio.org and is only used
! when none of the local configuration files are available.
In the current BioRuby implementation all local configuration files
***************
*** 905,909 ****
# connects to the database "genbank"
serv = reg.get_database('genbank')
!
# gets the entry with this ID
entry = serv.get_by_id('AA2CG')
--- 945,949 ----
# connects to the database "genbank"
serv = reg.get_database('genbank')
!
# gets the entry with this ID
entry = serv.get_by_id('AA2CG')
***************
*** 911,915 ****
The variable "serv" is a server object corresponding to the settings
! written in the configuration files. The class of the object is one of
Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
returns nil if no database is found.
--- 951,955 ----
The variable "serv" is a server object corresponding to the settings
! written in the configuration files. The class of the object is one of
Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
returns nil if no database is found.
***************
*** 923,927 ****
these entries fast. There are two index types. index-flat is a simple index
that performs a binary search without needing any external Ruby library. index-berkeleydb
! uses Berkeley DB for indexing, but requires Berkeley DB to be installed on your computer,
as well as the BDB Ruby package. To create the index itself, you can use the
br_bioflat.rb command bundled with BioRuby.
--- 963,967 ----
these entries fast. There are two index types. index-flat is a simple index
that performs a binary search without needing any external Ruby library. index-berkeleydb
! uses Berkeley DB for indexing, but requires Berkeley DB to be installed on your computer,
as well as the BDB Ruby package. To create the index itself, you can use the
br_bioflat.rb command bundled with BioRuby.
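The idea behind index-flat, finding an entry by binary search in a sorted index instead of scanning the whole flat file, can be sketched in pure Ruby. The in-memory index of [ID, byte offset] pairs, the sample IDs and offsets, and the `lookup_offset` helper are all hypothetical. The on-disk layout that br_bioflat.rb actually writes is not modelled here.

```ruby
# Hypothetical in-memory index: [entry ID, byte offset in the flat file],
# sorted by ID so that it can be binary searched.
INDEX = [
  ['AB000100', 0],
  ['AB000200', 5120],
  ['AB000300', 9876],
  ['AB000400', 15001]
].sort_by(&:first)

# Binary search (Array#bsearch, find-minimum mode) for the first ID >= id,
# then check for an exact match. Returns the byte offset or nil.
def lookup_offset(index, id)
  pair = index.bsearch { |key, _offset| key >= id }
  pair && pair[0] == id ? pair[1] : nil
end

p lookup_offset(INDEX, 'AB000300')   # 9876
p lookup_offset(INDEX, 'AB000250')   # nil
```

With a real index the returned offset would be used to seek into the flat file and read just that one entry.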
***************
*** 1008,1023 ****
#!/usr/bin/env ruby
!
require 'bio'
entry = Bio::Fetch.query('hal', 'VNG1467G')
aaseq = Bio::KEGG::GENES.new(entry).aaseq
!
entry = Bio::Fetch.query('aax1', 'BURA740101')
helix = Bio::AAindex1.new(entry).index
!
position = 1
win_size = 15
!
aaseq.window_search(win_size) do |subseq|
score = subseq.total(helix)
--- 1048,1063 ----
#!/usr/bin/env ruby
!
require 'bio'
entry = Bio::Fetch.query('hal', 'VNG1467G')
aaseq = Bio::KEGG::GENES.new(entry).aaseq
!
entry = Bio::Fetch.query('aax1', 'BURA740101')
helix = Bio::AAindex1.new(entry).index
!
position = 1
win_size = 15
!
aaseq.window_search(win_size) do |subseq|
score = subseq.total(helix)
***************
*** 1076,1080 ****
At present, BioRuby needs no additional libraries.
! This may change, so keep an eye on the BioRuby website. Also, when
a package is missing, BioRuby should show an informative message.
--- 1116,1120 ----
At present, BioRuby needs no additional libraries.
! This may change, so keep an eye on the BioRuby website. Also, when
a package is missing, BioRuby should show an informative message.