[BioRuby-cvs] bioruby/lib/bio sequence.rb,0.56,0.57
Katayama Toshiaki
k at dev.open-bio.org
Sun Mar 26 02:28:01 UTC 2006
- Previous message: [BioRuby-cvs] bioruby/lib/bio/sequence aa.rb, 1.2, 1.3 common.rb, 1.2, 1.3 compat.rb, 1.2, 1.3 format.rb, 1.2, 1.3 generic.rb, 1.3, 1.4 na.rb, 1.2, 1.3
- Next message: [BioRuby-cvs] bioruby/lib/bio reference.rb,1.21,1.22
Update of /home/repository/bioruby/bioruby/lib/bio
In directory dev.open-bio.org:/tmp/cvs-serv28853
Modified Files:
sequence.rb
Log Message:
* Added comprehensive documentation contributed by Ryan Raaum and Jan Aerts.
* bug fixes in sequence.rb contributed by Ryan Raaum
* Added 'U' and 'u' to the bases counted towards the nucleic acid total in Bio::Sequence#guess. (Without this, RNA sequences were "guessed" to be Amino Acid sequences).
* Changed the arguments for method_missing in Bio::Sequence from (*arg) to (sym, *args, &block). With this argument set, blocks will be properly passed through to the encapsulated object.
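The effect of the method_missing change can be sketched outside of BioRuby. The two wrapper classes below are hypothetical stand-ins (they are not the BioRuby source) that wrap a plain String; only the method_missing signatures are taken from the diff. With the (*arg) form the caller's block is never captured, so it does not reach the wrapped object; with (sym, *args, &block) it does.

```ruby
# Hypothetical stand-in for the pre-0.57 behaviour; not the BioRuby class.
class OldWrapper
  def initialize(str)
    @seq = str
  end

  # The caller's block is not captured here, so send() is invoked without it.
  def method_missing(*arg)
    @seq.send(*arg)
  end
end

# Hypothetical stand-in for the 0.57 behaviour; not the BioRuby class.
class NewWrapper
  def initialize(str)
    @seq = str
  end

  # The block is captured as a Proc and handed on to the wrapped object.
  def method_missing(sym, *args, &block)
    @seq.send(sym, *args, &block)
  end
end

# Old form: String#each_char is called without a block, so nothing is yielded
# and an Enumerator comes back instead.
old_chars = []
old_result = OldWrapper.new('atgc').each_char { |c| old_chars << c }

# New form: the block is forwarded and runs once per character.
new_chars = []
NewWrapper.new('atgc').each_char { |c| new_chars << c }
```

In this sketch old_chars stays empty while new_chars collects every character, which is the difference the log message describes for block-taking methods such as window_search.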
Index: sequence.rb
===================================================================
RCS file: /home/repository/bioruby/bioruby/lib/bio/sequence.rb,v
retrieving revision 0.56
retrieving revision 0.57
diff -C2 -d -r0.56 -r0.57
*** sequence.rb 17 Feb 2006 17:15:08 -0000 0.56
--- sequence.rb 26 Mar 2006 02:27:59 -0000 0.57
***************
*** 5,9 ****
# Toshiaki Katayama <k at bioruby.org>,
# Yoshinori K. Okuji <okuji at enbug.org>,
! # Naohisa Goto <ng at bioruby.org>
# License:: Ruby's
#
--- 5,11 ----
# Toshiaki Katayama <k at bioruby.org>,
# Yoshinori K. Okuji <okuji at enbug.org>,
! # Naohisa Goto <ng at bioruby.org>,
! # Ryan Raaum <ryan at raaum.org>,
! # Jan Aerts <jan.aerts at bbsrc.ac.uk>
# License:: Ruby's
#
***************
*** 15,18 ****
--- 17,67 ----
module Bio
+ # = DESCRIPTION
+ # Bio::Sequence objects represent annotated sequences in bioruby.
+ # A Bio::Sequence object is a wrapper around the actual sequence,
+ # represented as either a Bio::Sequence::NA or a Bio::Sequence::AA object.
+ # For most users, this encapsulation will be completely transparent.
+ # Bio::Sequence responds to all methods defined for Bio::Sequence::NA/AA
+ # objects using the same arguments and returning the same values (even though
+ # these methods are not documented specifically for Bio::Sequence).
+ #
+ # = USAGE
+ # # Create a nucleic or amino acid sequence
+ # dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA')
+ # rna = Bio::Sequence.auto('augcaugcaugcaugcaaaa')
+ # aa = Bio::Sequence.auto('ACDEFGHIKLMNPQRSTVWYU')
+ #
+ # # Print it out
+ # puts dna.to_s
+ # puts aa.to_s
+ #
+ # # Get a subsequence, bioinformatics style (first nucleotide is '1')
+ # puts dna.subseq(2,6)
+ #
+ # # Get a subsequence, informatics style (0-based start, then length)
+ # puts dna[2,6]
+ #
+ # # Print in FASTA format
+ # puts dna.output(:fasta)
+ #
+ # # Print all codons
+ # dna.window_search(3,3) do |codon|
+ # puts codon
+ # end
+ #
+ # # Splice or otherwise mangle your sequence
+ # puts dna.splicing("complement(join(1..5,16..20))")
+ # puts rna.splicing("complement(join(1..5,16..20))")
+ #
+ # # Convert a sequence containing ambiguity codes into a
+ # # regular expression you can use for subsequent searching
+ # puts aa.to_re
+ #
+ # # These should speak for themselves
+ # puts dna.complement
+ # puts dna.composition
+ # puts dna.molecular_weight
+ # puts dna.translate
+ # puts dna.gc_percent
class Sequence
***************
*** 23,37 ****
autoload :Format, 'bio/sequence/format'
def initialize(str)
@seq = str
end
! def method_missing(*arg)
! @seq.send(*arg)
end
!
! attr_accessor :entry_id, :definition, :features, :references, :comments,
! :date, :keywords, :dblinks, :taxonomy, :moltype, :seq
!
def output(style)
extend Bio::Sequence::Format
--- 72,151 ----
autoload :Format, 'bio/sequence/format'
+ # Create a new Bio::Sequence object
+ #
+ # s = Bio::Sequence.new('atgc')
+ # puts s #=> 'atgc'
+ #
+ # Note that this method does not initialize the contained sequence
+ # as any kind of bioruby object, only as a simple string
+ #
+ # puts s.seq.class #=> String
+ #
+ # See Bio::Sequence#na, Bio::Sequence#aa, and Bio::Sequence#auto
+ # for methods to transform the basic String of a just created
+ # Bio::Sequence object to a proper bioruby object
+ # ---
+ # *Arguments*:
+ # * (required) _str_: String or Bio::Sequence::NA/AA object
+ # *Returns*:: Bio::Sequence object
def initialize(str)
@seq = str
end
! # Pass any unknown method calls to the wrapped sequence object. See
! # http://www.rubycentral.com/book/ref_c_object.html#Object.method_missing
! def method_missing(sym, *args, &block) #:nodoc:
! @seq.send(sym, *args, &block)
end
!
! # The sequence identifier. For example, for a sequence
! # of Genbank origin, this is the accession number.
! attr_accessor :entry_id
!
! # A String with a description of the sequence
! attr_accessor :definition
!
! # An Array of Bio::Feature objects
! attr_accessor :features
!
! # An Array of Bio::Reference objects
! attr_accessor :references
!
! # A comment String
! attr_accessor :comments
!
! # Date from sequence source. Often date of deposition.
! attr_accessor :date
!
! # An Array of Strings
! attr_accessor :keywords
!
! # An Array of Strings; links to other database entries.
! attr_accessor :dblinks
!
! # A taxonomy String
! attr_accessor :taxonomy
!
! # Bio::Sequence::NA/AA
! attr_accessor :moltype
!
! # The sequence object, usually Bio::Sequence::NA/AA,
! # but could be a simple String
! attr_accessor :seq
!
! # Using Bio::Sequence::Format, return a String with the Bio::Sequence
! # object formatted in the given style.
! #
! # Formats currently implemented are: 'fasta', 'genbank', and 'embl'
! #
! # s = Bio::Sequence.new('atgc')
! # puts s.output(:fasta) #=> "> \natgc\n"
! #
! # The style argument is given as a Ruby
! # Symbol(http://www.ruby-doc.org/core/classes/Symbol.html)
! # ---
! # *Arguments*:
! # * (required) _style_: :fasta, :genbank, *or* :embl
! # *Returns*:: String object
def output(style)
extend Bio::Sequence::Format
***************
*** 48,51 ****
--- 162,175 ----
end
+ # Guess the type of sequence, Amino Acid or Nucleic Acid, and create a
+ # new sequence object (Bio::Sequence::AA or Bio::Sequence::NA) on the basis
+ # of this guess. This method will change the current Bio::Sequence object.
+ #
+ # s = Bio::Sequence.new('atgc')
+ # puts s.seq.class #=> String
+ # s.auto
+ # puts s.seq.class #=> Bio::Sequence::NA
+ # ---
+ # *Returns*:: Bio::Sequence::NA/AA object
def auto
@moltype = guess
***************
*** 57,60 ****
--- 181,194 ----
end
+ # Given a sequence String, guess its type, Amino Acid or Nucleic Acid, and
+ # return a new Bio::Sequence object wrapping a sequence of the guessed type
+ # (either Bio::Sequence::AA or Bio::Sequence::NA)
+ #
+ # s = Bio::Sequence.auto('atgc')
+ # puts s.seq.class #=> Bio::Sequence::NA
+ # ---
+ # *Arguments*:
+ # * (required) _str_: String *or* Bio::Sequence::NA/AA object
+ # *Returns*:: Bio::Sequence object
def self.auto(str)
seq = self.new(str)
***************
*** 63,74 ****
end
def guess(threshold = 0.9, length = 10000, index = 0)
str = @seq.to_s[index,length].to_s.extend Bio::Sequence::Common
cmp = str.composition
! bases = cmp['A'] + cmp['T'] + cmp['G'] + cmp['C'] +
! cmp['a'] + cmp['t'] + cmp['g'] + cmp['c']
! total = @seq.length - cmp['N'] - cmp['n']
if bases.to_f / total > threshold
--- 197,247 ----
end
+ # Guess the class of the current sequence. Returns the class
+ # (Bio::Sequence::AA or Bio::Sequence::NA) guessed. In general, used by
+ # developers only, but if you know what you are doing, feel free.
+ #
+ # s = Bio::Sequence.new('atgc')
+ # puts s.guess #=> Bio::Sequence::NA
+ #
+ # There are three parameters: `threshold`, `length`, and `index`.
+ #
+ # The `threshold` value (defaults to 0.9) is the frequency of
+ # nucleic acid bases [AGCTUagctu] required in the sequence for this method
+ # to produce a Bio::Sequence::NA "guess". In the default case, if 90% or
+ # fewer of the residues (after excluding [Nn]) are in the set [AGCTUagctu],
+ # then the guess is Bio::Sequence::AA.
+ #
+ # s = Bio::Sequence.new('atgcatgcqq')
+ # puts s.guess #=> Bio::Sequence::AA
+ # puts s.guess(0.8) #=> Bio::Sequence::AA
+ # puts s.guess(0.7) #=> Bio::Sequence::NA
+ #
+ # The `length` value is how much of the total sequence to use in the
+ # guess (default 10000). If your sequence is very long, you may
+ # want to use a smaller amount to reduce the computational burden.
+ #
+ # s = Bio::Sequence.new('A VERY LONG SEQUENCE')
+ # puts s.guess(0.9, 1000) # limit the guess to the first 1000 positions
+ #
+ # The `index` value is where to start the guess. Perhaps you know there
+ # are a lot of gaps at the start...
+ #
+ # s = Bio::Sequence.new('-----atgcc')
+ # puts s.guess #=> Bio::Sequence::AA
+ # puts s.guess(0.9,10000,5) #=> Bio::Sequence::NA
+ # ---
+ # *Arguments*:
+ # * (optional) _threshold_: Float in range 0,1 (default 0.9)
+ # * (optional) _length_: Fixnum (default 10000)
+ # * (optional) _index_: Fixnum (default 0)
+ # *Returns*:: Bio::Sequence::NA/AA
def guess(threshold = 0.9, length = 10000, index = 0)
str = @seq.to_s[index,length].to_s.extend Bio::Sequence::Common
cmp = str.composition
! bases = cmp['A'] + cmp['T'] + cmp['G'] + cmp['C'] + cmp['U'] +
! cmp['a'] + cmp['t'] + cmp['g'] + cmp['c'] + cmp['u']
! total = str.length - cmp['N'] - cmp['n']
if bases.to_f / total > threshold
***************
*** 79,86 ****
--- 252,312 ----
end
+ # Guess the class of a given sequence. Returns the class
+ # (Bio::Sequence::AA or Bio::Sequence::NA) guessed. In general, used by
+ # developers only, but if you know what you are doing, feel free.
+ #
+ # puts Bio::Sequence.guess('atgc') #=> Bio::Sequence::NA
+ #
+ # There are three optional parameters: `threshold`, `length`, and `index`.
+ #
+ # The `threshold` value (defaults to 0.9) is the frequency of
+ # nucleic acid bases [AGCTUagctu] required in the sequence for this method
+ # to produce a Bio::Sequence::NA "guess". In the default case, if 90% or
+ # fewer of the residues (after excluding [Nn]) are in the set [AGCTUagctu],
+ # then the guess is Bio::Sequence::AA.
+ #
+ # puts Bio::Sequence.guess('atgcatgcqq') #=> Bio::Sequence::AA
+ # puts Bio::Sequence.guess('atgcatgcqq', 0.8) #=> Bio::Sequence::AA
+ # puts Bio::Sequence.guess('atgcatgcqq', 0.7) #=> Bio::Sequence::NA
+ #
+ # The `length` value is how much of the total sequence to use in the
+ # guess (default 10000). If your sequence is very long, you may
+ # want to use a smaller amount to reduce the computational burden.
+ #
+ # # limit the guess to the first 1000 positions
+ # puts Bio::Sequence.guess('A VERY LONG SEQUENCE', 0.9, 1000)
+ #
+ # The `index` value is where to start the guess. Perhaps you know there
+ # are a lot of gaps at the start...
+ #
+ # puts Bio::Sequence.guess('-----atgcc') #=> Bio::Sequence::AA
+ # puts Bio::Sequence.guess('-----atgcc',0.9,10000,5) #=> Bio::Sequence::NA
+ # ---
+ # *Arguments*:
+ # * (required) _str_: String *or* Bio::Sequence::NA/AA object
+ # * (optional) _threshold_: Float in range 0,1 (default 0.9)
+ # * (optional) _length_: Fixnum (default 10000)
+ # * (optional) _index_: Fixnum (default 0)
+ # *Returns*:: Bio::Sequence::NA/AA
def self.guess(str, *args)
self.new(str).guess(*args)
end
+ # Transform the sequence wrapped in the current Bio::Sequence object
+ # into a Bio::Sequence::NA object. This method will change the current
+ # object. This method does not validate your choice, so be careful!
+ #
+ # s = Bio::Sequence.new('RRLE')
+ # puts s.seq.class #=> String
+ # s.na
+ # puts s.seq.class #=> Bio::Sequence::NA !!!
+ #
+ # However, if you know your sequence type, this method may be
+ # constructively used after initialization,
+ #
+ # s = Bio::Sequence.new('atgc')
+ # s.na
+ # ---
+ # *Returns*:: Bio::Sequence::NA
def na
@seq = NA.new(@seq)
***************
*** 88,96 ****
end
def aa
@seq = AA.new(@seq)
@moltype = AA
end
!
end # Sequence
--- 314,338 ----
end
+ # Transform the sequence wrapped in the current Bio::Sequence object
+ # into a Bio::Sequence::AA object. This method will change the current
+ # object. This method does not validate your choice, so be careful!
+ #
+ # s = Bio::Sequence.new('atgc')
+ # puts s.seq.class #=> String
+ # s.aa
+ # puts s.seq.class #=> Bio::Sequence::AA !!!
+ #
+ # However, if you know your sequence type, this method may be
+ # constructively used after initialization,
+ #
+ # s = Bio::Sequence.new('RRLE')
+ # s.aa
+ # ---
+ # *Returns*:: Bio::Sequence::AA
def aa
@seq = AA.new(@seq)
@moltype = AA
end
!
end # Sequence
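
The arithmetic behind the revised guess can be reproduced without BioRuby. The guess_type helper below is a hypothetical, minimal sketch: it uses String#count in place of the composition hash, returns the symbols :na and :aa instead of the Bio::Sequence::NA/AA classes, and takes the set of counted bases as an extra argument (an assumption made only so the behaviour before and after adding 'U' and 'u' can be compared). As in revision 0.57, the total is taken from the sliced string rather than from the whole sequence.

```ruby
# Hypothetical helper mirroring the ratio test in the diff; not the BioRuby API.
# base_set is an illustrative extra parameter, not part of Bio::Sequence#guess.
def guess_type(seq, threshold = 0.9, length = 10000, index = 0,
               base_set = 'ATGCUatgcu')
  # Only the window starting at index, at most length characters long, is used.
  str = seq.to_s[index, length].to_s
  bases = str.count(base_set)             # characters counted as nucleotides
  total = str.length - str.count('Nn')    # ambiguous N/n are ignored
  # Strict comparison: a ratio equal to the threshold is still guessed as :aa.
  bases.to_f / total > threshold ? :na : :aa
end

rna = 'augcaugcaugcaugcaaaa'
with_u    = guess_type(rna)                              # 20 of 20 counted
without_u = guess_type(rna, 0.9, 10000, 0, 'ATGCatgc')   # 16 of 20 counted

mixed_default = guess_type('atgcatgcqq')         # 8 of 10, threshold 0.9
mixed_equal   = guess_type('atgcatgcqq', 0.8)    # 0.8 is not greater than 0.8
mixed_lower   = guess_type('atgcatgcqq', 0.7)

gapped         = guess_type('-----atgcc')                # 5 of 10 counted
gapped_skipped = guess_type('-----atgcc', 0.9, 10000, 5) # window is 'atgcc'
```

Under these assumptions the RNA string is guessed as :na only when 'U' and 'u' are in the counted set, and the threshold and index examples from the documentation comments give :aa, :aa, :na and :aa, :na respectively.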