[BioRuby] [PATCH] GO annotations fixes and improvements

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Tue Aug 3 16:13:27 UTC 2010


Hi Ralf,

Thank you for sending the patches.
I have reviewed them; please see the comments below.
Some parts of the patches will be merged soon, some will be
merged later, and some will not be merged.

On Tue, 3 Aug 2010 08:58:16 +0200
Ralf Stephan <ralf at ark.in-berlin.de> wrote:

> --- a/lib/bio/db/go.rb
> +++ b/lib/bio/db/go.rb
> @@ -186,6 +186,18 @@ class GO
>    #    p [entry.entry_id, entry.evidence, entry.goid]
>    #  end
>    #
> +  class ArrayOrString
> +    def initialize(arg)
> +      @var = arg
> +    end
> +    def join(char)
> +      if @var.instance_of? String
> +        then return @var
> +        else return @var.join(char)
> +      end
> +    end
> +  end

I disagree with adding this class. For GAF, there is no need to
introduce such a new wrapper class.
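
As a rough illustration of why a wrapper is not needed, a field that may
be either a String or an Array can be normalized at the point where it is
joined. This is only a sketch: join_field is a hypothetical helper name,
not something in BioRuby, and it assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper (not part of BioRuby): joins a field that may be
# either a single String or an Array of Strings.
def join_field(value, sep = "|")
  # Kernel#Array wraps a String in a one-element Array, returns an
  # Array unchanged, and turns nil into an empty Array.
  Array(value).join(sep)
end

join_field("PMID:1")             # single String is returned as-is
join_field(["PMID:1", "PMID:2"]) # Array elements are joined with "|"
```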

> @@ -253,30 +265,34 @@ class GO
>      
>      # 
>      attr_reader :assigned_by 
> -    
> +
>      alias entry_id db_object_id
>  
>  
> -    # Parsing an entry (in a line) in the gene_association flatfile.  
> -    def initialize(entry) 
> -      tmp = entry.chomp.split(/\t/)
> +    # Assign fields of an entry (in a line).  
> +    def assign(tmp) 

I don't like the method name. The word "assign" is already used in the
context of Gene Ontology annotation, so to avoid confusion it is better
not to use it for the internal workings of the class.
 

> @@ -293,17 +309,15 @@ class GO
>  
>      # Bio::GO::GeneAssociation#to_str -> a line of gene_association file.
>      def to_str
> -      return [@db, @db_object_id, @db_object_symbol, @quialifier, @goid, 
> -              @qualifier.join("|"), @evidence, @with.join("|"), @aspect,
> +      return [@db, @db_object_id, @db_object_symbol, @qualifier, @goid, 
> +              @db_reference.join("|"), @evidence, @with.join("|"), @aspect,
>                @db_object_name, @db_object_synonym.join("|"), @db_object_type,
>                @taxon, @date, @assigned_by].join("\t")
>      end

This looks like a bug fix. Thanks!
By the way, I think it would be good to change to_str to to_s, because
the GeneAssociation class does not need to behave like a string.
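
The difference matters because Ruby treats to_str as a signal for
implicit conversion, while to_s is only used for explicit conversion.
The two classes below are hypothetical and exist only to show the
contrast; they are not taken from BioRuby.

```ruby
# Hypothetical minimal classes, used only to contrast to_s and to_str.
class WithToS
  def to_s
    "line"
  end
end

class WithToStr
  def to_str
    "line"
  end
end

# to_s is used for explicit conversion, such as string interpolation.
puts "#{WithToS.new}"

# to_str marks the object as string-like, so String#+ accepts it.
puts "GAF: " + WithToStr.new

# Without to_str, the same concatenation raises TypeError.
begin
  "GAF: " + WithToS.new
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

With only to_s defined, a GeneAssociation object can still be printed or
interpolated, but it is no longer silently accepted wherever a String is
expected.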


> --- a/lib/bio/db/go.rb
> +++ b/lib/bio/db/go.rb
> @@ -266,6 +266,11 @@ class GO
>      # 
>      attr_reader :assigned_by 
>  
> +    attr_reader :annotation_extension
> +
> +    attr_reader :gene_product_form_id
> +    
> +

If you want to add a GeneAssociation2 class, these new attributes
should be added only in the GeneAssociation2 class.

Alternatively, it is also good to support both GAF 1.0 and 2.0
in the GeneAssociation class.
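
A minimal sketch of this second option, assuming that a GAF 1.0 line has
15 tab-separated columns and a GAF 2.0 line has 17. GafEntrySketch and
gaf2? are hypothetical names, and only the columns relevant here are
stored; this is not the actual BioRuby implementation.

```ruby
# Hypothetical stand-in for Bio::GO::GeneAssociation, reduced to the
# columns relevant to this point.
class GafEntrySketch
  attr_reader :assigned_by, :annotation_extension, :gene_product_form_id

  def initialize(line)
    # A limit of -1 keeps trailing empty columns.
    cols = line.chomp.split("\t", -1)
    @assigned_by          = cols[14]
    # Columns 16 and 17 exist only in GAF 2.0, so these stay nil
    # when a GAF 1.0 line is parsed.
    @annotation_extension = cols[15]
    @gene_product_form_id = cols[16]
    @gaf2 = cols.size > 15
  end

  # True when the line carried the additional GAF 2.0 columns.
  def gaf2?
    @gaf2
  end
end
```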

>      alias entry_id db_object_id
>  
>  
> @@ -286,6 +291,8 @@ class GO
>        @taxon             = tmp[12] # taxon:4932
>        @date              = tmp[13] # 20010118
>        @assigned_by       = tmp[14] 
> +      @annotation_extension = tmp[15]
> +      @gene_product_form_id = tmp[16]
>      end
>  
>      # Parsing an entry (in a line) in the gene_association flatfile.  
> @@ -317,6 +324,31 @@ class GO
>  
>    end # class GeneAssociation   
>  
> +  class GeneAssociation2 < GeneAssociation
> +
> +    # Iterator through all entries
> +    def self.parser(str)
> +      if block_given?
> +        str.each_line(DELIMITER) {|line|
> +          next if /^!/ =~ line
> +          yield GeneAssociation2.new(line)
> +        }
> +      else
> +        galist = []
> +        str.each_line(DELIMITER) {|line|
> +          next if /^!/ =~ line
> +          galist << GeneAssociation2.new(line)
> +        }
> +        return galist
> +      end
> +    end
> +
> +    # Bio::GO::GeneAssociation#to_str -> a line of gene_association file.
> +    def to_str
> +      return [super.to_str, @annotation_extension, @gene_product_form_id].join("\t")
> +    end
> +  end
> +

The role of the GeneAssociation2 class will be carefully considered.
It might be merged into the GeneAssociation class.

The method name "parser" may be changed, or the method might not
be merged.

> +  class Phenote_GOA < GeneAssociation

The name of the class will probably be changed, based on the format
name used in the Phenote community.

> +    # Assign fields of an entry (in a line) in Phenote format.  
> +    def assign(tmp) 
> +      @db                = tmp[0] 
> +      @db_object_id      = tmp[1]
> +      @db_object_symbol  = tmp[2]
> +      @qualifier         = tmp[3]  # 
> +      @goid              = tmp[4]
> +      # We ignore Phenote's tmp[5]

Please do not ignore it. When supporting a new data format, all data
should be parsed and stored unless that is technically very difficult.


Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org



