[BioRuby] GFF3 status (possible bug?)

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Fri Feb 6 11:29:40 UTC 2009


Hi,

Thank you for reporting bugs.

On Fri, 6 Feb 2009 14:27:10 +0900
Tomoaki NISHIYAMA <tomoakin at kenroku.kanazawa-u.ac.jp> wrote:

> Hi,
> 
> Today I got the code from git and tried to parse a GFF3 file (from TAIR8).
> 
> A code fragment:
> 
> open(transcriptgff, "r").each_line do |gffline|
>   record = Bio::GFF::GFF3::Record.new(gffline)
>   p record
>   curid = record.id
>   p curid
>   ...
> 
> Results:
> 
> #<Bio::GFF::GFF3::Record:0x2b439aa9c640 @frame=nil, @start=3631, @strand="+", @feature="gene", @score=nil, @source="TAIR8", @attributes=[["ID", "AT1G01010"], ["Note", "protein_coding_gene"], ["Name", "AT1G01010"]], @end=5899, @seqname="Chr1">
> /usr/local/lib/ruby/site_ruby/1.8/bio/db/gff.rb:1084:in `[]': can't convert String into Integer (TypeError)
>          from /usr/local/lib/ruby/site_ruby/1.8/bio/db/gff.rb:1084:in `id'
> 
> It seems that @attributes is no longer a hash but an array of
> [key, value] pairs.

Yes, @attributes is now an array of [ key, value ] pairs.
See doc/Changes-1.3.rdoc for details of the changes.
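
To illustrate, here is a plain-Ruby sketch (no BioRuby needed) that rebuilds the @attributes value printed in the report and shows why indexing it with a String fails. The variable name `attributes` is only illustrative; the exact TypeError message differs between Ruby versions.

```ruby
# The attribute list from the TAIR8 record, as an array of [key, value] pairs
attributes = [["ID", "AT1G01010"], ["Note", "protein_coding_gene"],
              ["Name", "AT1G01010"]]

# Array#[] expects an Integer index (or a Range), so a String key raises
# TypeError, which is what the old Hash-style id method ran into.
begin
  attributes['ID']
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# Integer indexing returns a whole [key, value] pair
p attributes[0]   # => ["ID", "AT1G01010"]
```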

> On the other hand, id expects it to be a hash.
> 
> The code in gff.rb looks like this:
> 
>        # Represents a single line of a GFF3-formatted file.
>        # See Bio::GFF::GFF3 for more information.
>        class Record < GFF2::Record
> 
>          include GFF3::Escape
> 
>          # shortcut to the ID attribute
>          def id
>            @attributes['ID']
>          end
> 
> I suppose this is left over from when GFF attributes were a hash.

You are right. This is clearly a bug.
I've just fixed it:
http://github.com/bioruby/bioruby/commit/5258d88ef98a12fd7829eb86aa8664a18a672a43
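
The commit itself is not reproduced here. As a rough sketch, assuming only that @attributes is an array of [key, value] pairs, an id accessor could look like the following. `RecordSketch` is a hypothetical stand-in class, not Bio::GFF::GFF3::Record, and this is not necessarily how the commit implements it.

```ruby
# Hypothetical stand-in for a GFF3 record; only @attributes matters here.
class RecordSketch
  def initialize(attributes)
    # Array of [key, value] pairs, in file order
    @attributes = attributes
  end

  # Returns the value of the first "ID" attribute, or nil if there is none.
  def id
    pair = @attributes.assoc('ID')
    pair ? pair[1] : nil
  end
end

rec = RecordSketch.new([["ID", "AT1G01010"], ["Note", "protein_coding_gene"]])
p rec.id                   # => "AT1G01010"
p RecordSketch.new([]).id  # => nil
```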

> The change from hash to array is presumably because
> a key may appear more than once in the attributes.

Yes, that is one reason why @attributes was changed.

> One way to straighten this out may be to create a key => [array of values]
> hash when the same key is specified more than once. (When a key is given
> multiple values, it should be represented as key => [array of arrays].)
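
The key => [array of values] idea can be sketched in plain Ruby as follows. `pairs_to_multihash` is a hypothetical helper name, and the sample attributes (including the repeated Note) are made up for illustration.

```ruby
# Groups an array of [key, value] pairs into a Hash of key => [values].
# Repeated keys keep all their values, in the order they appeared.
def pairs_to_multihash(pairs)
  hash = {}
  pairs.each do |key, value|
    (hash[key] ||= []) << value
  end
  hash
end

pairs = [["ID", "gene1"], ["Note", "first"], ["Name", "G1"], ["Note", "second"]]
h = pairs_to_multihash(pairs)
p h['Note']   # => ["first", "second"]
p h['ID']     # => ["gene1"]
```
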
> 
> Alternatively, we could define id to scan the array:
> def id
>    val = nil
>    @attributes.each do |keyval|
>      if(keyval[0] == 'ID')
>        val = keyval[1]
>        break
>      end
>    end
>    val
> end

Ruby has built-in support for arrays of [ key, value ] pairs.
See the Ruby reference manual for Array#assoc.
For example,
  key, val = @attributes.assoc('ID')

This is almost the same as
  key, val = @attributes.find { |a| a[0] == 'ID' }
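
A small self-contained comparison of the two lookups, using the attributes from the TAIR8 record. One difference behind "almost the same": per the Array#assoc documentation, assoc only considers elements that are themselves arrays, while the find version calls a[0] on every element. Both return nil for a missing key, so the multiple assignment yields nil, nil.

```ruby
attributes = [["ID", "AT1G01010"], ["Note", "protein_coding_gene"],
              ["Name", "AT1G01010"]]

# Array#assoc returns the first pair whose first item == 'ID'
key, val = attributes.assoc('ID')
p val                 # => "AT1G01010"

# Equivalent search written with Enumerable#find
key2, val2 = attributes.find { |a| a[0] == 'ID' }
p val2 == val         # => true

# A missing key gives nil, so both variables end up nil
missing_key, missing_val = attributes.assoc('Parent')
p missing_val         # => nil
```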

 
> It would also be nice if a method were provided to get the attribute
> value for a specific key.

New methods to set/get/replace attributes have been added.
See doc/Changes-1.3.rdoc and RDoc of Bio::GFF::GFF2 and
Bio::GFF::GFF3 for details.
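
The actual method names are documented in the RDoc mentioned above and are not repeated here. As a hypothetical sketch of how get/set/replace can work on an ordered array of pairs, the `AttributeList` class and its method names below are illustrative only, not the BioRuby API.

```ruby
# Hypothetical wrapper around an ordered array of [key, value] pairs.
# Method names are illustrative only; see the BioRuby RDoc for the real API.
class AttributeList
  attr_reader :pairs

  def initialize(pairs = [])
    @pairs = pairs
  end

  # First value for key, or nil if absent
  def get(key)
    pair = @pairs.assoc(key)
    pair ? pair[1] : nil
  end

  # All values for key, in file order
  def get_all(key)
    @pairs.select { |k, _| k == key }.map { |_, v| v }
  end

  # Overwrite the first value for key in place, or append a new pair
  def set(key, value)
    pair = @pairs.assoc(key)
    if pair
      pair[1] = value
    else
      @pairs << [key, value]
    end
    value
  end

  # Replace all values for key, keeping the position of the first
  # existing occurrence (or appending if the key is new)
  def replace(key, *values)
    idx = @pairs.index { |k, _| k == key }
    @pairs.reject! { |k, _| k == key }
    new_pairs = values.map { |v| [key, v] }
    if idx
      @pairs.insert(idx, *new_pairs)
    else
      @pairs.concat(new_pairs)
    end
    self
  end
end

attrs = AttributeList.new([["ID", "gene1"], ["Note", "a"], ["Name", "G1"], ["Note", "b"]])
p attrs.get('Note')       # => "a"
p attrs.get_all('Note')   # => ["a", "b"]
attrs.replace('Note', 'x', 'y')
p attrs.pairs  # => [["ID", "gene1"], ["Note", "x"], ["Note", "y"], ["Name", "G1"]]
```

Because every operation works on the same array, the relative order of the other attributes is never disturbed.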

> This should be easier with the key-to-array-of-values approach,
> although the order of attributes will not be preserved.

I think it is better to keep the order of attributes, so
I decided to use an array of [ key, value ] pairs.
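
A small sketch of why order is lost with the alternative design, using made-up sample attributes: grouping into key => [values] and flattening back puts repeated keys next to each other. This happens even on Ruby 1.9 and later, where Hash keeps insertion order, so the original interleaving of Note around Name cannot be recovered.

```ruby
pairs = [["ID", "gene1"], ["Note", "first"], ["Name", "G1"], ["Note", "second"]]

# Group into key => [values] (the alternative design)
grouped = {}
pairs.each { |k, v| (grouped[k] ||= []) << v }

# Flatten back into [key, value] pairs
round_trip = []
grouped.each { |k, vs| vs.each { |v| round_trip << [k, v] } }

p round_trip == pairs   # => false
# Ruby 1.9+: [["ID", "gene1"], ["Note", "first"], ["Note", "second"], ["Name", "G1"]]
p round_trip
```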

> 
> Which way are you going?
> 
> I hope this can be corrected before the 1.3.0 release.
> 
> Best wishes
> 
> -- 
> Tomoaki NISHIYAMA
> 
> Advanced Science Research Center,
> Kanazawa University,
> 13-1 Takara-machi,
> Kanazawa, 920-0934, Japan
> 


Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org



