[BioRuby] GFF3 status (possible bug?)

Tomoaki NISHIYAMA tomoakin at kenroku.kanazawa-u.ac.jp
Fri Feb 6 05:27:10 UTC 2009


Hi,

Today, I got the code from git and tried parsing a GFF3 file (from TAIR8).

A code fragment:

open(transcriptgff, "r").each_line do |gffline|
  record = Bio::GFF::GFF3::Record.new(gffline)
  p record
  curid = record.id
  p curid
...

Results:

#<Bio::GFF::GFF3::Record:0x2b439aa9c640 @frame=nil, @start=3631, @strand="+", @feature="gene", @score=nil, @source="TAIR8", @attributes=[["ID", "AT1G01010"], ["Note", "protein_coding_gene"], ["Name", "AT1G01010"]], @end=5899, @seqname="Chr1">
/usr/local/lib/ruby/site_ruby/1.8/bio/db/gff.rb:1084:in `[]': can't convert String into Integer (TypeError)
         from /usr/local/lib/ruby/site_ruby/1.8/bio/db/gff.rb:1084:in `id'

It seems that @attributes is no longer a hash, but an array of
[key, value] pairs.
On the other hand, id still expects it to be a hash.
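
The failure can be reproduced without BioRuby. The following is only a minimal sketch, assuming nothing more than the attribute pairs printed in the output above; the variable names are hypothetical and not taken from gff.rb.

```ruby
# Attributes as printed above: an array of [key, value] pairs, not a hash.
attributes = [["ID", "AT1G01010"], ["Note", "protein_coding_gene"], ["Name", "AT1G01010"]]

# Array#[] expects an integer index, so a string key raises TypeError.
error = begin
  attributes['ID']
  nil
rescue TypeError => e
  e
end

# Array#assoc returns the first pair whose first element equals the key.
pair = attributes.assoc('ID')
id_value = pair ? pair[1] : nil
```

Here error holds the TypeError and id_value holds the ID string, which suggests the array itself is fine and only the lookup needs to change.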

The code in gff.rb looks like this:

       # Represents a single line of a GFF3-formatted file.
       # See Bio::GFF::GFF3 for more information.
       class Record < GFF2::Record

         include GFF3::Escape

         # shortcut to the ID attribute
         def id
           @attributes['ID']
         end

I suppose this is a remnant of the earlier GFF code, when attributes were a hash.
The change from hash to array is presumably because
a key may not be unique within the attributes.

One way to straighten this out may be to build a hash from key to
[array of values], so that a key specified more than once keeps all of
its values. (When multiple values are given for each occurrence of a
key, it should be represented as key to [array of arrays].)
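
As a rough sketch of the key to [array of values] idea, assuming the attributes arrive as an array of [key, value] pairs: attributes_to_hash is a hypothetical helper, not a BioRuby method, and the Alias values are invented for illustration.

```ruby
# Hypothetical helper, not part of BioRuby: group [key, value] pairs by key.
def attributes_to_hash(pairs)
  grouped = {}
  pairs.each do |key, value|
    # Create the value list on first sight of a key, then append.
    (grouped[key] ||= []) << value
  end
  grouped
end

# Illustrative data only; the repeated "Alias" key is made up.
pairs = [["ID", "AT1G01010"], ["Alias", "x"], ["Alias", "y"]]
grouped = attributes_to_hash(pairs)
id_values = grouped["ID"]
alias_values = grouped["Alias"]
```

Every key maps to an array, even when it occurs once, so callers do not need to check whether they got a single value or a list.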

Otherwise, we may define id to scan the array as follows:
def id
  # Scan the [key, value] pairs and return the first ID value, or nil.
  val = nil
  @attributes.each do |keyval|
    if keyval[0] == 'ID'
      val = keyval[1]
      break
    end
  end
  val
end

It would also be nice to have a method that returns the attribute
value for a specific key. This should be easier with the key to
array of values approach, although the order of attributes
will not be preserved.
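
If keeping the attribute order matters, such a lookup could also work directly on the array of pairs. This is only a sketch under that assumption: attribute_values and first_attribute are hypothetical names, not existing BioRuby methods, and the Alias values are invented for illustration.

```ruby
# Hypothetical helper, not part of BioRuby: all values for a key, in file order.
def attribute_values(pairs, key)
  pairs.select { |k, _v| k == key }.map { |_k, v| v }
end

# Hypothetical helper: the first value for a key, or nil when the key is absent.
def first_attribute(pairs, key)
  attribute_values(pairs, key).first
end

# Illustrative data only; the repeated "Alias" key is made up.
pairs = [["ID", "AT1G01010"], ["Alias", "x"], ["Name", "AT1G01010"], ["Alias", "y"]]
alias_values = attribute_values(pairs, "Alias")
id_value = first_attribute(pairs, "ID")
missing = first_attribute(pairs, "Parent")
```

The underlying array is left untouched, so the original order of attributes stays available for writing the record back out.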

Which way are you going?

I hope this can be corrected before 1.3.0 release.

Best wishes

-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan



