[BioRuby] Parsing GFF3 attributes

Michael Han mh6 at sanger.ac.uk
Tue May 15 16:10:20 UTC 2007


On 15 May 2007, at 16:30, hienle at club-internet.fr wrote:
> Hello all,
>
> I am working with a GFF3-formatted file and have noticed that the  
> attributes field is not parsed properly.
>
> In bio/db/gff.rb,
>
>     75      def parse_attributes(attributes)
>     76        hash = Hash.new
>     77        attributes.split(/[^\\];/).each do |atr|
>     78          key, value = atr.split(' ', 2)
>     79          hash[key] = value
>     80        end
>     81        return hash
>     82      end
>     83    end
>
> I changed :
>     78          key, value = atr.split(' ', 2)
> to:
>     78          key, value = atr.split('=', 2)
>
> and it now appears to behave properly. However, I am not certain if  
> this is appropriate for backward compatibility with GFF and GFF2.

For GFF2 I normally use a space between the key and the value of each
attribute, like: Gene "1234" ; Transcript "1234"
as described in <"http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml">

so splitting on '=' would break GFF / GFF2 parsing.
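
To make the breakage concrete, here is a small sketch using the GFF2
attribute string from above. The variable names are only for illustration
and nothing here is taken from gff.rb itself:

```ruby
# GFF2-style attribute string, as in the example above.
gff2_attributes = 'Gene "1234" ; Transcript "1234"'

# Take the first key/value pair and trim the surrounding whitespace.
first_pair = gff2_attributes.split(';').first.strip

# Current behaviour: split on whitespace, giving a key and a value.
key_space, value_space = first_pair.split(' ', 2)

# Proposed change: split on '='. A GFF2 pair contains no '=', so the
# whole pair ends up in the key and the value is nil.
key_equals, value_equals = first_pair.split('=', 2)

puts "space split: #{key_space.inspect} => #{value_space.inspect}"
puts "'=' split:   #{key_equals.inspect} => #{value_equals.inspect}"
```
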
Maybe you could create a separate GFF3 parser that inherits from the
one in gff.rb.
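
A rough sketch of what I mean is below. SketchGFF and SketchGFF3 are
made-up stand-ins, not the real BioRuby classes, and key_value_separator
is a hypothetical hook I introduced for illustration. The sketch also
splits on a plain ';' and does not handle escaped semicolons, quoting, or
URL-encoded values, so treat it as an outline rather than a finished parser:

```ruby
# Hypothetical stand-in for the existing GFF / GFF2 parser.
class SketchGFF
  # Turn "key value ; key value" into a Hash of key => value strings.
  def parse_attributes(attributes)
    hash = {}
    attributes.split(';').each do |atr|
      key, value = atr.strip.split(key_value_separator, 2)
      hash[key] = value unless key.nil?  # skip empty fragments
    end
    hash
  end

  private

  # GFF / GFF2: key and value are separated by whitespace.
  def key_value_separator
    ' '
  end
end

# Hypothetical GFF3 parser: inherits everything, changes only the separator.
class SketchGFF3 < SketchGFF
  private

  # GFF3: key and value are separated by '='.
  def key_value_separator
    '='
  end
end

gff2 = SketchGFF.new.parse_attributes('Gene "1234" ; Transcript "1234"')
gff3 = SketchGFF3.new.parse_attributes('ID=gene00001;Name=EDEN')
p gff2
p gff3
```

This way the GFF / GFF2 behaviour stays untouched and only the GFF3
subclass sees the '=' separator.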

A GFF3 reference (note: the latest version is from a few weeks ago):
<"http://www.sequenceontology.org/gff3.shtml">

> Is anyone working on parsing GFF3 files?
>
> Thank you in advance for your help,
> -Hien

Michael


