[BioRuby] GFF3 attribute parser problems

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Sun Jun 29 09:51:08 UTC 2008


Hi,

Thank you for reporting bugs.

The GFF3 specification (http://song.sourceforge.net/gff3.shtml)
says that URL escaping rules, not backslashes, are used for
escaping semicolons.

(cited from http://song.sourceforge.net/gff3.shtml)
>> Column 9: "attributes"
>>
>> A list of feature attributes in the format tag=value.  Multiple
>> tag=value pairs are separated by semicolons.  URL escaping rules are
>> used for tags or values containing the following characters: ",=;".

So, the existing code in BioRuby 1.2.1 is apparently wrong.
(I don't know why, but perhaps it was written before the
specification was well established?)
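
To make the failure concrete: the character class in /[^\\];/ matches,
and therefore consumes, the one character in front of each semicolon, so
split throws it away together with the separator. A small sketch in plain
Ruby (no BioRuby needed; the sample string is the one from the original
report, and the constant name is only for illustration):

```ruby
# The separator pattern used by parse_attributes in BioRuby 1.2.1.
# [^\\] matches any single character other than a backslash, and that
# character becomes part of the separator that split discards.
BROKEN_SEPARATOR = /[^\\];/

line = 'abc=def;one=two'

broken = line.split(BROKEN_SEPARATOR)
# The "f" in front of the semicolon is lost.
p broken   # => ["abc=de", "one=two"]

# Under the GFF3 rule a literal semicolon inside a value is written as
# %3B, so every raw ";" is a separator and a plain split is enough.
plain = line.split(';')
p plain    # => ["abc=def", "one=two"]
```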

If nonstandard (and invalid) GFF3 data that uses backslashes for
escaping is common, we should consider supporting it as well, but
the main GFF3 class should follow the official specification.

I see your changes in git, but your code still seems to use the
"wrong" escaping rule and does not handle escaping of other characters
(",=;&%" and %XX escapes).
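
As a rough sketch of what following the specification could look like:
split column 9 on raw semicolons first, then undo the %XX escapes in each
tag and value. The method names parse_gff3_attributes and gff3_unescape
are hypothetical, not part of BioRuby, and this sketch does not handle
comma-separated multiple values or repeated tags (a later tag simply
overwrites an earlier one).

```ruby
# Hypothetical helper, not BioRuby API: decode %XX escapes only.
# "+" is left untouched, unlike HTML form decoding.
def gff3_unescape(str)
  # Decode each run of %XX escapes in one step so that multi-byte
  # sequences stay intact.
  str.gsub(/((?:%[0-9A-Fa-f]{2})+)/) do
    [$1.delete('%')].pack('H*').force_encoding(str.encoding)
  end
end

# Hypothetical helper, not BioRuby API: parse a GFF3 column 9 string.
# Splitting happens BEFORE unescaping, so an escaped ";" (%3B) or
# "=" (%3D) inside a value is never mistaken for a separator.
def parse_gff3_attributes(column9)
  attributes = {}
  column9.split(';').each do |pair|
    next if pair.empty?                 # tolerate a trailing semicolon
    tag, value = pair.split('=', 2)
    attributes[gff3_unescape(tag)] = gff3_unescape(value.to_s)
  end
  attributes
end

attrs = parse_gff3_attributes('ID=gene1;Note=kinase%3B%20putative;Alias=a%3Db')
p attrs
# => {"ID"=>"gene1", "Note"=>"kinase; putative", "Alias"=>"a=b"}
```

Data that escapes semicolons with backslashes would still be split in the
wrong place by this sketch; supporting it would need a separate, clearly
nonstandard code path.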

BTW, I think the Bio::GFF classes in BioRuby should be changed
to support creating GFF objects from scratch and writing GFF output.

Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org


On Sun, 29 Jun 2008 11:57:39 +1000
"Ben Woodcroft" <donttrustben at gmail.com> wrote:

> I have attempted a fix, and pushed it to github. I forked the main
> branch, not the testing one, because I class this as a bug fix, not a
> new feature. Available at
> http://github.com/wwood/bioruby/tree/master
> 
> I actually had to create a new class
> Bio::GFF::GFF3::Record<Bio::GFF::Record because the parsing of the
> attributes happens inside the record, not the parser. I'm not sure
> this is the most sensible way, but I'm following the laziness virtue
> for now.
> 
> I hope these kinds of commits get added to the main repo..
> 
> Thanks,
> ben
> 
> 
> 2008/6/28 Ben Woodcroft <donttrustben at gmail.com>:
> > not fixed as of most recent git commit, either
> > http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb
> >
> > line 120
> >
> > 2008/6/28 Ben Woodcroft <donttrustben at gmail.com>:
> >> Hi,
> >>
> >> I noticed there is a problem with the Bio::GFF::GFF3 class. Using bioruby 1.2.1
> >>
> >>  class GFF3 < GFF
> >>    VERSION = 3
> >>
> >>    private
> >>
> >>    def parse_attributes(attributes)
> >>      hash = Hash.new
> >>      attributes.split(/[^\\];/).each do |atr|
> >>        key, value = atr.split('=', 2)
> >>        hash[key] = value
> >>      end
> >>      return hash
> >>    end
> >>
> >> My problem is with the split(/[^\\];/) bit, because it chops off an
> >> extra character at the end of the value:
> >>
> >> irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/)
> >> => ["abc=de", "one=two"]
> >>
> >> where we want
> >> => ["abc=def", "one=two"]
> >>
> >>
> >>
> >> I took a shortcut to solve it and ignored the escaping of the ; and
> >> just did this
> >>
> >>    hash = Hash.new
> >>    attributes.split(/;/).each do |atr|
> >>      key, value = atr.split('=', 2)
> >>      hash[key] = value
> >>    end
> >>    return hash
> >>
> >>
> >> Thanks,
> >> ben
> >>
> >
> >
> >
> > --
> > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place.
> >
> 
> 
> 
> -- 
> FYI: My email addresses at unimelb, uq and gmail all redirect to the same place.
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby


