[BioRuby] GFF3 attribute parser problems

Ben Woodcroft donttrustben at gmail.com
Sun Jun 29 14:33:43 UTC 2008


Thanks for your reply. I've had a look at the spec - it seems to be
far more complex than the current bioruby GFF3 class, with the cross
references, predefined tags, fasta, etc. - but I agree it would be
good to have a fully featured parser/writer.

I fixed the attribute parsing so that it handles all the escaped
characters, created the corresponding tests, and committed to the
same github repo.
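
As a rough illustration of the escaping rule quoted below (this is not
the code committed to the repo, and the method names unescape_gff3 and
parse_gff3_attributes are made up for this sketch): since literal
semicolons and equals signs inside tags or values are written as %XX
escapes, the attribute column can be split on plain ';' and '=' first,
and each piece decoded afterwards.

```ruby
# Hypothetical helpers for illustration only; not BioRuby's API.

# Decode %XX escapes (e.g. "%3B" -> ";"). Unlike CGI.unescape, this
# leaves "+" untouched.
def unescape_gff3(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
end

# Split the attribute column on unescaped ";" and "=", then decode
# each tag and value.
def parse_gff3_attributes(attributes)
  hash = {}
  attributes.split(';').each do |pair|
    next if pair.empty?
    key, value = pair.split('=', 2)
    hash[unescape_gff3(key)] = unescape_gff3(value.to_s)
  end
  hash
end

attrs = parse_gff3_attributes('ID=gene1;Note=a%3Bb%3Dc')
puts attrs.inspect
```

This sketch keeps only the last value when a tag is repeated and does
not split comma-separated multiple values.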

I don't know of any GFF file that uses the illegal backslashes, so I
took that code out. Actually, the current 1.2.1 code doesn't either -
the backslash code isn't ever accessed as far as I can tell.

I added to_s methods for GFF3 class and corresponding Record class,
like you suggested.
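
For writing, the reverse step is needed: characters that would be
confused with the separators have to be escaped again. A minimal
sketch, assuming the character set ",=;&%" mentioned in the quoted
reply and using the made-up names escape_gff3 and attributes_to_s
(the to_s methods in the repo may well look different):

```ruby
# Hypothetical helpers for illustration only; not BioRuby's API.

# Replace each reserved character with its %XX escape.
def escape_gff3(str)
  str.gsub(/[%;=&,]/) { |c| format('%%%02X', c.ord) }
end

# Join a Hash of tag => value into the text of GFF3 column 9.
def attributes_to_s(attributes)
  attributes.map { |k, v| "#{escape_gff3(k)}=#{escape_gff3(v)}" }.join(';')
end

puts attributes_to_s('ID' => 'gene1', 'Note' => 'a;b=c')
```

Note that this treats every value as a single string, so a comma that
is meant to separate multiple values would be escaped as well.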

I think a big problem with the GFF classes is that they load the whole
file into memory, and the to_s method doesn't fix this. Maybe in the
future...
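
One possible direction, sketched here under assumptions rather than
taken from BioRuby: read the input one line at a time and hand each
record to a block, so only the current line is held in memory. The
method name each_gff_record is hypothetical, and records are returned
as plain arrays of the nine tab-separated columns.

```ruby
require 'stringio'

# Hypothetical streaming reader for illustration; not BioRuby's API.
# Yields each feature line as an array of up to nine columns.
def each_gff_record(io)
  return enum_for(:each_gff_record, io) unless block_given?
  io.each_line do |line|
    line = line.chomp
    break if line.start_with?('##FASTA')            # sequence section follows
    next if line.empty? || line.start_with?('#')    # directives and comments
    yield line.split("\t", 9)
  end
end

sample = StringIO.new("##gff-version 3\nchr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n")
each_gff_record(sample) { |cols| puts cols[8] }
```

With a real file, the same method could be called as
File.open(path) { |f| each_gff_record(f) { |cols| ... } }.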

Thanks,
ben

2008/6/29 Naohisa GOTO <ngoto at gen-info.osaka-u.ac.jp>:
> Hi,
>
> Thank you for reporting bugs.
>
> The GFF3 specification http://song.sourceforge.net/gff3.shtml
> says that URL escaping rules are used for escaping semicolons,
> not backslashes.
>
> (cited from http://song.sourceforge.net/gff3.shtml)
>>> Column 9: "attributes"
>>>
>>> A list of feature attributes in the format tag=value.  Multiple
>>> tag=value pairs are separated by semicolons.  URL escaping rules are
>>> used for tags or values containing the following characters: ",=;".
>
> So, the existing code in BioRuby 1.2.1 is apparently wrong.
> (I don't know, but perhaps it might be written before the
> specification was well established?)
>
> If nonstandard (and illegal) GFF3 data using backslash for escape
> is popular, we should also consider it, but the main GFF3 class
> should keep the official specification.
>
> I see your changes in git, but your code seems to be still using
> the "wrong" escaping rule and does not handle the escaping of other
> characters (",=;&%" and %XX escapes).
>
> BTW, I think the Bio::GFF classes in bioruby should be changed
> to support creating GFF objects from scratch and output of GFFs.
>
> Thanks,
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
>
>
> On Sun, 29 Jun 2008 11:57:39 +1000
> "Ben Woodcroft" <donttrustben at gmail.com> wrote:
>
>> I have attempted a fix, and pushed it to github. I forked the main
>> branch, not the testing one, because I class this as a bug fix, not a
>> new feature. Available at
>> http://github.com/wwood/bioruby/tree/master
>>
>> I actually had to create a new class
>> Bio::GFF::GFF3::Record<Bio::GFF::Record because the parsing of the
>> attributes happens inside the record, not the parser. I'm not sure
>> this is the most sensible way, but I'm following the laziness virtue
>> for now.
>>
>> I hope these kinds of commits get added to the main repo..
>>
>> Thanks,
>> ben
>>
>>
>> 2008/6/28 Ben Woodcroft <donttrustben at gmail.com>:
>> > not fixed as of most recent git commit, either
>> > http://github.com/bioruby/bioruby/tree/master/lib/bio/db/gff.rb
>> >
>> > line 120
>> >
>> > 2008/6/28 Ben Woodcroft <donttrustben at gmail.com>:
>> >> Hi,
>> >>
>> >> I noticed there is a problem with the Bio::GFF::GFF3 class. Using bioruby 1.2.1
>> >>
>> >>  class GFF3 < GFF
>> >>    VERSION = 3
>> >>
>> >>    private
>> >>
>> >>    def parse_attributes(attributes)
>> >>      hash = Hash.new
>> >>      attributes.split(/[^\\];/).each do |atr|
>> >>        key, value = atr.split('=', 2)
>> >>        hash[key] = value
>> >>      end
>> >>      return hash
>> >>    end
>> >>
>> >> My problem is with the split(/[^\\];/) bit, because it chops off an
>> >> extra character at the end of the preceding value:
>> >>
>> >> irb(main):001:0> 'abc=def;one=two'.split(/[^\\];/)
>> >> => ["abc=de", "one=two"]
>> >>
>> >> where we want
>> >> => ["abc=def", "one=two"]
>> >>
>> >>
>> >>
>> >> I took a shortcut to solve it and ignored the escaping of the ; and
>> >> just did this
>> >>
>> >>    hash = Hash.new
>> >>    attributes.split(/;/).each do |atr|
>> >>      key, value = atr.split('=', 2)
>> >>      hash[key] = value
>> >>    end
>> >>    return hash
>> >>
>> >>
>> >> Thanks,
>> >> ben
>> >>
>> >
>> >
>> >
>> > --
>> > FYI: My email addresses at unimelb, uq and gmail all redirect to the same place.
>> >
>>
>>
>>
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>





