[BioRuby] Bio::GO::GeneAssociation issue/fix and new unit test file

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Mon Apr 19 13:35:51 UTC 2010


Hi Ben,

On Sat, 17 Apr 2010 18:25:12 +1000
Ben Woodcroft <donttrustben at gmail.com> wrote:

> Hi,
> 
> Not to be pushy, but is there any movement on this? Ignoring my suggestions
> for API changes (which aren't implemented), can the bug fixes be merged?
> Thanks,
> ben

It seems that Mitsuteru Nakao, the current maintainer of these classes,
is too busy.

On 8 April 2010 22:14, Ben Woodcroft <donttrustben at gmail.com> wrote:

> Hi,
>
> I had some problems parsing gene association files using Bio::Flatfile,
> caused because the parser was attempting to use the split method on a nil.
> The offending line was
>
> @db_reference      = tmp[5].split(/\|/)  #

The GO Annotation File Format 1.0 specifies that every line except
comment lines has 15 tab-delimited fields, so in theory split would
never be called on nil. In real data, however, such exceptions are
clearly inconvenient, and I agree that this should be fixed.
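
As a rough illustration of how such a nil can arise, here is a minimal
sketch. It assumes the parser splits each line on tabs into an array
named tmp, as in the line quoted above; the sample line and its
identifiers are made up.

```ruby
# Hypothetical, truncated association line with only five tab-separated
# fields instead of the 15 the format defines (values are invented).
short_line = "DB\tID0001\tsymbol\t\tGO:0000001"

tmp = short_line.chomp.split(/\t/)

# Only five elements exist, so the sixth column (index 5) is nil.
field_count = tmp.size
sixth_field = tmp[5]

# Calling split on that nil raises NoMethodError.
error_class = begin
  tmp[5].split(/\|/)
  nil
rescue NoMethodError => e
  e.class
end

# Note that String#split also drops trailing empty fields, so even a
# line whose last columns are empty yields a shorter array.
trailing = "a\tb\t\t".split(/\t/)
```

Either situation leaves tmp[5] undefined for lines that end early.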

> That seemed easy enough to fix, but then I noticed there wasn't any test
> cases to test my changes against, so I made a new file
> test/unit/db/test_go.rb, including a simulation of one that was giving me
> problems. I've collected these changes in a new branch, and you can see the
> difference using the new github compare interface at
>
> http://github.com/wwood/bioruby/compare/36041377db...gene_association

The patch looks good and will be merged.
One minor point: there is no need to check for both nil and empty,
because splitting an empty string already returns an empty array.
>> @db_reference      = (tmp[5].nil? or tmp[5].empty?) ? [] : tmp[5].split(/\|/)
can be shortened to
 @db_reference      = tmp[5] ? tmp[5].split(/\|/) : []
or
 @db_reference      = tmp[5].to_s.split(/\|/)
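
A small sketch comparing the three forms on nil, empty, and non-empty
input may help. The lambda names and the sample reference strings are
only for illustration and are not part of the patch.

```ruby
# Three ways to turn a possibly missing field into an array.
verbose  = lambda { |f| (f.nil? or f.empty?) ? [] : f.split(/\|/) }
ternary  = lambda { |f| f ? f.split(/\|/) : [] }
via_to_s = lambda { |f| f.to_s.split(/\|/) }

# Invented sample values: missing, empty, one entry, two entries.
inputs = [nil, "", "PMID:1", "PMID:1|PMID:2"]

results = inputs.map do |f|
  [verbose.call(f), ternary.call(f), via_to_s.call(f)]
end

# True when all three variants return the same array for every input.
all_agree = results.all? { |r| r.uniq.size == 1 }
```

The to_s variant is the shortest, since nil.to_s is an empty string.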

> Is there any reason that the variables that correspond to arrays in
> GeneAssociation (@db_reference, @with, @db_object_synonym) are singular
> names, and not plural? It would be simple to add a alias_method
> db_references -> db_reference right?

I suppose these names were taken from older versions of the
"GO Annotation File Format 1.0 Guide".
http://web.archive.org/web/20030401212209/http://www.geneontology.org/doc/GO.annotation.html
http://web.archive.org/web/20040803050222/www.geneontology.org/GO.annotation.html
(Current version: http://www.geneontology.org/GO.format.gaf-1_0.shtml )

In the file format definition, each column is briefly described
using singular nouns. The original authors of the class probably
used those names as they were, only replacing colons and spaces
with "_" and converting to lower case.

I am happy to add the aliases if you, as an active user of the
class, find the current method names confusing. Please
propose better names.
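
For discussion, adding such aliases could look roughly like the sketch
below. The class here is a hypothetical stand-in, not the real
Bio::GO::GeneAssociation, and the plural names are only candidates.

```ruby
# Hypothetical stand-in class; only the aliasing technique matters here.
class GeneAssociationSketch
  attr_reader :db_reference, :db_object_synonym

  def initialize(db_reference, db_object_synonym)
    @db_reference      = db_reference
    @db_object_synonym = db_object_synonym
  end

  # Candidate plural names (assumptions, not an existing BioRuby API).
  # The singular readers keep working, so existing code is unaffected.
  alias_method :db_references, :db_reference
  alias_method :db_object_synonyms, :db_object_synonym
end

# Invented sample values.
ga = GeneAssociationSketch.new(["PMID:1", "PMID:2"], ["syn1"])
plural_refs   = ga.db_references
singular_refs = ga.db_reference
```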

> I also don't agree that the 'GO:' part of the identifier be chopped off by
> default by the goid method - gene association files are not necessarily
> concerned with GO - there are other ontologies out there as well. I
> personally never look at GO identifiers without the 'GO:' bit, so I was
> surprised when I saw that.

To avoid confusion, I suggest adding a new method "go_id", which follows
the naming rule above for the current format definition, and
deprecating the method "goid" (with a warning message).
(It seems the short name for the column was renamed from "GOid" to
"GO ID" in 2004.)
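
A minimal sketch of this idea, assuming that go_id should return the
identifier unchanged and that goid keeps stripping the "GO:" prefix
while warning. The class name and the warning text are hypothetical.

```ruby
# Hypothetical stand-in class; not the real Bio::GO::GeneAssociation.
class GoIdSketch
  # New method: returns the identifier exactly as given, prefix included.
  attr_reader :go_id

  def initialize(go_id)
    @go_id = go_id
  end

  # Deprecated method: prints a warning to stderr, then returns the
  # identifier without its leading "GO:" (the behaviour Ben describes).
  def goid
    warn "goid is deprecated; use go_id instead"
    @go_id.sub(/\AGO:/, '')
  end
end

entry    = GoIdSketch.new("GO:0005634")
full_id  = entry.go_id
short_id = entry.goid
```

Existing callers of goid would keep working but would see the warning.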

> Sound OK?
> Thanks,
> ben

Thank you.


Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org


