[BioRuby] SPTR problem

Ben Woodcroft donttrustben at gmail.com
Tue Jan 12 12:52:42 UTC 2010


Hi,

While parsing all the yeast UniProt txt files I came across a problem with
the gn parser: it was returning an array when I expected a hash. Looking at
the code, the problem seems to be this when statement:

      when /Name=/,/ORFNames=/
        @data['GN'] = gn_uniprot_parser
      else
        @data['GN'] = gn_old_parser
      end

http://www.uniprot.org/uniprot/A2P2R3.txt has the problem on the 5th line:

GN OrderedLocusNames=YMR084W;

So the GN line has OrderedLocusNames= but neither Name= nor ORFNames=, and
so it didn't go through the new parser the way the other entries I came
across did. Should all 4 possibilities be tested for in the when statement
(Synonyms= being the 4th)?
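
To make the suggestion concrete, here is a rough sketch of a check that
covers all four keys. It is not the actual BioRuby code: the helper name
gn_parser_for and the constant are made up for illustration, it returns a
symbol instead of calling the real parsers, and the old-style example line
is invented.

```ruby
# Hypothetical sketch, not BioRuby's code: pick a GN parser from the raw line.
# The four keys are the ones named above: Name=, Synonyms=,
# OrderedLocusNames= and ORFNames=.
NEW_STYLE_GN_KEYS = /(?:Name|Synonyms|OrderedLocusNames|ORFNames)=/

def gn_parser_for(gn_line)
  case gn_line
  when NEW_STYLE_GN_KEYS
    :gn_uniprot_parser   # any "Key=value;" style line goes to the new parser
  else
    :gn_old_parser       # everything else falls back to the old parser
  end
end

# The line from A2P2R3 that triggered the problem:
puts gn_parser_for("GN   OrderedLocusNames=YMR084W;")
# An invented old-style line with no "Key=" tokens:
puts gn_parser_for("GN   ABC1 OR ABC2.")
```

Note that /Name=/ on its own does not match "OrderedLocusNames=" or
"ORFNames=", because of the "s" before the "=", which is why each key has
to be listed explicitly.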

Also, while I'm here:
* Why does the returned hash have different keys from those in the file? E.g.
ORFNames becomes :orfs.
* I also found the parsing process for whole genomes quite slow (multiple
hours for well-annotated ones).
* Is there any standard way to handle concatenated UniProt files? I wrote my
own as it was simple.
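
For reference, a splitter of that kind can be sketched roughly as below. It
is a minimal illustration rather than the exact code I used, and it assumes
each entry in the concatenated file ends with a line containing only "//".
The method name each_uniprot_entry and the sample entry IDs are made up.

```ruby
require 'stringio'

# Hypothetical helper: yield each entry of a concatenated flat file as a
# single string, assuming every entry ends with a line containing only "//".
def each_uniprot_entry(io)
  return enum_for(:each_uniprot_entry, io) unless block_given?

  buffer = String.new
  io.each_line do |line|
    buffer << line
    next unless line.chomp == "//"
    yield buffer            # a complete entry, terminator line included
    buffer = String.new
  end
  # Pass on any trailing text that lacks a terminator instead of dropping it.
  yield buffer unless buffer.strip.empty?
end

# Two invented, heavily shortened entries for demonstration.
sample = StringIO.new(<<~TXT)
  ID   ENTRY_ONE
  GN   OrderedLocusNames=YMR084W;
  //
  ID   ENTRY_TWO
  GN   Name=ABC1;
  //
TXT

each_uniprot_entry(sample) do |entry|
  puts entry.lines.first
end
```

Each yielded string could then presumably be handed to the parser one at a
time (e.g. Bio::SPTR.new(entry)), so the whole file never has to be held in
memory.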

Thanks,
ben

--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.


