[BioRuby] SPTR problem

Ben Woodcroft donttrustben at gmail.com
Tue Jan 19 02:15:30 UTC 2010


Hi,

Thanks for the response. My replies are inline below.

2010/1/16 Naohisa GOTO <ngoto at gen-info.osaka-u.ac.jp>

>
> It seems to be a bug. Perhaps there were no (or very few) entries
> which only had OrderedLocusNames= when the code was first written
> in 2005. The commit Id in git was b5c3342437ed698f215a87ea72c6cabf0575709d.
>

I figured as much. Also, since no exception was actually raised, the errors
could easily have gone unnoticed. I wrote a patch for this that I've been
using internally, but it doesn't include unit tests yet.
http://github.com/wwood/bioruby/commit/b2f6cb0b
Happy to write tests, but you seem to rewrite my patches anyway.


>
> The GN format was changed in UniProtKB release 2.0 of 05-Jul-2004.
> The document http://www.uniprot.org/docs/sp_news.htm says:
> | The new format of the GN line is:
> |
> | GN   Name=<name>; Synonyms=<name1>[, <name2>...]; OrderedLocusNames=<name1>[, <name2>...];
> | GN   ORFNames=<name1>[, <name2>...];
> |
> | None of the above four tokens are mandatory. But a "Synonyms" token can
> | only be present if there is a "Name" token.
>
> You are right, all 4 possibilities should be considered.
> "Synonyms" could be left out, but it may be safer to include it.
>
> > Also, while I'm here:
> > * why does the returned hash have different keys than are in the file?
> > e.g. ORFNames becomes :orfs?
>
> I don't know. I now think that using the same names as in the
> original entries would be preferable, too.
>

What do you suggest we do about this?
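
To make the question concrete, here is a rough sketch of what I have in mind:
handle all four tokens and key the hash by the names used in the flat file.
This is not the BioRuby implementation. The method name parse_gn_line, the
GN_TOKENS constant and the gene names are made up for illustration, and the
sketch ignores GN records that span several genes.

```ruby
# Hypothetical helper, not part of BioRuby: parse the text of one GN record
# (with the leading "GN   " line codes already stripped) into a hash keyed
# by the token names that appear in the flat file itself.
GN_TOKENS = %w[Name Synonyms OrderedLocusNames ORFNames]

def parse_gn_line(text)
  result = {}
  text.split(';').each do |field|
    key, value = field.strip.split('=', 2)
    # Skip empty fields and anything that is not one of the four tokens.
    next unless GN_TOKENS.include?(key) && value
    names = value.split(',').map { |n| n.strip }
    # "Name" holds a single name; the other three tokens hold lists.
    result[key] = (key == 'Name') ? names.first : names
  end
  result
end

# An entry that only has OrderedLocusNames= (the case that triggered the bug).
p parse_gn_line('OrderedLocusNames=locus1, locus2;')
# An entry with several tokens.
p parse_gn_line('Name=geneA; Synonyms=geneB; ORFNames=orf1, orf2;')
```

Keeping the file's own token names as keys means nobody has to remember a
second vocabulary such as :orfs.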


>
> > * I also found the parsing process for whole genomes quite slow (multiple
> > hours for well annotated ones).
>
> Please use a profiler to find bottlenecks.
>  % ruby -rprofile xxx.rb
>

I tried to do something like that, but in the end found it easier to pre-grep
the UniProt file, keeping only the lines relevant to me. There were too many
levels of indirection in my code for me to bother tracking it down.
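
Roughly, the pre-filtering amounts to something like the sketch below. The
method name and the choice of line codes (ID, GN and the // terminator) are
assumptions for illustration only; keep whichever line codes your own code
actually reads. The sample entry is made up.

```ruby
require 'stringio'

# Hypothetical pre-filter: copy only the lines whose two-character line code
# is listed in +codes+ from +input+ to +output+. "//" is kept so that entry
# boundaries survive and the result can still be split into entries.
def filter_uniprot_lines(input, output, codes = %w[ID GN //])
  input.each_line do |line|
    output.write(line) if codes.include?(line[0, 2])
  end
  output
end

# Tiny made-up example; for a real file, pass File objects instead.
sample = StringIO.new(<<EOS)
ID   TEST_ENTRY              Reviewed;         10 AA.
DE   RecName: Full=Made-up protein;
GN   OrderedLocusNames=locus1;
SQ   SEQUENCE   10 AA;
     MKRISTTITT
//
EOS
puts filter_uniprot_lines(sample, StringIO.new).string
```

Whether a parser accepts such stripped-down entries depends on which fields
it needs, so this only helps when the downstream code reads the kept lines.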


>
> > * is there any standard way to handle concatenated UniProt files? I wrote
> > my own as it was simple.
>
> What type of "concatenated" do you mean?
> For simple concatenation, for example the original file distributed
> from the UniProt FTP site, Bio::FlatFile can be used.
>
> ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> (please gunzip before reading!)
>
>  require 'bio'
>
>  # iterate over the entries one at a time
>  ff = Bio::FlatFile.open("uniprot_sprot.dat")
>  ff.each do |e|
>   puts e.entry_id
>  end
>

More evidence I'm an idiot. Like I needed any.
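
For the archives, a hand-rolled reader for this kind of file can be as simple
as the sketch below. It assumes every entry ends with a "//" line and that
the file uses Unix line endings. The method name each_uniprot_entry and the
sample IDs are made up, and Bio::FlatFile is clearly the better route since
it also hands back parsed entry objects.

```ruby
require 'stringio'

# Hypothetical helper: yield each entry of a concatenated UniProt flat file
# as one string, assuming every entry is terminated by a "//" line.
def each_uniprot_entry(io)
  # Using "//\n" as the record separator makes each_line return whole entries.
  io.each_line("//\n") do |chunk|
    yield chunk unless chunk.strip.empty?
  end
end

# Made-up two-entry example; for a real file, pass an open File instead.
data = StringIO.new("ID   ENTRY_A\n//\nID   ENTRY_B\n//\n")
each_uniprot_entry(data) { |entry| puts entry[/^ID\s+(\S+)/, 1] }
```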
Thanks,
ben


