[BioRuby] Can I get more information on BioRuby

Rob Syme rob.syme at gmail.com
Fri Mar 8 05:11:42 UTC 2013


Hi Dave

Don't worry, these are appropriate questions for someone new to bioruby.
Good on you for digging through the source code.

The NCBIDB class is located at
lib/bio/db.rb <http://github.com/bioruby/bioruby/blob/master/lib/bio/db.rb#L233>.
Be sure to have a look at the top of the file for a great overview of how
the database classes are constructed, written by ktym <http://github.com/ktym>.

To track down the method of autodetection, have a close look at
lib/bio/io/flatfile.rb <https://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L80-111>
and lib/bio/io/flatfile/autodetection.rb <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb>.


*Step 1) Check to see if a dbclass has been provided*
If no database class is provided to Bio::FlatFile.open, it falls through to
_open_file(dbclass, filename, *arg) <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L155-165>
with dbclass set to nil. That method then creates a new Bio::FlatFile
instance <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L158>
with nil as the first argument (dbclass).

If you look at the FlatFile initialize method, you'll see that the autodetect
method is called <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L233-237>
if dbclass is nil.
Inside the autodetect method, a new Autodetect object is
created <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L429>,
which does the work you are interested in. Note that the class method "default"
is called, which returns an Autodetect object, but does a lot of setting up
first <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L348>.
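
To make the "nil dbclass means autodetect" idea concrete, here is a rough
sketch in plain Ruby. It is not BioRuby's code: SketchFlatFile, SketchGenBank,
SketchFasta and the detector lambda are all names I made up for illustration,
and the real initialize method does more than this.

```ruby
require 'stringio'

# Hypothetical stand-ins for real database classes.
class SketchGenBank; end
class SketchFasta; end

# Simplified model of a flat file reader (not Bio::FlatFile itself).
class SketchFlatFile
  attr_reader :dbclass

  # detector: any callable that takes the stream and returns a class or nil.
  def initialize(dbclass, stream, detector)
    @stream = stream
    @detector = detector
    # An explicit class wins; nil means "work it out from the data".
    @dbclass = dbclass || autodetect
  end

  private

  def autodetect
    @detector.call(@stream)
  end
end

# Toy detector: looks only at the first line, then rewinds the stream.
first_line_detector = lambda do |stream|
  line = stream.gets.to_s
  stream.rewind
  line.start_with?('>') ? SketchFasta : nil
end

data = StringIO.new(">seq1\nACGT\n")
puts SketchFlatFile.new(nil, data, first_line_detector).dbclass
puts SketchFlatFile.new(SketchGenBank, data, first_line_detector).dbclass
```

The first call has to guess from the data; the second skips detection because
a class was passed in.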


*Step 2) Set up the default FlatFile::Autodetect object*
The setup is mainly concerned with
adding <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L363>
a large number of RuleTemplate instances to the @elements array. Each
detectable format type has its own RuleTemplate.
The RuleTemplates provide three key pieces of information:
 - a name <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L46>
 - a list of rules that are less
   important <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L50-55>
   (checked later) than the current rule
 - a guess method <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L76-78>,
   which, when given a string, returns a dbclass if the string looks like it
   belongs to that particular rule.

There are different types of RuleTemplates. The simplest is a
RuleRegexp <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L143-173>,
whose guess method simply matches the given string against a regular
expression <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L170-173>.
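
A regexp-based rule can be sketched in a few lines. This is my own simplified
stand-in, not the class from autodetection.rb: SketchRegexpRule and the two
empty database classes are hypothetical, and the patterns are illustrative
guesses rather than the ones BioRuby ships with.

```ruby
# Hypothetical stand-ins for real database classes.
class SketchGenBank; end
class SketchFasta; end

# A rule pairs a name and a database class with a regular expression.
class SketchRegexpRule
  attr_reader :name, :dbclass

  def initialize(name, dbclass, pattern)
    @name = name
    @dbclass = dbclass
    @pattern = pattern
  end

  # Returns the database class if the text matches, otherwise nil.
  def guess(text)
    @pattern =~ text ? @dbclass : nil
  end
end

genbank_rule = SketchRegexpRule.new(:genbank, SketchGenBank, /^LOCUS\s+\S+/)
fasta_rule = SketchRegexpRule.new(:fasta, SketchFasta, /^>\S/)

p genbank_rule.guess("LOCUS       SCU49845     5028 bp    DNA")
p fasta_rule.guess("LOCUS       SCU49845     5028 bp    DNA")
```

The important part is the contract: guess returns a class on a match and nil
otherwise, so the caller can simply try rules until one answers.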

The RuleTemplates also need a list of rules that are less important so that
the final array of rules can be topologically
sorted <http://www.ruby-doc.org/stdlib-1.9.3/libdoc/tsort/rdoc/TSort.html>,
returning a list of rules with the most important at the top.

There is a large array created in Autodetect.make_default which contains a
bunch of rules that are added to the @elements array (to be tsorted later).
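
Here is a small sketch of how Ruby's TSort module can turn "this rule outranks
those rules" into a single ordered list. SketchRule and SketchRuleSet are
hypothetical names, and I am assuming each rule only lists the names of the
rules it outranks; the real Autodetect class may wire this up differently.

```ruby
require 'tsort'

# Hypothetical rule: a name plus the names of rules to be checked after it.
SketchRule = Struct.new(:name, :lower_priority_names)

class SketchRuleSet
  include TSort

  def initialize
    @rules = {}
  end

  def add(rule)
    @rules[rule.name] = rule
    self
  end

  # TSort hook: visit every rule.
  def tsort_each_node(&block)
    @rules.each_value(&block)
  end

  # TSort hook: the "children" of a rule are the rules it outranks.
  def tsort_each_child(rule, &block)
    rule.lower_priority_names.each do |name|
      child = @rules[name]
      block.call(child) if child
    end
  end

  # TSort emits children before parents, so reverse the result to put
  # the most important rule first.
  def ordered_names
    tsort.reverse.map(&:name)
  end
end

rules = SketchRuleSet.new
rules.add(SketchRule.new(:fasta, []))
rules.add(SketchRule.new(:genbank, [:fasta]))
rules.add(SketchRule.new(:embl, [:fasta]))
p rules.ordered_names
```

Because fasta is listed as lower priority by both other rules, it always ends
up last, whatever order the rules were added in.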

*Step 3) Go through the first 31 lines, and match against each of the
RuleTemplates*
Way back in the FlatFile#autodetect method, a dbclass is obtained from the
FlatFile::Autodetect#autodetect_flatfile
method <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L430>.
The FlatFile's BufferedInputStream that was created
earlier <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L161>
is pulled out so that the Autodetect object can peek inside.
It then iterates over the first 31 lines of the
file <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L333-342>,
taking each line and trying it out against each
rule <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L309>.
If the guess returns non-nil (it will be a dbclass), that dbclass bubbles
all the way back up to the autodetect method in
FlatFile <http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L430>.
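
The scanning loop described in this step might look roughly like the sketch
below. It assumes a seekable stream and a flat list of regexp rules; the names
sketch_guess, sketch_autodetect and SKETCH_RULES are mine, and the patterns
are illustrative. The real code works on a BufferedInputStream rather than a
plain IO.

```ruby
require 'stringio'

# Hypothetical stand-ins for real database classes.
class SketchGenBank; end
class SketchFasta; end

# Illustrative rules, already in priority order.
SKETCH_RULES = [
  [/^LOCUS\s+\S+/, SketchGenBank],
  [/^>\S/, SketchFasta]
].freeze

# Returns the class of the first rule that matches the line, else nil.
def sketch_guess(line)
  SKETCH_RULES.each do |pattern, klass|
    return klass if pattern =~ line
  end
  nil
end

# Reads at most max_lines lines, stops at the first successful guess,
# and restores the stream position so a parser can start from the top.
def sketch_autodetect(io, max_lines = 31)
  start = io.pos
  max_lines.times do
    line = io.gets
    break if line.nil?
    found = sketch_guess(line)
    return found if found
  end
  nil
ensure
  io.pos = start if start
end

io = StringIO.new("\n\n>seq1 example\nACGT\n")
p sketch_autodetect(io)
p io.pos
```

Restoring the position matters: detection should only peek at the data, so
that whichever dbclass is chosen still gets to parse the whole entry.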


I hope that's right - I've not contributed any code here. I might be way
off.
It does raise the question "How do I get a new dbclass to be recognised by
FlatFile.open?" I'm not really sure, but I'm confident there will be
someone here who does know. Any takers?

-r



On Fri, Mar 8, 2013 at 3:30 AM, dave thorneycroft
<dthorneycroft at gmail.com> wrote:

> Hello,
>
> I am a novice Ruby programmer, ex-biologist and experienced developer in
> other languages (sorry Adobe Flex), but I have seen the light and I want
> to contribute to BioRuby.  I am looking through the BioRuby source code and
> find some things confusing (of course, I'm a novice!).  Could anyone point
> me in the direction of any tutorials which will help me get a better
> understanding of the code?  I've seen the stuff on BioRuby.org.  Anything
> that would explain how it's structured and coded, not how to use it to
> solve a problem.
>
> A couple of things are really puzzling me right now.
>
> Number one, please don't laugh , where is the NCBIDB class located in the
> source (I see the GenBank class inherits NCBIDB ; but I cannot find the
> class source ?). NCBIDB is a class right ?
> Number two, could anyone explain how the Bio::FlatFile 'automagically'
> recognizes each database class?
>
> Any pointers would be great.  Many thanks for your time, I really
> appreciate any comments.
>
> Regards
> Dave
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>


