[BioRuby] Can I get more information on BioRuby

Rob Syme rob.syme at gmail.com
Fri Mar 8 05:11:42 UTC 2013

Hi Dave

Don't worry, these are appropriate questions for someone new to BioRuby.
Good on you for digging through the source code.

The NCBIDB class is located at
Be sure to have a look at the top of the file for a great overview of how
the database classes are constructed, written by ktym<http://github.com/ktym>

To track down the method of autodetection, have a close look at the
 and lib/bio/io/flatfile/autodetection.rb<http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb>

*Step 1) Check to see if a dbclass has been provided*
If no database class is provided to Bio::FlatFile.open, it falls
through to _open_file(dbclass,
filename, *arg)<http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L155-165>,
where dbclass is nil, which then creates a new Bio::FlatFile with
the first argument (dbclass) as nil.

If you look at the FlatFile initialize method, you'll see that the autodetect
method is called<http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile.rb#L233-237> if
dbclass is nil.
Inside the autodetect method, a new Autodetect object is created, which
does the work you are interested in. Note that the class method "default"
is called, which returns an Autodetect object, but does a lot of setting up.
*Step 2) Set up the default FlatFile::Autodetect object*
The setup is mainly concerned with adding a
large number of RuleTemplate instances to the @elements array. Each
detectable format type has its own RuleTemplate.
The RuleTemplates provide three key pieces of information:
 - a name<http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L46>
 - a list of rules that are less important (more on this
later) than the current rule
 - a guess method<http://github.com/bioruby/bioruby/blob/master/lib/bio/io/flatfile/autodetection.rb#L76-78>,
which, when given a string, returns a dbclass if the string looks like it
belongs to that particular rule.

There are different types of RuleTemplates. The simplest is a
which has a very simple guess method that matches the given string
against a regular expression.
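
As a rough illustration of a regexp-based rule carrying the three pieces
listed above (a name, a list of less important rules, and a guess method),
something like the following would do. RegexpRule, FakeGenBank and the
regular expression are my own assumptions for the example, not the classes
or patterns in autodetection.rb.

```ruby
# Hypothetical regexp-based rule; names and fields are illustrative only.
class RegexpRule
  attr_reader :name, :lower_priority

  def initialize(name, dbclass, regexp, lower_priority = [])
    @name = name
    @dbclass = dbclass
    @regexp = regexp
    @lower_priority = lower_priority  # names of rules this one outranks
  end

  # Returns the dbclass when the text matches, otherwise nil.
  def guess(text)
    @regexp.match?(text) ? @dbclass : nil
  end
end

class FakeGenBank; end

rule = RegexpRule.new(:genbank, FakeGenBank, /^LOCUS\s+\S+/, [:fasta])
p rule.guess("LOCUS       AB000001   1234 bp")  # FakeGenBank
p rule.guess(">seq1 some fasta header")         # nil
```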

The RuleTemplates also need a list of rules that are less important so that
the final array of rules can be topologically sorted,
returning a list of rules with the most important at the top.

There is a large array created in Autodetect.make_default which contains a
bunch of rules that are added to the @elements array (to be tsorted later).
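
If topological sorting is new to you, Ruby's standard tsort library shows
the idea in a few lines. This is a sketch under my own assumptions:
RuleTable and the rule names are invented, and each rule simply lists the
names of the rules it outranks.

```ruby
require "tsort"

# Hypothetical rule table: each key outranks the names in its value array.
class RuleTable
  include TSort

  def initialize(outranks)
    @outranks = outranks
  end

  # TSort needs to know how to walk every node...
  def tsort_each_node(&block)
    @outranks.each_key(&block)
  end

  # ...and how to walk the children (less important rules) of one node.
  def tsort_each_child(name, &block)
    @outranks.fetch(name, []).each(&block)
  end

  # tsort lists children before parents, so reverse it to get
  # the most important rules first.
  def by_importance
    tsort.reverse
  end
end

table = RuleTable.new(
  genbank: [:fasta],
  embl:    [:fasta],
  fasta:   [:raw],
  raw:     []
)
p table.by_importance
```

Every rule comes out ahead of the rules it outranks, so a catch-all rule
like :raw ends up at the bottom of the list.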

*Step 3) Go through the first 31 lines, and match against each of the rules*
Way back in the FlatFile#autodetect method, a dbclass is pulled out of the
The FlatFile's BufferedInputStream that was created
pulled out so that the Autodetect object can peek inside.
It then iterates over the first 31 lines of the
taking each line and trying it out against each
If the guess returns non-nil (it will be a dbclass), that dbclass bubbles
all the way back up to the autodetect method in
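
Here is a minimal sketch of that loop, again in plain Ruby rather than
BioRuby's own code. RULES, MAX_LINES and detect are hypothetical names, the
rules are bare lambdas returning a symbol in place of a dbclass, and I've
not tried to model how the real implementation buffers what it has read.

```ruby
require "stringio"

# Hypothetical rules, most important first; each returns a stand-in
# symbol (where the real thing would return a dbclass) or nil.
RULES = [
  ->(line) { line.match?(/^LOCUS\s/) ? :genbank : nil },
  ->(line) { line.match?(/^>/) ? :fasta : nil }
].freeze

MAX_LINES = 31  # the line limit mentioned in the text

# Tries every rule on each of the first max_lines lines and returns
# the first non-nil guess, or nil if nothing matched.
def detect(io, rules = RULES, max_lines = MAX_LINES)
  io.each_line.first(max_lines).each do |line|
    rules.each do |rule|
      found = rule.call(line)
      return found if found
    end
  end
  nil
end

p detect(StringIO.new("# a comment\n>seq1\nACGT\n"))  # :fasta
p detect(StringIO.new("nothing recognisable\n"))      # nil
```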

I hope that's right - I've not contributed any code here. I might be way off.
It does raise the question "How do I get a new dbclass to be recognised by
FlatFile.open?" I'm not really sure, but I'm confident there will be
someone here who does know. Any takers?


On Fri, Mar 8, 2013 at 3:30 AM, dave thorneycroft
<dthorneycroft at gmail.com> wrote:

> Hello,
> I am a novice Ruby programmer, ex-biologist and experienced developer in
> other languages (sorry Adobe Flex), but I have seen the light and  I want
> to contribute to BioRuby.  I am looking through the BioRuby source code and
> find some things confusing (of course, I'm a novice!).  Could anyone point
> me in the direction of any tutorials which will help me get a better
> understanding of the code?  I've seen the stuff on BioRuby.org.  Anything
> that would explain how it's structured and coded, not how to use it to
> solve a problem.
> A couple of things are really puzzling me right now.
> Number one, please don't laugh , where is the NCBIDB class located in the
> source (I see the GenBank class inherits NCBIDB ; but I cannot find the
> class source ?). NCBIDB is a class right ?
> Number two, could anyone explain how the Bio::FlatFile 'automagically'
> recognizes each database class?
> Any pointers would be great.  Many thanks for your time, I really
> appreciate any comments.
> Regards
> Dave
