[BioRuby] adding a flatfile format

GOTO Naohisa ngoto at gen-info.osaka-u.ac.jp
Mon Jun 12 09:56:58 UTC 2006


Hi,

On Mon, 5 Jun 2006 14:00:56 -0400
Anoop Ranganath <ahr6y at virginia.edu> wrote:

> I'd like to add a simple flatfile format to use with bioruby.  In  
> particular, I'm looking at BED files, which have a very simple tab  
> delimited format.  I'm poking around in the code, and haven't been  
> able to find a good example of how to incorporate a new parser.  I'm  
> not so much interested in adding support for autodetection, so I  
> can't imagine that it would be too difficult.
> 
> Does anyone have a starting point I can use?
> 
> Thanks,
> Anoop 

If you are using bioruby-1.0.0, note that some critical bugs have
been found in flatfile.rb. Please apply the attached patch.

Here is a very simple parser for tab-separated values. The entire file
is read at once and passed to the parser class when it is initialized.
######################################################################
require 'bio'

# very simple parser for tab-separated data
class SimpleFormat
   # delimiter needed for flatfile
   DELIMITER = RS = nil # nil means no delimiter and reading entire file
   def initialize(str)
      @data = str.split(/\n/).collect { |x| x.to_s.split(/\t/) }
   end
   attr_reader :data
end

# example code to read a file 'test.dat' and show data
Bio::FlatFile.open(SimpleFormat, 'test.dat') do |ff|
   ff.each do |entry|
      p entry.data
   end
end
######################################################################
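
For BED files in particular, the same pattern could be adapted roughly
as follows. This is only a minimal sketch and has not been tested
against BioRuby itself: the class name SimpleBed, the Record struct and
its field names are hypothetical, and it assumes that only the first
three BED columns (chromosome, start, end) need names and that
'browser', 'track' and '#' lines can be skipped. The class is exercised
directly on a string here; it is meant to be passed to
Bio::FlatFile.open in the same way as SimpleFormat above.

```ruby
# Hypothetical BED parser following the same pattern as SimpleFormat.
class SimpleBed
  # nil means no delimiter: the entire file is passed to initialize
  DELIMITER = RS = nil

  # one BED line; only the three mandatory columns get their own field,
  # any further columns are kept as an array of strings in :rest
  Record = Struct.new(:chrom, :chrom_start, :chrom_end, :rest)

  attr_reader :records

  def initialize(str)
    @records = []
    str.each_line do |line|
      line = line.chomp
      # skip blank lines, comments, and browser/track header lines
      next if line.empty? || line =~ /\A(#|browser\b|track\b)/
      chrom, s, e, *rest = line.split(/\t/)
      @records << Record.new(chrom, Integer(s), Integer(e), rest)
    end
  end
end

# made-up sample data, used without Bio::FlatFile
sample = "track name=example\nchr1\t100\t200\tfeature1\nchr2\t300\t450\n"
bed = SimpleBed.new(sample)
bed.records.each do |r|
  puts [r.chrom, r.chrom_start, r.chrom_end].join(' ')
end
```

Integer() raises an exception on a malformed coordinate; use to_i
instead if silent conversion is preferred.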

Here is a simple example that parses a file with multiple entries,
where each entry ends with '//'.
######################################################################
require 'bio'

# very simple parser for "//"-separated entries
class SimpleFormat2
   # delimiter needed for flatfile
   DELIMITER = RS = '//' # the end of each entry is '//'
   def initialize(str)
      # very simple parser only to store a text data
      @data = str
   end
   attr_reader :data
end

# example code to read a file 'sample.gbk' and show each entry
Bio::FlatFile.open(SimpleFormat2, 'sample.gbk') do |ff|
   ff.each do |entry|
      p entry.data
   end
end
######################################################################
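
To see roughly what a string delimiter means for entry boundaries, the
splitting can be imitated in plain Ruby with IO#gets, which accepts a
separator string. This is only an illustration, under the assumption
that FlatFile reads entries in a similar way; it does not show the
actual internals of flatfile.rb, and the sample text is made up.

```ruby
require 'stringio'

# made-up sample text with two entries, each terminated by "//"
text = "LOCUS A\nORIGIN\n//\nLOCUS B\nORIGIN\n//\n"

io = StringIO.new(text)
entries = []
# gets(sep) reads up to and including the next occurrence of sep
while (chunk = io.gets('//'))
  # ignore the whitespace-only remainder after the last "//"
  entries << chunk unless chunk.strip.empty?
end

entries.each { |e| p e }
```

Note that with this kind of splitting, the second and later entries
begin with the newline that followed the previous '//', so a parser
may want to strip leading whitespace from the string it receives.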

If you want to parse data with variable delimiters or no explicit
delimiters, you need to write a splitter class that reads an entry
from the IO wrapper object created by the FlatFile class.
However, the specification of the splitter class has not been
clearly determined yet and will change in the near future.
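
Until that is settled, entries without an explicit end delimiter can
also be split by hand, outside of FlatFile. The following is a minimal
sketch in plain Ruby and is NOT the splitter class interface: the method
name each_header_entry is hypothetical, and it assumes that every entry
starts with a line beginning with '>'.

```ruby
require 'stringio'

# Hypothetical helper: yields one entry (as a string) per header line.
# An entry starts at a line beginning with ">" and runs until the next
# such line. Lines before the first header are ignored.
def each_header_entry(io)
  buf = nil
  io.each_line do |line|
    if line.start_with?('>')
      yield buf if buf      # hand over the previous entry, if any
      buf = line.dup
    elsif buf
      buf << line
    end
  end
  yield buf if buf          # last entry has no following header
end

# made-up sample data
io = StringIO.new(">a\nACGT\n>b\nGG\nTT\n")
each_header_entry(io) { |entry| p entry }
```

Each yielded string can then be given to a parser class such as
SimpleFormat2.new(entry).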

In addition, flatfile.rb is currently being restructured, so the
details described above may change.

-- 
Naohisa GOTO
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
Department of Genome Informatics, Genome Information Research Center,
Research Institute for Microbial Diseases, Osaka University, Japan

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: flatfile.patch
URL: <http://lists.open-bio.org/pipermail/bioruby/attachments/20060612/2bc61698/attachment.ksh>

