[BioRuby] Genbank file parsing question

Nick Thrower throwern at msu.edu
Mon Sep 17 17:28:56 UTC 2012


Hi Josh,

1.)
You are getting an error because the 'new' method expects an open stream (such as a File object), not a path string:
http://bioruby.org/rdoc/Bio/FlatFile.html#method-c-new

If you want to supply a file path, use the 'open' method instead:
http://bioruby.org/rdoc/Bio/FlatFile.html#method-c-open

gb = Bio::FlatFile.open(Bio::GenBank,'/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk')
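A minimal sketch of why the path string fails, using only the Ruby standard library. The hypothetical `read_first_line` stands in for a stream wrapper that, like the buffered input stream in Josh's backtrace, reads by calling `gets` on whatever it was given. An IO-like object works; a plain String does not, because String has no public `gets` and Ruby only finds the private `Kernel#gets`. With BioRuby itself, passing `File.open(path)` as the second argument to `Bio::FlatFile.new` should work just as `Bio::FlatFile.open` does.

```ruby
require 'stringio'

# Hypothetical stand-in (not BioRuby code): read one line by calling #gets
# on the object passed in, as a stream wrapper would.
def read_first_line(stream)
  stream.gets
end

# An IO-like object works: StringIO here, File.open(path) in real use.
io = StringIO.new("LOCUS       ctg7180000000048\n//\n")
first_line = read_first_line(io)
puts first_line

# A path string fails: String has no public #gets, so Ruby reports the
# private Kernel#gets, the same kind of NoMethodError as in the backtrace.
path_error =
  begin
    read_first_line('/path/to/Atopobium_vaginae.gbk')
    nil
  rescue NoMethodError => e
    e
  end
puts path_error.message
```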

2.)
The LOCUS line is parsed by fixed character positions, and it looks like your locus name runs past the hard-coded field bounds:
http://bioruby.org/rdoc/Bio/GenBank/Locus.html (look at the source for 'new')

Maybe somebody else could help with that?
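To illustrate how a fixed-width field truncates a long name, here is a sketch using only the standard library. The column range 12..21 (a 10-character field) is an assumption, chosen because `ctg7180000` is exactly 10 characters; the source of `Bio::GenBank::Locus.new` has the real bounds. The sample LOCUS line is made up apart from the contig name. Splitting on whitespace instead recovers the full name; that is one possible workaround, not BioRuby's own behaviour.

```ruby
# Made-up LOCUS line; only the contig name comes from the thread.
LOCUS_LINE = 'LOCUS       ctg7180000000048 1234 bp DNA linear BCT 17-SEP-2012'

# Assumed fixed-width name field (indices 12..21, i.e. 10 characters).
NAME_COLUMNS = 12..21

# Position-based parsing: keep whatever falls inside the field.
def locus_name_fixed(line)
  line[NAME_COLUMNS].to_s.strip
end

# Token-based parsing: the name is the second whitespace-separated token.
def locus_name_tokens(line)
  line.split[1]
end

puts locus_name_fixed(LOCUS_LINE)   # truncated to the field width
puts locus_name_tokens(LOCUS_LINE)  # full contig name
```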

3.)
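To get at the organism you need to drill down through the data: a GenBank file is made up of one or more entries, each entry has many features, and each feature has many qualifiers. The organism is a qualifier of the 'source' feature, which is normally the first feature of an entry: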

gb.first.features.first.qualifiers.select{|f| f.qualifier=='organism'}
 => [#<Bio::Feature::Qualifier:0x000001012e99b8 @qualifier="organism", @value="Atopobium vaginae B758">]
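The same drill-down as a standalone sketch. The `Toy*` structs are hypothetical stand-ins that only mimic the shape of BioRuby's entries, features and qualifiers (the `@qualifier`/`@value` pair in the output above), and the sample data is invented apart from the organism name. Looking the 'source' feature up by name rather than taking the first feature, and returning the qualifier's value, gives a plain string for each entry in the file.

```ruby
# Hypothetical stand-ins mirroring entry -> features -> qualifiers.
ToyQualifier = Struct.new(:qualifier, :value)
ToyFeature   = Struct.new(:feature, :qualifiers)
ToyEntry     = Struct.new(:entry_id, :features)

# Return the organism of one entry, or nil if it has no source/organism.
def organism_of(entry)
  source = entry.features.find { |f| f.feature == 'source' }
  return nil unless source

  organism = source.qualifiers.find { |q| q.qualifier == 'organism' }
  organism && organism.value
end

# Invented sample data: one entry with a source feature and a CDS,
# and one entry with no source feature at all.
entries = [
  ToyEntry.new('ctg7180000000048', [
    ToyFeature.new('source', [ToyQualifier.new('organism', 'Atopobium vaginae B758')]),
    ToyFeature.new('CDS', [ToyQualifier.new('product', 'hypothetical protein')])
  ]),
  ToyEntry.new('ctg_without_source', [])
]

# A file holds several entries, so collect the organism of each one.
organisms = entries.map { |e| [e.entry_id, organism_of(e)] }.to_h
p organisms
```

With real BioRuby objects the equivalent would presumably be `entry.features.find { |f| f.feature == 'source' }`, then a `find` on its qualifiers and `.value`; treat that as untested.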

-Nick

-- 
Nick Thrower
Information Technologist
Michigan State University
Great Lakes Bioenergy Research Center
East Lansing MI 48824

> 
> Hi Nick,
> Yeah, sorry about the GenBank example; it appears to have lost all its formatting when I sent the email. This might be more handy:
> http://pastebin.com/N1D7jUuu 
> I'm running into several issues. The first is that whenever I load the file the above excerpt comes from and call methods on it, this is what happens (for example):
> bioruby> gb = Bio::FlatFile.new(Bio::GenBank, '/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk')
> ==> #<Bio::FlatFile:0x00000005237800 @stream=#<Bio::FlatFile::BufferedInputStream:0x000000051bd3c0 @io="/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk", @path=nil, @buffer="">, @dbclass=Bio::GenBank, @splitter=#<Bio::FlatFile::Splitter::Default:0x000000050f3778 @dbclass=Bio::GenBank, @stream=#<Bio::FlatFile::BufferedInputStream:0x000000051bd3c0 @io="/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk", @path=nil, @buffer="">, @entry_pos_flag=nil, @delimiter="\n//\n", @header="LOCUS ", @delimiter_overrun=nil>, @skip_leader_mode=:firsttime, @firsttime_flag=true, @raw=false>
>
> But if I try to call any methods on this:
> bioruby> gb.first
> NoMethodError: private method `gets' called for "/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk":String
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/lib/bio/io/flatfile/buffer.rb:251:in `gets'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/lib/bio/io/flatfile/splitter.rb:161:in `skip_leader'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/lib/bio/io/flatfile.rb:283:in `next_entry'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/lib/bio/io/flatfile.rb:335:in `each_entry'
>         from (irb):4:in `first'
>         from (irb):4
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/bin/bioruby:41:in `block in <top (required)>'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/bin/bioruby:40:in `catch'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/bin/bioruby:40:in `<top (required)>'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/bin/bioruby:19:in `load'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/bin/bioruby:19:in `<main>'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/bin/ruby_noexec_wrapper:14:in `eval'
>         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/bin/ruby_noexec_wrapper:14:in `<main>'
> Opening these files with Bio::FlatFile.auto('Atopobium_vaginae.gbk') seems to work inconsistently, but this file opens OK. Also, Bio::GenBank.new('Atopobium_vaginae.gbk') will open this file and seems to work the most consistently.
> Loading into this object truncates the locus ID from ctg7180000000048 to ctg7180000, i.e.:
> bioruby> gb.first.locus.entry_id
> ==> "ctg7180000"
> And if I attempt something like:
> bioruby> gb.first.organism
> ==> ""
> it is just an empty string. Does this variable not get set for each GenBank entry? The organism is listed under the "source" attribute in the file.
> Not all of these are really errors per se, but they are odd behavior.
> ~josh