[BioRuby] Genbank file parsing question

Josh Earl joshearl1 at hotmail.com
Mon Sep 17 18:46:21 UTC 2012


Hey Nick,
Wow, that was incredibly helpful, thanks.  One of the reasons I was confused about the Bio::FlatFile.new method is that the tutorial at
http://thebird.nl/bioruby/Tutorial.rd.html is a bit confusing in that regard (it always uses the .new method instead of .open for Bio::FlatFile).  Is the usage in the tutorial correct, or was I just interpreting it incorrectly?
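
As a rough illustration of why a path string fails where an open stream works, here is a minimal plain-Ruby sketch. LineReader is a hypothetical stand-in, not a BioRuby class; it only assumes, as the traceback quoted below suggests, that the reader calls gets on whatever object it was handed.

```ruby
require 'stringio'

# Hypothetical stand-in (NOT BioRuby code): a reader that simply calls
# #gets on the object passed to it.
class LineReader
  def initialize(io)
    @io = io
  end

  def first_line
    @io.gets
  end
end

# A String is not a stream. Kernel#gets is a private method, so calling it
# with an explicit String receiver raises NoMethodError.
path_error =
  begin
    LineReader.new('/some/file.gbk').first_line
    nil
  rescue NoMethodError => e
    e
  end

# An IO-like object (StringIO here, File.open(...) in practice) responds to #gets.
stream_line = LineReader.new(StringIO.new("LOCUS       demo\n//\n")).first_line

puts path_error.class
puts stream_line
```

The same reasoning would apply to any reader that expects an IO-like object: hand it the result of File.open rather than the file name, or use a helper that opens the file for you.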
Is there a specific reason for hard-coding the column positions instead of just splitting the line into an array?  I suppose there are issues with both: if there are missing values, the positions in the array would be incorrect, and if the locus name is too long, that might push the other fields out of position.  
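
The trade-off between the two approaches can be sketched in plain Ruby. The LOCUS lines below are hypothetical (not copied from the real file), and the column range 12..21 is an assumption chosen only because it reproduces the 10-character truncation reported in this thread; it is not taken from the BioRuby source.

```ruby
# Hypothetical LOCUS line with a 16-character locus name.
line = 'LOCUS'.ljust(12) + 'ctg7180000000048 2500 bp DNA linear BCT 17-SEP-2012'

# Fixed-column slicing: anything past the assumed 10-character window is lost.
fixed_id = line[12..21].strip

# Whitespace splitting keeps the whole name, whatever its length.
fields    = line.split
split_id  = fields[1]
split_len = fields[2].to_i

# But if a field is absent (here the division code "BCT"), every later
# index shifts and fields[6] silently becomes the date.
no_division = 'LOCUS'.ljust(12) + 'ctg7180000000048 2500 bp DNA linear 17-SEP-2012'
shifted = no_division.split[6]

puts fixed_id
puts split_id
puts split_len
puts shifted
```

Neither approach is safe on its own: fixed columns break on over-long names, and splitting breaks on missing fields.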
And thanks for clarifying how to get access to the organism.  It does seem a bit odd to me that the first genbank record would have an "organism" method that wasn't set, even if there is an organism name listed.  For instance 
gb.first
refers to a single genbank record, right?  So, what is 
gb.first.organism
referring to, if not the organism of that record?  I guess I have similar questions about the .classification and .taxonomy methods (both of which return empty values on the first genbank record).  It seems odd that you would have to dig into the record like that to get the information, especially if the methods are available on a record.  Maybe they refer to something other than the items listed in the "source" feature?  
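
For reference, the entry / feature / qualifier nesting described in the quoted reply below can be modeled with plain Ruby structs. This is only a sketch: Qualifier, Feature, Entry and organism_of are hypothetical stand-ins rather than BioRuby classes or methods, and the sample values other than the organism name are made up.

```ruby
# Hypothetical stand-ins (NOT BioRuby classes) that mimic the
# entry -> features -> qualifiers nesting.
Qualifier = Struct.new(:qualifier, :value)
Feature   = Struct.new(:feature, :qualifiers)
Entry     = Struct.new(:features)

entry = Entry.new([
  Feature.new('source', [
    Qualifier.new('organism', 'Atopobium vaginae B758'),
    Qualifier.new('mol_type', 'genomic DNA')
  ]),
  Feature.new('CDS', [Qualifier.new('product', 'hypothetical protein')])
])

# Same drill-down shape as the one-liner in the quoted reply:
# first feature, then its qualifiers, filtered by name.
hits = entry.features.first.qualifiers.select { |q| q.qualifier == 'organism' }

# Hypothetical helper: find the 'source' feature by name instead of
# relying on it being first, and return nil when nothing matches.
def organism_of(entry)
  source = entry.features.find { |f| f.feature == 'source' }
  return nil unless source
  q = source.qualifiers.find { |x| x.qualifier == 'organism' }
  q && q.value
end

puts hits.first.value
puts organism_of(entry)
```

Under this model the organism name lives on a qualifier of the "source" feature, not on the entry itself, which would be consistent with an entry-level accessor coming back empty.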
~josh

Center for Genomic Sciences
(412)-359-8341

> From: throwern at msu.edu
> Date: Mon, 17 Sep 2012 13:28:56 -0400
> To: bioruby at lists.open-bio.org
> Subject: Re: [BioRuby] Genbank file parsing question
> 
> Hi Josh,
> 
> 1.)
> You are getting an error because you must pass an open stream to the 'new' method
> http://bioruby.org/rdoc/Bio/FlatFile.html#method-c-new
> 
> If you want to supply a file location you should use the 'open' method
> http://bioruby.org/rdoc/Bio/FlatFile.html#method-c-open
> 
> gb = Bio::FlatFile.open(Bio::GenBank,'/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk')
> 
> 2.)
> The locus line is position parsed, and it looks like your locus is beyond the hard coded bounds:
> http://bioruby.org/rdoc/Bio/GenBank/Locus.html (look at the source for 'new')
> 
> Maybe somebody else could help with that?
> 
> 3.)
> To access the organism line you need to drill down through the data. A Genbank file is made up of several entries. Each entry has many features, and each feature has many qualifiers.
> 
> gb.first.features.first.qualifiers.select{|f| f.qualifier=='organism'}
>  => [#<Bio::Feature::Qualifier:0x000001012e99b8 @qualifier="organism", @value="Atopobium vaginae B758">]
> 
> -Nick
> 
> -- 
> Nick Thrower
> Information Technologist
> Michigan State University
> Great Lakes Bioenergy Research Center
> East Lansing MI 48824
> 
> > 
> > Hi Nick,
> > Yeah, sorry about the genbank example, it appears to have lost all the formatting when I sent the email.  This might be more handy:
> > http://pastebin.com/N1D7jUuu 
> > I'm running into several issues.  The first is that when I load the file that the above excerpt comes from and call methods on it, this is what happens (for example):
> > bioruby> gb = Bio::FlatFile.new(Bio::GenBank, '/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk')
> >   ==> #<Bio::FlatFile:0x00000005237800 @stream=#<Bio::FlatFile::BufferedInputStream:0x000000051bd3c0 @io="/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk", @path=nil, @buffer="">, @dbclass=Bio::GenBank, @splitter=#<Bio::FlatFile::Splitter::Default:0x000000050f3778 @dbclass=Bio::GenBank, @stream=#<Bio::FlatFile::BufferedInputStream:0x000000051bd3c0 @io="/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk", @path=nil, @buffer="">, @entry_pos_flag=nil, @delimiter="\n//\n", @header="LOCUS ", @delimiter_overrun=nil>, @skip_leader_mode=:firsttime, @firsttime_flag=true, @raw=false>
> > But, if I try to call any methods on this:
> > bioruby> gb.first
> > NoMethodError: private method `gets' called for "/mnt/p/o_drive/Homes/jearl/Magee/Atopobium_vaginae.gbk":String
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/lib/bio/io/flatfile/buffer.rb:251:in `gets'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/lib/bio/io/flatfile/splitter.rb:161:in `skip_leader'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/lib/bio/io/flatfile.rb:283:in `next_entry'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/lib/bio/io/flatfile.rb:335:in `each_entry'
> >         from (irb):4:in `first'
> >         from (irb):4
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/bin/bioruby:41:in `block in <top (required)>'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/bin/bioruby:40:in `catch'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/gems/bio-1.4.3/bin/bioruby:40:in `<top (required)>'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/bin/bioruby:19:in `load'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/bin/bioruby:19:in `<main>'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/bin/ruby_noexec_wrapper:14:in `eval'
> >         from /home/josh/.rvm/gems/ruby-1.9.2-p290@proj2/bin/ruby_noexec_wrapper:14:in `<main>'
> > opening these files with Bio::FlatFile.auto('Atopobium_vaginae.gbk') seems to work inconsistently, but for this file it opens ok.  Also, Bio::GenBank.new('Atopobium_vaginae.gbk') will open this file and seems to work the most consistently.  
> > Loading into this object truncates the Locus id from:
> > ctg7180000000048 to ctg7180000
> > i.e.
> > bioruby> gb.first.locus.entry_id
> >   ==> "ctg7180000"
> > And if I attempt to say something like:
> > bioruby> gb.first.organism
> >   ==> ""
> > It is just an empty string.  Does this variable not get set for each genbank entry?  The organism is listed under the "source" attribute in the file.  
> > Not all of these are really errors per se, but odd behavior.
> > ~josh
> 
> 
> 
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
 		 	   		  


