[BioRuby] Problem with Bio::GFF::GFF2

George Githinji georgkam at gmail.com
Tue Jun 9 14:24:38 UTC 2009


Thank you so much, Naohisa, for the excellent explanation!
However,

bep_gff.records.each do |record|
   p record.seqname
end

returns
"seq1   bepipred-1.0b epitope          1     1   0.173  . .   ."


This is not what I intended: seqname contains the whole line, and
record.score, record.start, etc. all return nil.

:(
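
A possible explanation, offered as a guess rather than a confirmed
diagnosis: the GFF2 specification defines the fields as tab-delimited, and
the output above (the whole line landing in seqname) is what one would
expect if the columns in the file are aligned with spaces instead of tabs.
Below is a minimal sketch, in plain Ruby without BioRuby, of rewriting such
lines with tabs. The method name tabify_gff2 is hypothetical, and the sketch
assumes that none of the first eight fields contains whitespace.

```ruby
# Rewrite space-aligned GFF2 feature lines as tab-delimited lines.
# Assumption: none of the first eight fields contains whitespace.
def tabify_gff2(text)
  converted = text.each_line.map do |line|
    stripped = line.strip
    # Leave blank lines, comments, and "##" directives untouched.
    next stripped if stripped.empty? || stripped.start_with?('#')

    # Split into at most 9 pieces so that a trailing attributes
    # field keeps any spaces it contains.
    stripped.split(/\s+/, 9).join("\t")
  end
  converted.join("\n")
end

sample = "##gff-version 2\n" \
         "seq1   bepipred-1.0b epitope          1     1   0.173  . .   .\n"
puts tabify_gff2(sample)
```

If the guess is right, the converted text could then be passed to
Bio::GFF::GFF2.new in place of the File object.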





On Tue, Jun 9, 2009 at 4:44 PM, Naohisa GOTO
<ngoto at gen-info.osaka-u.ac.jp> wrote:

> Hi George,
>
> On Tue, 9 Jun 2009 15:26:45 +0300
> George Githinji <georgkam at gmail.com> wrote:
>
> > Hi all,
> > I am trying to parse a GFF file. The file looks like this:
> >
> > ##gff-version 2
> > ##source-version bepipred-1.0b
> > ##date 2009-06-09
> > ##Type Protein seq1
> > # seqname            source        feature      start   end   score  N/A   ?
> > # ---------------------------------------------------------------------------
> > seq1   bepipred-1.0b epitope          1     1   0.173  . .   .
> > seq1   bepipred-1.0b epitope          2     2  -0.043  . .   .
> > seq1  bepipred-1.0b epitope          3     3  -0.014  . .   .
> > seq1   bepipred-1.0b epitope          4     4   0.144  . .   .
> > seq1   bepipred-1.0b epitope          5     5   0.250  . .   .
> > seq1   bepipred-1.0b epitope          6     6   0.218  . .   .
> >
> > ....truncated
>
> The above GFF records do not contain any "attributes".
> The field definition of each GFF line is:
> <seqname> <source> <feature> <start> <end> <score> <strand> <frame>
> [attributes] [comments]
>
> When talking about GFF, the word "attributes" refers to the
> "attributes" field in each GFF line.
>
> See the GFF2 specifications document for details.
> http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
>
> > and I have written the following lines with the aim of extracting the
> > start, end and score attributes. But before that I wanted to know
> > whether the full attributes are available, so I did the following.
> >
> > require 'rubygems'
> > require 'bio'
> > bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff'))
> >
> >  bep_gff.records.each do |record|
> >     puts record.attributes_to_hash.inspect
> > end
> >
> > However, i get empty hashes.
> > Any ideas?
>
> Because the Bio::GFF::GFF2::Record#attributes_to_hash method returns
> the "attributes" field as a hash, and the "attributes" field is empty
> in all of the above GFF2 records, showing empty hashes is logically correct.
>
> If you really want a hash, adding each field into a hash would
> be the easiest way. For example,
>
>  bep_gff.records.each do |record|
>    h = {}
>    h['seqname']    = record.seqname
>    h['source']     = record.source
>    h['feature']    = record.feature
>    h['start']      = record.start
>    h['end']        = record.end
>    h['score']      = record.score
>    h['strand']     = record.strand
>    h['frame']      = record.frame
>    h['attributes'] = record.attributes_to_hash
>    p h
>  end
>
> Bio::GFF::GFF2::Record has seqname, source, feature, start, end,
> score, strand, and frame attributes (in the Ruby sense of the word),
> which are inherited from the Bio::GFF::Record class.
> Normally, it is natural to use these Ruby attributes
> directly without creating a hash.
>
> Note that using attributes_to_hash may lose some data when
> there are two or more values with the same tag name in an
> "attributes" field.
>
> When creating new data that uses "attributes" extensively,
> GFF3 is recommended, because the design of GFF2 attributes is
> somewhat broken.
>
> > Thank you
> >
> >
> > --
> > ---------------
> > Sincerely
> > George
> >
> > Skype: george_g2
> > Blog: http://biorelated.wordpress.com/
>
> Your blog is nice!
>
> --
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
>
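
On the note in the quoted message about attributes_to_hash possibly losing
data: the sketch below does not use BioRuby and makes no claim about which
value BioRuby keeps. It only illustrates the general limitation that a plain
Hash holds one value per key. The tag/value pairs and the method name
group_values are hypothetical, made up for illustration.

```ruby
# Hypothetical tag/value pairs, as they might appear in the
# "attributes" field of a single GFF2 record.
pairs = [%w[Note first], %w[Note second], %w[ID x1]]

# Collect every value under its tag instead of overwriting.
def group_values(pairs)
  grouped = Hash.new { |hash, tag| hash[tag] = [] }
  pairs.each { |tag, value| grouped[tag] << value }
  grouped
end

# A plain Hash keeps only one value for the repeated "Note" tag.
p pairs.to_h
# Grouping into arrays keeps both "Note" values.
p group_values(pairs)
```

Grouping into arrays is one way to keep repeated tags; whether it is needed
depends on whether the data actually repeats tag names.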



-- 
---------------
Sincerely
George

Skype: george_g2
Blog: http://biorelated.wordpress.com/


