[Bioperl-l] new GFF2 parsing/dumping routines committed...

Mark Wilkinson mwilkinson@gene.pbi.nrc.ca
Mon, 20 Nov 2000 17:03:01 -0600


Hi Group!

The last two commits from me allow the import/export of (as far as I understand) properly formatted GFF2.

To create generic features from GFF2 you would make the call as follows:

$Feature = new Bio::SeqFeature::Generic (-gff2_string => $string);

The most important differences between the original -gff_string and the new -gff2_string options are as follows:

(1) the fields *must be TAB-separated* (formerly it was splitting on whitespace, but that would choke on the freetext that is now allowed)
(2) there is no default "group" tag created.  You must specify   group=MyGroup   in the attributes field
(3) tag/value units are semicolon separated
(4) tags can have more than one space-separated value
(5) free-text is allowed as a value so long as it is double-quoted.
(6) comments are allowed but are ignored (comments are at the end of the GFF line, preceded by a # symbol)

An example of a GFF string that could be parsed by this routine would be:

mysequence	GMHMM	exon	100	200	45	.	.	group=MyFavGene;notes="the answer"   "to LtUandE is"   42   # these are comments

This results in a feature with the following structure:

0  Bio::SeqFeature::Generic=HASH(0x844db70)
   '_gsf_end' => 200
   '_gsf_score' => 45
   '_gsf_seqname' => 'abc'
   '_gsf_start' => 100
   '_gsf_strand' => 0
   '_gsf_sub_array' => ARRAY(0x84507e8)
        empty array
   '_gsf_tag_hash' => HASH(0x845074c)
      'group' => ARRAY(0x845116c)
         0  'MyFavGene'
      'notes' => ARRAY(0x845122c)
         0  'the answer'
         1  'to LtUandE is'
         2  '42'
   '_parse_h' => HASH(0x8437dfc)
        empty hash
   '_primary_tag' => 'exon'
   '_record_err' => undef
   '_source_tag' => 'GMHMM'
   '_strict' => undef
   '_verbose' => undef
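
For anyone who wants to poke at the six rules above without BioPerl installed, here is a minimal stand-alone Perl sketch of them. It is not the committed Bio::SeqFeature::Generic code: the sub name parse_gff2_line and the keys of the hash it returns are made up for illustration, and mapping a "." strand to 0 is an assumption based on the '_gsf_strand' value in the dump above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse one GFF2 line
# according to rules (1)-(6) listed in this message.
sub parse_gff2_line {
    my ($line) = @_;
    chomp $line;

    # Rule 1: fields are TAB-separated; the ninth field holds the attributes.
    my ($seqname, $source, $primary, $start, $end,
        $score, $strand, $frame, $attrs) = split /\t/, $line, 9;
    $attrs = '' unless defined $attrs;

    # Rules 3, 5 and 6: walk the attributes field token by token so that
    # ';' and '#' inside double quotes are treated as ordinary text.
    my @units = ('');
    while ($attrs =~ /\G(?:("[^"]*")|(#.*)|(;)|([^"#;]+))/gc) {
        if    (defined $2) { last }               # comment: ignore the rest
        elsif (defined $3) { push @units, '' }    # semicolon: next tag/value unit
        else  { $units[-1] .= defined $1 ? $1 : $4 }
    }

    # Rules 2 and 4: each unit is tag=value(s); values are space-separated,
    # and no default "group" tag is invented.
    my %tags;
    for my $unit (@units) {
        my ($tag, $rest) = $unit =~ /^\s*([^=\s]+)\s*=\s*(.*)$/s or next;
        while ($rest =~ /"([^"]*)"|(\S+)/g) {
            push @{ $tags{$tag} }, defined $1 ? $1 : $2;
        }
    }

    # Assumption: '+' => 1, '-' => -1, anything else (e.g. '.') => 0.
    my %strand_of = ('+' => 1, '-' => -1);
    return {
        seqname => $seqname, source => $source, primary => $primary,
        start   => $start,   end    => $end,    score   => $score,
        strand  => $strand_of{ $strand // '.' } // 0,
        frame   => $frame,   tags   => \%tags,
    };
}

# Usage: the example line from this message, built with explicit TABs.
my $line = join "\t", qw(mysequence GMHMM exon 100 200 45 . .),
    'group=MyFavGene;notes="the answer"   "to LtUandE is"   42   # these are comments';
my $f = parse_gff2_line($line);
print join('|', @{ $f->{tags}{notes} }), "\n";   # the answer|to LtUandE is|42
```

The tokenising loop is there so that a semicolon or # inside a quoted value does not end the unit or start a comment; a plain split on ";" would get that case wrong.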


If you are so inclined, please give this a thorough working over and let me know if you find errors. So far it seems to be okay... touch wood!

Cheers all!

Mark

--
---
Dr. Mark Wilkinson
Bioinformatics Group
National Research Council of Canada
Plant Biotechnology Institute
110 Gymnasium Place
Saskatoon, SK
Canada