[Bioperl-l] Converting GFF2 records to GFF3

Razi Khaja razi at genet.sickkids.on.ca
Thu Dec 23 15:54:40 EST 2004


Sorry for cross-posting, but this may be relevant to both bioperl and song-devel.
 
I've written two small scripts that use bioperl to convert gff2 records to gff3 and vice versa (see gff2_to_gff3.pl and gff3_to_gff2.pl below).
 
In doing this I have noticed some problems with the conversion.
 
The method Bio::Tools::GFF::_gff3_string will quote an attribute value if it contains any character not in [a-zA-Z0-9,;=.:%^*$@!+_?-] (i.e. $value = '"'.$value.'"';) and will output empty quotes for tags without values (i.e. $value = "\"\"";).
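
To make the behaviour concrete, here is a minimal standalone sketch of the rule as described above. It is not the BioPerl source: the helper name quote_attribute_value is hypothetical, and the character class is copied from the description, with $ and @ backslash-escaped so that Perl does not interpolate them.

```perl
use strict;
use warnings;

# Hypothetical helper that mimics the quoting rule described above.
# It is an illustration only, not code taken from Bio::Tools::GFF.
sub quote_attribute_value {
    my ($value) = @_;

    # A tag without a value is written as a pair of empty quotes.
    return '""' if !defined $value || $value eq '';

    # Any character outside the "safe" set causes the whole value to be quoted.
    if ( $value =~ /[^a-zA-Z0-9,;=.:%^*\$\@!+_?-]/ ) {
        return '"' . $value . '"';
    }

    # Values made up only of safe characters pass through unchanged.
    return $value;
}

# A plain value, a value containing a space, and an empty value.
print quote_attribute_value($_), "\n" for ( 'exon1', 'hypothetical protein', '' );
```

Only the first of the three values is printed without quotation marks, so any value containing a space ends up with the unescaped quotes that the spec forbids.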
 
Currently the gff3 spec says: "Unescaped quotation marks, ... are explicitly forbidden." 
 
This brings up two questions:
(1) Are quotes necessary in gff3?
(2) When a value is empty, what should be output?
    a) Tag="";
    b) Tag=.;
    c) Tag=;
    d) nothing?
 
(Apart from not meeting the spec, this makes it difficult to do transformations from gff2 to gff3 and back to gff2 again.)
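
For question (1), one option that avoids quotes altogether is URL-style percent-encoding of the characters that would otherwise be ambiguous. The following is only a sketch under assumptions: the helper names escape_attribute_value and unescape_attribute_value are hypothetical (they are not part of BioPerl), and the set of characters escaped here (control characters, %, ;, =, &, comma and the double quote) is my reading of what needs protecting, not a list quoted from the spec.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode reserved characters instead of quoting.
# The reserved set below is an assumption made for this sketch.
sub escape_attribute_value {
    my ($value) = @_;
    $value =~ s/([\x00-\x1f\x7f%;=&,"])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Hypothetical inverse: turn every %XX sequence back into its character.
sub unescape_attribute_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $value;
}

my $original = 'kinase; ATP-binding, "putative"';
my $escaped  = escape_attribute_value($original);
print "$escaped\n";

# Decoding the encoded value gives back the original string.
print unescape_attribute_value($escaped) eq $original ? "round trip ok\n" : "round trip FAILED\n";
```

Because % itself is escaped, the encoding is reversible, which is what a gff2 to gff3 to gff2 round trip needs. It does not answer question (2), since an empty value still has to be written somehow.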

 
 
 
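# =====  gff2_to_gff3.pl =====
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::GFF;

# Usage: gff2_to_gff3.pl <gff2 file>   (GFF3 is written to STDOUT)
my( $gff2File ) = @ARGV;
die "Usage: $0 <gff2 file>\n" unless defined $gff2File;

# Parse the input as GFF2.
my $gffio = Bio::Tools::GFF->new(-file=>$gff2File, -gff_version=>2);
while( my $feature = $gffio->next_feature() ) {
    # _gff3_string is a private method; it is called directly here to
    # format each GFF2 feature as a GFF3 line.
    my $gff3string = $gffio->_gff3_string( $feature );
    print "$gff3string\n";
}
$gffio->close();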

 
 
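# =====  gff3_to_gff2.pl =====
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::GFF;

# Usage: gff3_to_gff2.pl <gff3 file>   (GFF2 is written to STDOUT)
my( $gff3File ) = @ARGV;
die "Usage: $0 <gff3 file>\n" unless defined $gff3File;

# Parse the input as GFF3.
my $gffio = Bio::Tools::GFF->new(-file=>$gff3File, -gff_version=>3);
while( my $feature = $gffio->next_feature() ) {
    # _gff2_string is a private method; it is called directly here to
    # format each GFF3 feature as a GFF2 line.
    my $gff2string = $gffio->_gff2_string( $feature );
    print "$gff2string\n";
}
$gffio->close();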

 



/**
 * Razi Khaja, Bioinformatics Analyst
 * The Hospital for Sick Children, Toronto
 */

