[Bioperl-l] GFF3 preliminary

Ian Korf ik1 at sanger.ac.uk
Wed Feb 19 06:49:44 EST 2003


GFF3 makes good sense to me. I especially like the controlled vocabulary
and the unambiguous use of whitespace. This should make the format less
confusing to use. But the standard doesn't address one of the main
problems with previous versions, which was that people ignored the
standard. I think there should be an 'official' validating parser.
Probably a Perl script with few external dependencies would be most
convenient, but a website for pasting in files would also be fine. I can
imagine that such a program could even optimize a GFF file by inserting
### lines where appropriate. Although it would be significantly more work
because it would require sequence files, being able to make semantic
checks would be very useful. This is the only way to determine whether a
CDS is free of stop codons or whether a phase 0 exon is really phase 0.
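
As a rough illustration of the kind of semantic check meant here, the
sketch below splices CDS segments out of a sequence and reports in-frame
stop codons other than the final one. It is written in Python rather than
Perl, and everything in it is an assumption made for the example: the
function names, the standard stop codons TAA/TAG/TGA, 1-based inclusive
coordinates, and a phase that counts the bases to skip at the 5' end of
the spliced CDS before the first complete codon.

```python
# Hypothetical helper names; not part of any GFF3 tool or of BioPerl.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumes the standard genetic code


def revcomp(seq):
    """Reverse-complement a DNA string (A, C, G, T only; others unchanged)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def spliced_cds(genome, segments, strand):
    """Join CDS segments given as (start, end), 1-based inclusive.

    The result is returned 5'->3' on the coding strand.
    """
    seq = "".join(genome[s - 1:e] for s, e in sorted(segments))
    return revcomp(seq) if strand == "-" else seq


def internal_stops(cds, phase=0):
    """Return 0-based offsets of in-frame stop codons in the spliced CDS.

    The last complete codon is skipped, since a terminal stop is expected.
    """
    cds = cds.upper()
    codon_starts = range(phase, len(cds) - 2, 3)
    return [i for i in codon_starts[:-1] if cds[i:i + 3] in STOP_CODONS]


if __name__ == "__main__":
    genome = "ATGAAANNNNTAGTAA"
    cds = spliced_cds(genome, [(1, 6), (11, 16)], "+")
    print(cds, internal_stops(cds))
```

A validator along these lines would flag any CDS for which
internal_stops() returns a non-empty list.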

Historically, phase has been well-defined but the definition has usually
not been followed. One person's frame is another person's phase and in
the end people just give up and put a "." there. The current definition
does not resolve this problem, and people will continue to be mystified.

The problem may be that the definition is not strand-symmetric. A phase 1
exon on the plus strand indicates that the 5' end has a partial codon,
but says nothing about the 3' end. On the minus strand, a phase 1 exon
indicates that the 3' end is partial but says nothing about the 5' end.
The opposite end and the frame can easily be worked out from the
coordinates. But maybe it would be better to make phase and frame
explicit. How about representing them as a triplet of phase;frame;phase?
This may be redundant, but it's clearer. For example, "0;1;1" indicates
that the first base is at the start of a complete codon, it is in frame 1,
and there is 1 base of a partial codon at the end. Here, start and end
refer to the absolute coordinates, but I'd be just as happy to make them
5' and 3'.
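
To make the arithmetic concrete, here is a small Python sketch that derives
such a triplet from a feature's coordinates. The message does not say how
frame is numbered, so the convention used here is purely an assumption:
frame is the 0-based position, modulo 3, of the first base of the first
complete codon. The function name is hypothetical, coordinates are taken
as 1-based and inclusive, and both phases count the bases of a partial
codon at the start and end coordinates respectively (absolute
coordinates, not 5'/3').

```python
def phase_frame_triplet(start, end, start_phase):
    """Build a "phase;frame;phase" string for one feature.

    start, end   -- 1-based inclusive coordinates, start <= end
    start_phase  -- bases of a partial codon at the start coordinate (0-2)

    Assumed convention (not from any spec): frame is the 0-based
    coordinate of the first complete codon's first base, modulo 3.
    """
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    if start_phase not in (0, 1, 2):
        raise ValueError("start_phase must be 0, 1 or 2")

    length = end - start + 1
    # Bases left over after the leading partial codon and all full codons.
    end_phase = (length - start_phase) % 3
    frame = (start - 1 + start_phase) % 3
    return f"{start_phase};{frame};{end_phase}"


if __name__ == "__main__":
    # A 7-base feature at 2..8 with no leading partial codon.
    print(phase_frame_triplet(2, 8, 0))
```

Under these assumptions, a feature spanning 2..8 with no partial codon at
its start yields "0;1;1", matching the example above.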

-Ian







