[Bioperl-l] GFF3 preliminary

Jim Kent jim_kent at pacbell.net
Wed Feb 19 00:16:11 EST 2003


A validator is helpful.  The GTF validator would have been nice
a few years back.  When it came online it enforced the author's
interpretation of GTF, which, on the subject of whether a stop
codon was part of the coding region among other things, was at
variance with all the other GTF we'd seen hitherto.  Having a GFF 3
validator sooner rather than later is good.

Who is doing the GTF validator anyway?  It might be the easiest
place to start for the GFF 3 one.


----- Original Message -----
From: "Ian Korf" <ik1 at sanger.ac.uk>
To: "Lincoln Stein" <lstein at cshl.org>
Cc: <gff-list at sanger.ac.uk>; <song-devel at lists.sourceforge.net>;
<michele at sanger.ac.uk>; <bioperl-l at bioperl.org>
Sent: Tuesday, February 18, 2003 10:49 PM
Subject: Re: [Bioperl-l] GFF3 preliminary


> GFF3 makes good sense to me. I especially like the controlled vocabulary
> and the unambiguous use of whitespace. This should make employment less
> confusing. But the standard doesn't address one of the main problems with
> previous versions, which was that people ignored the standard. I think
> there should be an 'official' validating parser. Probably a Perl script
> with few external dependencies would be most convenient, but a website for
> pasting in files would also be fine. I can imagine that such a program
> could even optimize a GFF file by placing in ### symbols where
> appropriate. Although significantly more work because it would require
> sequence files, being able to make semantic checks would be very useful.
> This is the only way to determine if a CDS has no stop codons or that a
> phase 0 exon is really phase 0.
>
> Historically, the definition for phase has been well-defined but usually
> not followed. One person's frame is another person's phase and in the end
> people just give up and put a "." there. The current definition does not
> resolve this problem, and people will continue to be mystified.
>
> The problem may be that the definition is not strand-symmetric. A phase 1
> exon on the plus strand indicates that the 5' end has a partial codon,
> but says nothing about the 3' end. On the minus strand, a phase 1 exon
> indicates that the 3' end is partial but says nothing about the 5' end.
> The opposite end and the frame can easily be worked out from the
> coordinates. But maybe it would be better to make phase and frame explicit.
> How about representing them as a triplet of phase;frame;phase? This may be
> redundant, but it's clearer. For example, "0;1;1" indicates that the first
> base is the start of a complete codon, it is in frame 1, and there is 1
> base of a partial codon at the end. Here, start and end refer to the
> absolute coordinates, but I'd be just as happy to make them 5' and 3'.
>
> -Ian
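
The phase;frame;phase triplet proposed above can be worked out from the
coordinates and the leading phase alone. The following is only a rough
sketch, not anything from the spec: the function name phase_frame_triplet
is made up for illustration, coordinates are assumed to be 1-based and
inclusive, "start" and "end" are taken as absolute coordinates, and frame
is assumed to mean the 0-based position of the first complete codon base
modulo 3. Other frame conventions would give different middle values.

```python
def phase_frame_triplet(start, end, phase):
    """Return a 'phase;frame;phase' string for one feature.

    start, end: 1-based inclusive absolute coordinates (assumption).
    phase: number of bases of a partial codon at the start coordinate.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    if end < start:
        raise ValueError("end must not be smaller than start")
    length = end - start + 1
    if length < phase:
        # Features lying entirely inside one codon are out of scope here.
        raise ValueError("feature is shorter than its leading partial codon")
    # Assumed frame convention: 0-based coordinate of the first
    # complete codon base, modulo 3.
    frame = (start + phase - 1) % 3
    # Bases left over after the last complete codon.
    trailing = (length - phase) % 3
    return f"{phase};{frame};{trailing}"


print(phase_frame_triplet(101, 200, 0))  # 0;1;1
```

Under those assumptions a feature at 101..200 with leading phase 0 comes
out as "0;1;1", the same form as the example in the quoted message.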


