[Bioperl-l] GFF3 preliminary

Lincoln Stein lstein at cshl.org
Mon Feb 24 13:52:41 EST 2003


Chris Mungall has written a validating GFF3 parser.  His first act was to find 
that the example in the GFF3 spec did not validate against the Sequence 
Ontology!

We'll make this available on the SO site soon (song.sourceforge.net).

Lincoln

On Wednesday 19 February 2003 03:16 am, Jim Kent wrote:
> A validator is helpful.  The GTF validator would have been nice
> a few years back.  When it came online it enforced the author's
> interpretation of GTF, which, on the subject of whether a stop
> codon was part of the coding region, among other things, was at
> variance with all the other GTF we'd seen hitherto.  Having a GFF 3
> validator sooner rather than later is good.
>
> Who is doing the GTF validator anyway?  It might be the easiest
> place to start for the GFF 3 one.
>
>
> ----- Original Message -----
> From: "Ian Korf" <ik1 at sanger.ac.uk>
> To: "Lincoln Stein" <lstein at cshl.org>
> Cc: <gff-list at sanger.ac.uk>; <song-devel at lists.sourceforge.net>;
> <michele at sanger.ac.uk>; <bioperl-l at bioperl.org>
> Sent: Tuesday, February 18, 2003 10:49 PM
> Subject: Re: [Bioperl-l] GFF3 preliminary
>
> > GFF3 makes good sense to me. I especially like the controlled vocabulary
> > and the unambiguous use of whitespace. This should make employment less
> > confusing. But the standard doesn't address one of the main problems with
> > previous versions, which was that people ignored the standard. I think
> > there should be an 'official' validating parser. Probably a Perl script
> > with few external dependencies would be most convenient, but a website
> > for pasting in files would also be fine. I can imagine that such a
> > program could even optimize a GFF file by placing in ### symbols where
> > appropriate. Although significantly more work because it would require
> > sequence files, being able to make semantic checks would be very useful.
> > This is the only way to determine if a CDS has no stop codons or that a
> > phase 0 exon is really phase 0.
> >
> > Historically, the definition for phase has been well-defined but usually
> > not followed. One person's frame is another person's phase and in the end
> > people just give up and put a "." there. The current definition does not
> > resolve this problem, and people will continue to be mystified.
> >
> > The problem may be that the definition is not strand-symmetric. A phase 1
> > exon on the plus strand indicates that the 5' end has a partial codon,
> > but says nothing about the 3' end. On the minus strand, a phase 1 exon
> > indicates that the 3' end is partial but says nothing about the 5' end.
> > The opposite end and the frame can easily be worked out from the
> > coordinates. But maybe it would be better to make phase and frame
> > explicit. How about representing them as a triplet of phase;frame;phase.
> > This may be redundant, but it's clearer. For example, "0;1;1" indicates
> > that the first base is at the start of a complete codon, it is in frame 1,
> > and there is 1 base of a partial codon at the end. Here, start and end
> > refer to the absolute coordinates, but I'd be just as happy to make them
> > 5' and 3'.
> >
> > -Ian
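
The semantic check Ian mentions (whether a CDS contains stop codons) can be
sketched roughly as follows. This is only an illustration, not part of any
existing validator: the function name internal_stops and its parameters are
hypothetical, the input is assumed to be an already spliced CDS in coding
orientation, and the stop codons are those of the standard genetic code.
Because the thread notes disagreement about whether the stop codon belongs
to the coding region, a terminal stop is tolerated by default.

```python
# Stop codons of the standard genetic code (an assumption; other
# translation tables differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(cds, phase=0, allow_terminal_stop=True):
    """Return 0-based offsets of in-frame stop codons in a spliced CDS.

    cds   -- nucleotide string in coding orientation (hypothetical input)
    phase -- number of bases to skip before the first complete codon
    allow_terminal_stop -- if True, a stop codon formed by the last three
                           bases of the sequence is not reported
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    seq = cds.upper()
    # Walk the sequence codon by codon, starting after the partial codon.
    hits = [i for i in range(phase, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]
    # Optionally forgive a stop codon that sits exactly at the 3' end.
    if allow_terminal_stop and hits and hits[-1] == len(seq) - 3:
        hits.pop()
    return hits
```

An empty list means no premature stop was found under these assumptions;
a non-empty list for a feature declared as phase 0 would suggest that the
declared phase is wrong or that the CDS is broken.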

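Ian's proposed phase;frame;phase triplet can be derived from the coordinates
alone, as he says. The sketch below is one possible reading of the proposal,
not an agreed definition: the name phase_triplet is hypothetical, coordinates
are assumed to be 1-based and inclusive, start_phase is taken to be the
number of partial-codon bases at the low-coordinate end (Ian's "absolute
coordinates" variant), and frame is assumed to be the 0-based position,
modulo 3, of the first complete codon.

```python
def phase_triplet(start, end, start_phase):
    """Build a 'phase;frame;phase' string for a 1-based inclusive interval.

    start_phase -- bases of a partial codon at the low-coordinate end
    The frame convention used here is an assumption made for illustration.
    """
    if start_phase not in (0, 1, 2):
        raise ValueError("start_phase must be 0, 1 or 2")
    length = end - start + 1
    if length < start_phase:
        raise ValueError("feature is shorter than its partial codon")
    # Bases left over after the last complete codon.
    end_partial = (length - start_phase) % 3
    # Frame of the first base of the first complete codon (0, 1 or 2).
    frame = (start + start_phase - 1) % 3
    return f"{start_phase};{frame};{end_partial}"


# A 4-base feature at positions 2..5 that starts on a codon boundary
# reproduces the "0;1;1" example from the message above.
print(phase_triplet(2, 5, 0))
```

Mapping a strand-relative GFF phase onto start_phase for minus-strand
features is deliberately left out, since that is exactly the point under
discussion.
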
-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org                                Cold Spring Harbor, NY
========================================================================



