[SO-devel] Re: [Bioperl-l] GFF3 preliminary

Lincoln Stein lstein at cshl.org
Wed Feb 19 22:50:21 EST 2003


How about mandating a "." in the phase column and insisting that the open 
reading frame model be expressed in terms of a transcript, a set of exons, a 
coding start and a coding end?  This way the phase can be calculated 
correctly while loading the GFF file (by those people who think in terms of 
phase), or even added by a generic GFF processing script.
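
A minimal sketch of what such a generic script might compute, in Python. The
function name cds_phases and its input layout are hypothetical, not part of any
existing GFF tool. It assumes 1-based inclusive coordinates, that coding_start
and coding_end are the lowest and highest coding coordinates regardless of
strand, and that phase means the number of bases to skip at the 5' end of each
coding segment (in the direction of translation) to reach the next codon start.

```python
def cds_phases(exons, coding_start, coding_end, strand):
    """Return (start, end, phase) for the coding part of each exon,
    listed in translation order.

    exons        -- list of non-overlapping (start, end) pairs, any order
    coding_start -- lowest coding coordinate (assumed convention)
    coding_end   -- highest coding coordinate (assumed convention)
    strand       -- "+" or "-"
    """
    # Clip every exon to the coding interval; drop purely untranslated exons.
    segments = []
    for start, end in exons:
        s, e = max(start, coding_start), min(end, coding_end)
        if s <= e:
            segments.append((s, e))

    # Walk the segments 5' to 3' with respect to the transcript.
    segments.sort(reverse=(strand == "-"))

    result = []
    consumed = 0  # coding bases seen so far
    for s, e in segments:
        # Bases still needed to finish the codon left open by the
        # previous segment; 0 if that segment ended on a codon boundary.
        phase = (3 - consumed % 3) % 3
        result.append((s, e, phase))
        consumed += e - s + 1
    return result


if __name__ == "__main__":
    exons = [(1, 10), (21, 30), (41, 50)]
    for seg in cds_phases(exons, 5, 45, "+"):
        print(*seg, sep="\t")
```

Because the phase is derived only from the exon coordinates and the two coding
boundaries, the file itself can carry "." in the phase column and a loader can
fill in the numbers in whichever convention it prefers.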

Lincoln


On Wednesday 19 February 2003 02:49 pm, Ian Korf wrote:
> GFF3 makes good sense to me. I especially like the controlled vocabulary
> and the unambiguous use of whitespace. This should make employment less
> confusing. But the standard doesn't address one of the main problems with
> previous versions, which was that people ignored the standard. I think
> there should be an 'official' validating parser. Probably a Perl script
> with few external dependencies would be most convenient, but a website for
> pasting in files would also be fine. I can imagine that such a program
> could even optimize a GFF file by placing in ### symbols where
> appropriate. Although significantly more work because it would require
> sequence files, being able to make semantic checks would be very useful.
> This is the only way to determine if a CDS has no stop codons or that a
> phase 0 exon is really phase 0.
>
> Historically, phase has been well defined, but the definition has usually
> not been followed. One person's frame is another person's phase and in the end
> people just give up and put a "." there. The current definition does not
> resolve this problem, and people will continue to be mystified.
>
> The problem may be that the definition is not strand-symmetric. A phase 1
> exon on the plus strand indicates that the 5' end has a partial codon,
> but says nothing about the 3' end. On the minus strand, a phase 1 exon
> indicates that the 3' end is partial but says nothing about the 5' end.
> The opposite end and the frame can easily be worked out from the
> coordinates. But maybe it would be better to make phase and frame explicit.
> How about representing them as a triplet of phase;frame;phase? This may be
> redundant, but it's clearer. For example, "0;1;1" indicates that the first
> base is at the start of a complete codon, it is in frame 1, and there is 1
> base of a partial codon at the end. Here, start and end refer to the
> absolute coordinates, but I'd be just as happy to make them 5' and 3'.
>
> -Ian

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org			                  Cold Spring Harbor, NY
	1 Bungtown Road, Cold Spring Harbor, NY 11724
========================================================================