[SO-devel] Re: [Bioperl-l] GFF3 preliminary

Richard Durbin rd at sanger.ac.uk
Thu Feb 20 11:11:15 EST 2003


This is specifically about phase.  I have other comments on the full 
document, which is approaching convergence I think.  Sorry for the delay 
in responding.

I can't understand why the definition of phase in the current spec is 
unclear.  It says:

   <frame>  One of '0', '1', '2' or '.'. '0' indicates that the specified
	   region is in frame, i.e. that its first base corresponds to
	   the first base of a codon. '1' indicates that there is one
	   extra base, i.e. that the second base of the region
	   corresponds to the first base of a codon, and '2' means that
	   the third base of the region is the first base of a codon. If
	   the strand is '-', then the first base of the region is value
	   of <end>, because the corresponding coding region will run
	   from <end> to <start> on the reverse strand. As with
	   <strand>, if the frame is not relevant then set <frame> to
	   '.'. It has been pointed out that "phase" might be a better
	   descriptor than "frame" for this field. Version 2 change:
	   This field is left empty '.' for RNA and protein features.

(Yes, I know we called it frame when it should be phase.  I completely 
support GFF3 calling this field "phase".)
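
To make the quoted definition concrete, here is a minimal sketch of how I read it, assuming 1-based inclusive <start>/<end> coordinates with start <= end. The function name first_codon_base is made up for illustration and is not part of any GFF library.

```python
def first_codon_base(start, end, strand, phase):
    """Return the coordinate of the first base of the first whole codon.

    Assumes 1-based inclusive coordinates with start <= end, and a
    phase of 0, 1 or 2 counted from the 5' end of the feature.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    if strand == "+":
        # On the forward strand the 5' end of the region is <start>.
        return start + phase
    if strand == "-":
        # On the reverse strand the 5' end of the region is <end>.
        return end - phase
    raise ValueError("strand must be '+' or '-'")
```

Under this reading a feature spanning 101..200 with phase 1 has its first whole codon starting at 102 on the '+' strand and at 199 on the '-' strand. Either way the extra base sits at the 5' end.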

Anyway, we (primarily David Haussler and I) specifically addressed the 
reverse strand, and it is the opposite of what Ian says.  The phase is 
strand symmetric - it is always about the 5' end.  I guess we should 
have said it that way.  Anyway, please feel free to rewrite.  I think
it is very important to keep the phase column.  It is relevant for
similarities and partial genewise matches etc. as well as full coding 
sequences.  I don't support the view that the phase of a GFF line should 
be calculated implicitly from the presence of coding_start features in 
other lines that may or may not be properly linked with this one.  Lines 
should be as independent as possible.
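
For comparison, this is roughly what the implicit calculation would involve. It is only a sketch under stated assumptions: the segments are the coding parts of one transcript, already correctly linked together, and the first one in 5' to 3' order begins exactly at the coding start. The name cds_phases is hypothetical.

```python
def cds_phases(segments, strand):
    """Derive the phase of each coding segment from its neighbours.

    segments: list of (start, end) pairs, 1-based inclusive, all from
    one transcript. Assumes the 5'-most segment begins at the first
    base of the first codon.
    """
    # Walk the segments in 5' to 3' order for the given strand.
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = {}
    consumed = 0  # coding bases seen so far
    for start, end in ordered:
        # Bases still needed to finish the codon left open so far.
        phases[(start, end)] = (3 - consumed % 3) % 3
        consumed += end - start + 1
    return phases
```

Note that the result for any one line depends on every upstream segment being present and correctly grouped. A partial match, or a file in which one exon line is missing, gives no answer or the wrong one, whereas an explicit phase column still works.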

So I strongly disagree with Lincoln's suggestion.  GTF as formalised by 
Michael Brent et al. only got away with forgetting because it was used 
for a restricted set of purposes.

Richard

Lincoln Stein wrote:
> How about mandating a "." in the phase column and insisting that the open 
> reading frame model be expressed in terms of a transcript, a set of exons, a 
> coding start and a coding end?  This way the phase can be calculated 
> correctly while loading the GFF file (by those people who think in terms of 
> phase), or even added by a generic GFF processing script.
> 
> Lincoln
> 
> 
> On Wednesday 19 February 2003 02:49 pm, Ian Korf wrote:
> 
>>Historically, the definition for phase has been well-defined but usually
>>not followed. One person's frame is another person's phase and in the end
>>people just give up and put a "." there. The current definition does not
>>resolve this problem, and people will continue to be mystified.
>>
>>The problem may be that the definition is not strand-symmetric. A phase 1
>>exon on the plus strand indicates that the 5' end has a partial codon,
>>but says nothing about the 3' end. On the minus strand, a phase 1 exon
>>indicates that the 3' end is partial but says nothing about the 5' end.
>>The opposite end and the frame can easily be worked out from the
>>coordinates. But maybe it would be better to make phase and frame explicit.
>>How about representing them as a triplet of phase;frame;phase. This may be
>>redundant, but it's clearer. For example, "0;1;1" indicates that the first
>>base is at the start of a complete codon, it is in frame 1, and there is 1
>>base of a partial codon at the end. Here, start and end refer to the
>>absolute coordinates, but I'd be just as happy to make them 5' and 3'.
>>
>>-Ian
> 
> 



