[Open-bio-l] Best practice for modelling data in GFF

Dan Bolser dan.bolser at gmail.com
Thu Jul 1 10:12:21 UTC 2010


On 29 May 2010 00:08, Dan Bolser <dan.bolser at gmail.com> wrote:
> Thanks all for replies.

<snip>

> There is a canonical way to model a gene, so I was wondering if it
> makes sense to describe similar 'biology' (or in this case molecular
> biology) in standard ways (when the feature isn't simply described by
> a single line of GFF)?
>
> Perhaps I've not understood SO properly, but I'm not sure how its
> structure is translated into GFF structure ... is there a 1 to 1
> mapping?

The lack of replies leads me to believe that the GFF Parent attribute
should indeed reflect (or be strictly determined by) the SO
'relationships' (are they all 'part_of' relationships?).

However, I was trying to get some concepts clear in my head, and I
ended up creating a figure of a 'canonical gene' in SO [1], based on
the one in the GFF docs [2].

[1] http://imagebin.ca/view/Ni9BFbK.html
[2] http://www.sequenceontology.org/gff3.shtml


There is a transitive part_of relationship between 'mRNA' and 'gene',
which explains lines 4 to 6 of the canonical gene GFF [2].

However, the figure shows that 'exon' is part_of 'transcript', and not
part_of 'mRNA'. If I got the figure right, and if I understand
correctly, there is no way to transitively infer that exon is part_of
mRNA (lines 7 to 11 of the GFF [2]).

This implies that the 'structure' in GFF isn't strictly determined by SO.

Or is it a mistake in SO?
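
To make the question concrete, here is a minimal Python sketch of the
kind of check I have in mind. Everything in it is an assumption for
illustration: the IS_A and PART_OF tables are a toy, hand-written subset
(not loaded from SO, and the real ontology has intermediate terms), and
the function names are hypothetical. It only shows one possible reading,
namely that if 'mRNA' is treated as a kind of 'transcript' (is_a), then
an exon with Parent=mRNA would still be consistent with 'exon part_of
transcript'. Whether SO and GFF actually intend this, I don't know.

```python
# Toy subset of SO-like relations. ASSUMPTION: hand-written for
# illustration only, not taken from the real Sequence Ontology file.
IS_A = {"mRNA": "transcript"}
PART_OF = {"exon": "transcript", "transcript": "gene"}


def ancestors(term):
    """Return the term followed by its is_a ancestors."""
    chain = []
    while term is not None:
        chain.append(term)
        term = IS_A.get(term)
    return chain


def wholes(term):
    """Return every type the term can be part_of, following part_of
    transitively and inheriting part_of from is_a ancestors."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for ancestor in ancestors(current):
            whole = PART_OF.get(ancestor)
            if whole is not None and whole not in found:
                found.add(whole)
                stack.append(whole)
    return found


def parent_consistent(child_type, parent_type):
    """True if a GFF Parent link from child_type to parent_type can be
    read as part_of under the toy tables above."""
    allowed = wholes(child_type)
    return any(a in allowed for a in ancestors(parent_type))


print(parent_consistent("mRNA", "gene"))  # True
print(parent_consistent("exon", "mRNA"))  # True
print(parent_consistent("gene", "exon"))  # False
```

Under these assumptions both the mRNA/gene lines and the exon/mRNA
lines pass. Without the is_a entry in the toy table, the exon/mRNA check
fails, which is the situation I think my figure describes.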


Sorry if this is a 'gotcha' that has been discussed before. Any links
to help me understand would be great.

Dan.

> Cheers,
> Dan.
>
> On 28 May 2010 18:49, Chris Fields <cjfields at illinois.edu> wrote:
>> All,
>>
>> Appears that link isn't up to date.  Current GFF3 spec (v. 1.16, updated May 25) here:
>>
>> http://www.sequenceontology.org/gff3.shtml
>>
>> chris
>>
>> On May 28, 2010, at 12:06 PM, Jason Stajich wrote:
>>
>>> It's covered in the GFF3 spec as match_part if that helps.
>>> http://song.sourceforge.net/gff3.shtml
>>>
>>> Dan Bolser wrote, On 5/28/10 9:29 AM:
>>>> Hi guys,
>>>>
>>>> Not sure if this is the right forum, but I just thought I'd ask...
>>>>
>>>> Where can I find information on 'best practices' for modelling
>>>> biological data in GFF?
>>>>
>>>> For example, I'd like to model paired-end sequence alignments in GFF.
>>>> One suggestion was to use match/match_part to link each end into a
>>>> pair. Another option is to use 'read_pair' with 'contig' for the
>>>> parent feature...
>>>>
>>>> Should I just be using SAM/BAM?
>>>>
>>>> Seems a shame not to have a standard way to do this in GFF...
>>>>
>>>>
>>>> Cheers,
>>>> Dan.
>>>> _______________________________________________
>>>> Open-Bio-l mailing list
>>>> Open-Bio-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>>>>
>>
>>
>



