[Open-bio-l] Best practice for modelling data in GFF

Dan Bolser dan.bolser at gmail.com
Tue Jul 6 12:11:54 UTC 2010


On 6 July 2010 11:58, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
> Hi Dan,
>
> GFF3 is just a file format, capable of representing the SO's hierarchical
> subfeatures.  You can represent other things (including other ontologies) in
> the same format.  How strictly you choose to stick to the SO's hierarchy is
> up to you,

<snip>

OK, I understand. So the answer to my original question, "Where can I
find information on 'best practices' for modelling biological data in
GFF?", is simply "work out where your data fits in the Sequence
Ontology model".

:-)


> So yes, if you want to build a SO-compatible gene model, you had better make
> sure the parent-child relationships correspond to the hierarchy in the SO.
> This is true whether you want to represent the model in GFF3 or not.

OK
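
To make that concrete for myself, here is a minimal Python sketch that
pulls the (child type, parent type) pairs out of the Parent attributes in
a GFF3 fragment, so they can be compared against the SO by hand. The
fragment is hand-written in the style of the canonical gene example; the
sequence name, IDs, coordinates and the function name are all made up for
illustration.

```python
# Hand-written GFF3 fragment; IDs and coordinates are hypothetical.
GFF = "\n".join([
    "##gff-version 3",
    "chr1\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene1",
    "chr1\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA1;Parent=gene1",
    "chr1\t.\texon\t1050\t1500\t.\t+\t.\tID=exon1;Parent=mRNA1",
    "chr1\t.\texon\t3000\t3902\t.\t+\t.\tID=exon2;Parent=mRNA1",
])


def parent_child_types(gff_text):
    """Return the set of (child_type, parent_type) pairs implied by Parent."""
    type_by_id = {}   # feature ID -> feature type (column 3)
    links = []        # (child type, parent ID) for every Parent value
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if f)
        if "ID" in attrs:
            type_by_id[attrs["ID"]] = cols[2]
        # Parent may hold several comma-separated IDs.
        for parent_id in attrs.get("Parent", "").split(","):
            if parent_id:
                links.append((cols[2], parent_id))
    # Resolved after the loop so a parent may appear after its child.
    # Raises KeyError if a Parent ID is never defined in the text.
    return {(child, type_by_id[pid]) for child, pid in links}


print(sorted(parent_child_types(GFF)))
# → [('exon', 'mRNA'), ('mRNA', 'gene')]
```

Each pair printed is one parent-child relationship that would need a
counterpart in the SO for the model to be SO-compatible.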


> Now, for your specific question about the exon/mRNA terms: an exon is_a
> transcript_region, and a transcript_region is part_of a transcript.  [And a
> transcript is_a gene_member_region, and a gene_member_region is a member_of
> a gene.]
>
> Now, an mRNA is_a mature_transcript, which is_a transcript.  The exon that
> is part_of a transcript can therefore be part_of an mRNA, because an mRNA is
> a transcript.

I see... I guess this only applies because of the specific 'is_a'
relationship. i.e. just because a nose is part_of a face and a mouth
is part_of a face, you can't make inferences about relationships
between nose and mouth.
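
A rough Python sketch of that reasoning, using only the edges named in
the reply above plus my nose/mouth toy example. This is not the full SO,
and the permissive "can be part_of" rule is my reading of the explanation
rather than an official SO or GFF3 validation rule; the function names
are made up.

```python
# is_a edges quoted in the reply above (a tiny subset of the SO).
IS_A = {
    "exon": ["transcript_region"],
    "mRNA": ["mature_transcript"],
    "mature_transcript": ["transcript"],
}
# part_of edges: one from the reply, two from the nose/mouth toy example.
PART_OF = {
    "transcript_region": ["transcript"],
    "nose": ["face"],
    "mouth": ["face"],
}


def is_a_closure(term):
    """Return the term itself plus everything reachable via is_a."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def can_be_part_of(child, parent):
    """True if child (or an is_a ancestor of it) is part_of parent
    (or of an is_a ancestor of parent)."""
    parent_types = is_a_closure(parent)
    return any(
        whole in parent_types
        for c in is_a_closure(child)
        for whole in PART_OF.get(c, [])
    )


print(can_be_part_of("exon", "mRNA"))   # → True
print(can_be_part_of("nose", "mouth"))  # → False
```

The exon/mRNA case succeeds only because mRNA reaches transcript through
is_a links; nose and mouth share a whole but have no is_a link between
them, so nothing can be inferred.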

I guess I'm confused because I can't see the link types in miso.


> So in the model at http://www.sequenceontology.org/gff3.shtml the transcript
> you're looking for is the mRNA.  The same would be true if the parent
> feature was a monocistronic_mRNA, which is_a mRNA, and also is_a
> monocistronic_transcript, which is_a transcript.
>
> Have you had a look at OBO-Edit?  It's a useful learning tool for getting
> your head around these things, and you can browse through the SO in it.

I'll have a look.


Thanks very much for taking the time to provide such a detailed reply,
and sorry for the dumb questions.

All the best,
Dan.


> Cheers,
>
> L.
>
>
> On 06/07/2010 Tuesday, July 6, 11:10, "Dan Bolser" <dan.bolser at gmail.com>
> wrote:
>
>> When you don't get a reply, you never know if your question was too
>> dumb, too smart, or totally off topic.
>>
>> Any hints?
>>
>> Cheers,
>> Dan.
>>
>> On 1 July 2010 11:12, Dan Bolser <dan.bolser at gmail.com> wrote:
>>> On 29 May 2010 00:08, Dan Bolser <dan.bolser at gmail.com> wrote:
>>>> Thanks all for replies.
>>>
>>> <snip>
>>>
>>>> There is a canonical way to model a gene, so I was wondering if it
>>>> makes sense to describe similar 'biology' (or in this case molecular
>>>> biology) in standard ways (when the feature isn't simply described by
>>>> a single line of GFF)?
>>>>
>>>> Perhaps I've not understood SO properly, but I'm not sure how its
>>>> structure is translated into GFF structure ... is there a 1 to 1
>>>> mapping?
>>>
>>> Lack of replies led me to believe that the GFF Parent attribute
>>> should indeed reflect (or be strictly determined by) the SO
>>> 'relationships' (are they all 'part_of' relationships?).
>>>
>>> However, I was trying to get some concepts clear in my head, and I
>>> ended up creating a figure of a 'canonical gene' in SO [1], based on
>>> the one in the GFF docs [2].
>>>
>>> [1] http://imagebin.ca/view/Ni9BFbK.html
>>> [2] http://www.sequenceontology.org/gff3.shtml
>>>
>>>
>>> There is a transitive part_of relationship between 'mRNA' and 'gene',
>>> which explains lines 4 to 6 of the canonical gene GFF [2].
>>>
>>> However, the figure shows that 'exon' is part_of 'transcript', and not
>>> part_of 'mRNA'. If I got the figure right, and if I understand
>>> correctly, there is no way to transitively infer that exon is part_of
>>> mRNA (lines 7 to 11 of the GFF [2]).
>>>
>>> This implies that the 'structure' in GFF isn't strictly determined by SO.
>>>
>>> Or is it a mistake in SO?
>>>
>>>
>>> Sorry if this is a 'gotcha' that has been discussed before. Any links
>>> to help me understand would be great.
>>>
>>> Dan.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405