[Open-bio-l] Best practice for modelling data in GFF

Dan Bolser dan.bolser at gmail.com
Fri May 28 23:08:50 UTC 2010


Thanks all for replies.

I'm aware of the GFF spec and the SO terms. The issue here (as I
understand it) is that the feature isn't 'flat', but is a combination
of two matching 'reads' that are grouped into a mate-pair depending on
their proximity and orientation. As pointed out, not every pair is
successfully mapped: one read may be 'missing' from the pair, the pair
may span two reference sequences, or the proximity or orientation of
the pair may be incorrect.
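
To make those cases concrete, here is a minimal Python sketch that
sorts a pair into one of the outcomes listed above. The Read class,
the category names, the inward-facing (+/-) orientation rule and the
max_insert threshold are all assumptions made for illustration; none
of them come from the GFF spec or from SO.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    """One mapped read end (hypothetical structure for this sketch)."""
    seqid: str   # reference sequence the read mapped to
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # '+' or '-'


def classify_pair(r1: Optional[Read], r2: Optional[Read],
                  max_insert: int = 1000) -> str:
    """Classify a pair; None stands for a read that did not map."""
    if r1 is None or r2 is None:
        return "missing_mate"
    if r1.seqid != r2.seqid:
        return "split_reference"
    # Order the two reads by position on the reference.
    left, right = sorted((r1, r2), key=lambda r: r.start)
    # Assumed convention: leftmost read on '+', rightmost on '-'.
    if not (left.strand == "+" and right.strand == "-"):
        return "bad_orientation"
    # Assumed threshold on the outer distance spanned by the pair.
    if right.end - left.start + 1 > max_insert:
        return "bad_distance"
    return "proper_pair"
```

Only the "proper_pair" case fits naturally under a single parent
feature; how to represent the other cases is the open part of the
question.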

Strictly speaking, this can be handled by the match and match_part (or
read_pair and part_of) terms. The question is whether this reflects
the biology adequately, and specifically which terms should be used.
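
As a rough illustration of the match/match_part option, the Python
sketch below writes one parent line for the pair and one child line
per read, linked through the ID and Parent attributes. The function
name, the "example" source value and the ID scheme are hypothetical,
and it assumes both reads map to the same reference sequence. The
feature types are parameters so that other terms can be swapped in.

```python
def pair_to_gff3(pair_id, seqid, reads, source="example",
                 parent_type="match", child_type="match_part"):
    """Return GFF3 lines for one pair.

    reads is a list of (start, end, strand) tuples, all on seqid.
    """
    # The parent feature spans the outermost coordinates of the reads.
    start = min(r[0] for r in reads)
    end = max(r[1] for r in reads)
    # Nine tab-separated columns: seqid, source, type, start, end,
    # score, strand, phase, attributes. The parent strand is left as
    # '.' because the two reads normally lie on opposite strands.
    lines = ["\t".join([seqid, source, parent_type, str(start),
                        str(end), ".", ".", ".", f"ID={pair_id}"])]
    for i, (s, e, strand) in enumerate(reads, 1):
        attrs = f"ID={pair_id}.{i};Parent={pair_id}"
        lines.append("\t".join([seqid, source, child_type, str(s),
                                str(e), ".", strand, ".", attrs]))
    return lines


for line in pair_to_gff3("pair1", "chr1",
                         [(100, 150, "+"), (400, 450, "-")]):
    print(line)
```

This only shows the mechanics of grouping lines; it does not settle
which SO terms are the right ones for the parent and child features.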

There is a canonical way to model a gene, so I was wondering whether
it makes sense to describe similar 'biology' (or in this case
molecular biology) in an equally standard way when the feature can't
be described by a single line of GFF.

Perhaps I've not understood SO properly, but I'm not sure how its
structure is translated into GFF structure. Is there a one-to-one
mapping?


Cheers,
Dan.

On 28 May 2010 18:49, Chris Fields <cjfields at illinois.edu> wrote:
> All,
>
> Appears that link isn't up to date.  Current GFF3 spec (v. 1.16, updated May 25) here:
>
> http://www.sequenceontology.org/gff3.shtml
>
> chris
>
> On May 28, 2010, at 12:06 PM, Jason Stajich wrote:
>
>> It's covered in the GFF3 spec as match_part if that helps.
>> http://song.sourceforge.net/gff3.shtml
>>
>> Dan Bolser wrote, On 5/28/10 9:29 AM:
>>> Hi guys,
>>>
>>> Not sure if this is the right forum, but I just thought I'd ask...
>>>
>>> Where can I find information on 'best practices' for modelling
>>> biological data in GFF?
>>>
>>> For example, I'd like to model paired-end sequence alignments in GFF.
>>> One suggestion was to use match/match_part to link each end into a
>>> pair. Another option is to use 'read_pair' with 'contig' for the
>>> parent feature...
>>>
>>> Should I just be using SAM/BAM?
>>>
>>> Seems a shame not to have a standard way to do this in GFF...
>>>
>>>
>>> Cheers,
>>> Dan.
>>> _______________________________________________
>>> Open-Bio-l mailing list
>>> Open-Bio-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>>>
>
>