[Open-bio-l] Best practice for modelling data in GFF

Dan Bolser dan.bolser at gmail.com
Thu Jul 1 09:48:02 UTC 2010


On 1 June 2010 12:34, Brad Chapman <chapmanb at 50mail.com> wrote:
> Dan;
> If what you are trying to do is represent your data in a way that
> most people can parse and reuse it, my suggestion would be to use
> SAM/BAM to represent your alignments. You'll be using a standardized and
> well-supported format specifically designed for this type of data.
>
> While you can do this with GFF, the parser support for correctly
> dealing with match_part or part_of is likely to be less robust.
> As data providers standardize on one way to represent nested
> features, it should become easier to deal with them.

Yeah, I think BAM is a good way to go in this instance.
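
As a rough illustration of why SAM/BAM fits here, the sketch below classifies a single SAM record into the pair states described in the quoted text further down (mate missing, pair split across references, wrong distance or orientation). The FLAG bit values and the RNEXT column come from the SAM specification; the function name, the category labels and the example records are made up for illustration only.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1         # read is part of a pair
PROPER_PAIR = 0x2    # aligner considers distance and orientation correct
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped


def classify_pair(sam_line):
    """Classify one SAM alignment line by the state of its mate pair.

    The returned labels are hypothetical names, not part of any standard.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    rname = fields[2]       # column 3: reference name of this read
    rnext = fields[6]       # column 7: reference name of the mate

    if not flag & PAIRED:
        return "unpaired"
    if flag & UNMAPPED:
        return "read_unmapped"
    if flag & MATE_UNMAPPED:
        return "mate_missing"
    # "=" means same reference as this read; "*" means not available.
    if rnext not in ("=", "*", rname):
        return "split_across_references"
    if flag & PROPER_PAIR:
        return "proper_pair"
    return "bad_distance_or_orientation"


# Hypothetical records (11 mandatory columns, SEQ and QUAL left as "*").
proper = "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*"
orphan = "r2\t73\tchr1\t100\t60\t50M\t=\t100\t0\t*\t*"
split = "r3\t65\tchr1\t100\t60\t50M\tchr2\t500\t0\t*\t*"
odd = "r4\t65\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*"

for record in (proper, orphan, split, odd):
    print(record.split("\t")[0], classify_pair(record))
```

Every one of those cases is carried by fields that are already in each record, so no parent/child grouping is needed to express them.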


> Brad
>
>> Thanks all for replies.
>>
>> I'm aware of the GFF spec and the Sequence Ontology (SO) terms. The issue here
>> (as I understand it) is that the feature isn't 'flat', but is a
>> combination of two matching 'reads' that are grouped into a mate-pair
>> depending on their proximity and orientation. As pointed out, not
>> every pair is successfully mapped, specifically one read may be
>> 'missing' from the pair, the pair may span two reference sequences, or
>> the proximity or orientation of the pair may be incorrect.
>>
>> Strictly speaking, this can be handled by the match and match_part (or
>> read_pair and part_of) terms. The question, however, is: does this
>> reflect the biology adequately? (And specifically, which terms should
>> be used?)
>>
>> There is a canonical way to model a gene, so I was wondering if it
>> makes sense to describe similar 'biology' (or in this case molecular
>> biology) in standard ways (when the feature isn't simply described by
>> a single line of GFF)?
>>
>> Perhaps I've not understood SO properly, but I'm not sure how its
>> structure is translated into GFF structure ... is there a one-to-one
>> mapping?
>>
>>
>> Cheers,
>> Dan.
>>
>> On 28 May 2010 18:49, Chris Fields <cjfields at illinois.edu> wrote:
>> > All,
>> >
>> > Appears that link isn't up to date.  Current GFF3 spec (v. 1.16, updated May 25) here:
>> >
>> > http://www.sequenceontology.org/gff3.shtml
>> >
>> > chris
>> >
>> > On May 28, 2010, at 12:06 PM, Jason Stajich wrote:
>> >
>> >> It's covered in the GFF3 spec as match_part if that helps.
>> >> http://song.sourceforge.net/gff3.shtml
>> >>
>> >> Dan Bolser wrote, On 5/28/10 9:29 AM:
>> >>> Hi guys,
>> >>>
>> >>> Not sure if this is the right forum, but I just thought I'd ask...
>> >>>
>> >>> Where can I find information on 'best practices' for modelling
>> >>> biological data in GFF?
>> >>>
>> >>> For example, I'd like to model paired-end sequence alignments in GFF.
>> >>> One suggestion was to use match/match_part to link each end into a
>> >>> pair. Another option is to use 'read_pair' with 'contig' for the
>> >>> parent feature...
>> >>>
>> >>> Should I just be using SAM/BAM?
>> >>>
>> >>> Seems a shame not to have a standard way to do this in GFF...
>> >>>
>> >>>
>> >>> Cheers,
>> >>> Dan.
>> >>> _______________________________________________
>> >>> Open-Bio-l mailing list
>> >>> Open-Bio-l at lists.open-bio.org
>> >>> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>> >>>
>> >
>> >
>>
>
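
For comparison, the match/match_part suggestion quoted above could be sketched roughly as follows: one 'match' parent feature spanning the pair, with one 'match_part' child per mapped read linked through the Parent attribute. This is only a sketch under stated assumptions, not an agreed convention. The function name, the ID scheme and the "example" source value are hypothetical, and it assumes both reads map to the same reference sequence, which is exactly the assumption that breaks for pairs spanning two references.

```python
def pair_to_gff3(pair_id, seqid, reads, source="example"):
    """Return GFF3 lines for one read pair (a sketch, not a standard).

    reads is a list of (start, end, strand) tuples with 1-based,
    inclusive coordinates, all on the same reference `seqid`.
    A pair with a missing mate is simply a list with one tuple.
    """
    start = min(r[0] for r in reads)
    end = max(r[1] for r in reads)

    # Parent feature: nine tab-separated GFF3 columns
    # (seqid, source, type, start, end, score, strand, phase, attributes).
    lines = ["\t".join([seqid, source, "match", str(start), str(end),
                        ".", ".", ".", f"ID={pair_id}"])]

    # One child feature per mapped read, tied to the parent via Parent=.
    for i, (s, e, strand) in enumerate(reads, 1):
        attrs = f"ID={pair_id}.{i};Parent={pair_id}"
        lines.append("\t".join([seqid, source, "match_part", str(s), str(e),
                                ".", strand, ".", attrs]))
    return lines


# Hypothetical pair: forward read at 100-149, reverse mate at 300-349.
gff_lines = pair_to_gff3("pair1", "chr1", [(100, 149, "+"), (300, 349, "-")])
print("\n".join(gff_lines))
```

Note that nothing in this layout says whether the distance or orientation of the pair is as expected; that would need an extra attribute, which is the kind of thing each data provider would currently have to invent.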



