[Open-bio-l] Best practice for modelling data in GFF

Leighton Pritchard lpritc at scri.ac.uk
Fri May 28 16:59:13 UTC 2010


Hi Dan,

On Friday, 28 May 2010, 17:29, "Dan Bolser" <dan.bolser at gmail.com>
wrote:

> Not sure if this is the right forum, but I just thought I'd ask...
> 
> Where can I find information on 'best practices' for modelling
> biological data in GFF?

The specification is a good place to start:

http://www.sequenceontology.org/gff3.shtml

> For example, I'd like to model paired-end sequence alignments in GFF.
> One suggestion was to use match/match_part to link each end into a
> pair. Another option is to use 'read_pair' with 'contig' for the
> parent feature...

I don't think this is a GFF issue so much as a question of working out where
your data fits in the Sequence Ontology (SO) model, since the type column of
a GFF3 feature is expected to be an SO term.

If your read pairs have been used to assemble the larger contig sequence
that you're modelling them as part_of, then read_pair would seem to be
exactly what you're looking for:

http://www.sequenceontology.org/miso/current_release/term/SO:0000007
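
As a rough sketch of this first option (not taken from the specification:
the seqid, the "assembly" source, the IDs and the coordinates are all made
up for illustration, and the helper names are hypothetical), each read could
be written as a read_pair feature whose Parent attribute points at the
contig:

```python
def gff3_line(seqid, source, ftype, start, end, strand, attributes):
    """Format one GFF3 feature line (nine tab-separated columns)."""
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    # Score and phase are left undefined (".") in this sketch.
    columns = [seqid, source, ftype, str(start), str(end), ".", strand, ".", attr_text]
    return "\t".join(columns)


def read_pair_features(seqid, contig_id, contig_end, reads):
    """Return GFF3 lines for a contig and the reads modelled as its parts.

    `reads` is a list of (read_id, start, end, strand) tuples.
    The "assembly" source value is an assumption, not a required name.
    """
    lines = ["##gff-version 3"]
    lines.append(
        gff3_line(seqid, "assembly", "contig", 1, contig_end, "+", {"ID": contig_id})
    )
    for read_id, start, end, strand in reads:
        # Parent expresses the part_of relationship to the contig.
        attributes = {"ID": read_id, "Parent": contig_id}
        lines.append(
            gff3_line(seqid, "assembly", "read_pair", start, end, strand, attributes)
        )
    return lines
```

Calling read_pair_features("scaffold1", "contig1", 1000, [("readA", 10, 60,
"+"), ("readB", 400, 450, "-")]) would give a version pragma, one contig
line and two read_pair lines.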

However, if your read pair comes from a different contig, or exists
independently of the assembly of this contig, and you're just *aligning it
to another sequence*, then a match feature with (at least) two match_part
children, one for each region that a read matches, could be more
appropriate.
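
A minimal sketch of the match/match_part layout, under some assumptions of
my own: the parent match spans from the leftmost to the rightmost aligned
end and is left unstranded, the "aligner" source and the ID scheme are
invented, and the function names are hypothetical. A real file would
probably also carry alignment details (e.g. a Target attribute), which are
omitted here:

```python
def gff3_line(seqid, source, ftype, start, end, strand, attributes):
    """Format one GFF3 feature line (nine tab-separated columns)."""
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    # Score and phase are left undefined (".") in this sketch.
    columns = [seqid, source, ftype, str(start), str(end), ".", strand, ".", attr_text]
    return "\t".join(columns)


def match_features(seqid, pair_id, ends, source="aligner"):
    """Return GFF3 lines for one match with a match_part per aligned end.

    `ends` is a list of (start, end, strand) tuples, one per read end.
    """
    # Assumption: the parent match covers the full extent of all ends.
    span_start = min(start for start, _, _ in ends)
    span_end = max(end for _, end, _ in ends)
    lines = [
        gff3_line(seqid, source, "match", span_start, span_end, ".", {"ID": pair_id})
    ]
    for number, (start, end, strand) in enumerate(ends, start=1):
        # Each child points back at the match through its Parent attribute.
        attributes = {"ID": f"{pair_id}.end{number}", "Parent": pair_id}
        lines.append(
            gff3_line(seqid, source, "match_part", start, end, strand, attributes)
        )
    return lines
```

For example, match_features("chr1", "pair1", [(100, 150, "+"), (400, 450,
"-")]) gives one match line covering 100..450 and two match_part lines, one
per read end.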

Which of these options best reflects your data?

Cheers,

L.



-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405

