[DAS] GFF version in Spec

Eric Pelletier ericp@genoscope.cns.fr
Wed, 9 Oct 2002 17:58:32 +0200 (MET DST)


On Wed, 9 Oct 2002, Francis Ouellette wrote:


|We've come across this GFF duality (tria-reality?) problem, and this
|has basically made us shy away from using GFF. There are other
|"stable" formats (table-wise) that are documented and that will serve
|our purpose, like the one used by Sequin:
|
|http://www.ncbi.nlm.nih.gov/Sequin/table.html
|
|which is what we are using:
|
|Column 1: Start location of feature
|Column 2: Stop location of feature
|Column 3: Feature key
|Column 4: Qualifier key
|Column 5: Qualifier value
|
|and this is based on the feature table docs supported by
|DDBJ/EMBL/GenBank, which basically support everything we know and use.
|
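
For reference, a minimal sketch of reading such a five-column table in
Python. It assumes the rows are tab-delimited, that a row with columns
1-3 filled starts a feature, that a row with only columns 1-2 filled adds
an interval to the previous feature, and that rows with only columns 4-5
filled attach a qualifier to the previous feature. Lines starting with
">" are treated as headers and skipped. The function name and the
dictionary layout are hypothetical, not part of Sequin.

```python
def parse_feature_table(text):
    """Parse a 5-column feature table into a list of feature dicts.

    Assumed layout (tab-delimited): start, stop, feature key,
    qualifier key, qualifier value.
    """
    features = []
    for raw in text.splitlines():
        # Skip blank lines and assumed ">" header lines.
        if not raw.strip() or raw.startswith(">"):
            continue
        cols = raw.split("\t")
        cols += [""] * (5 - len(cols))  # pad short rows to 5 columns
        start, stop, key, qkey, qval = (c.strip() for c in cols[:5])
        if start and stop and key:
            # Columns 1-3 filled: a new feature begins here.
            # Positions stay strings, since they may carry "<" or ">".
            features.append({"key": key,
                             "intervals": [(start, stop)],
                             "qualifiers": {}})
        elif start and stop and features:
            # Columns 1-2 only: extra interval for the current feature.
            features[-1]["intervals"].append((start, stop))
        if qkey and features:
            # Columns 4-5: qualifier for the current feature.
            features[-1]["qualifiers"][qkey] = qval
    return features
```

Called on the text of a table, parse_feature_table() returns one dict per
feature with its intervals and qualifiers grouped together.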



Another option I would suggest is the FFF format, as proposed by
the latest versions of ReadSeq.

Extract from the ReadSeq2 doc:
---

	FFF - Flattened Feature Format (FlatFeat)
This is essentially the DDBJ/GenBank/EMBL Feature Table
specification with newlines and extra space removed, to produce
single-line feature entries which can more efficiently be indexed
and read by software. Basic FFF format is a tab-separated-value
file, with these columns: Key Location Qualifiers

# flatfeat-version 2
    Key Location  Qualifiers
    source  1..1684   /organism="Drosophila melanogaster" ;/chromosome="3L" ; /map="69C" ; /ACCESSION="U57488" ...
    gene  18..1684  /gene="est-6"
    CDS join(18..1404,1456..>1684)  /gene="est-6" ;...
    exon  18..1404  /gene="est-6"
    intron  1405..1455  /gene="est-6"
    exon  1456..>1684 /gene="est-6"


    Although similar to GFF, FFF works better for me as it contains all
information about a feature in one row. GFF splits gene, mRNA, and
other multi-location features among several rows, and currently
lacks a standard syntax for grouping these feature parts.

---
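
To illustrate why the one-row-per-feature layout is easy to handle, here
is a minimal Python sketch of an FFF line parser. It is not taken from
ReadSeq. It assumes that Key, Location and Qualifiers are separated by
whitespace (tabs in the real format), that a location never contains
whitespace, and that qualifiers look like /name="value" or /name=value.
The names parse_fff_line and QUALIFIER are hypothetical.

```python
import re

# One qualifier: /name, optionally followed by ="quoted value" or =bare_value.
QUALIFIER = re.compile(r'/(\w+)(?:=(?:"([^"]*)"|([^;\s]+)))?')


def parse_fff_line(line):
    """Split one FFF row into (key, location, qualifiers).

    Returns None for blank lines, '#' comment lines, the
    'Key Location Qualifiers' header, and rows with fewer than 2 fields.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split(None, 2)  # key, location, rest of the line
    if len(parts) < 2:
        return None
    key, location = parts[0], parts[1]
    if (key, location) == ("Key", "Location"):
        return None  # column header row
    rest = parts[2] if len(parts) > 2 else ""
    qualifiers = {}
    for match in QUALIFIER.finditer(rest):
        name, quoted, bare = match.groups()
        # A qualifier without a value is stored as None.
        qualifiers[name] = quoted if quoted is not None else bare
    return key, location, qualifiers
```

Each feature, including a multi-location one such as
CDS join(18..1404,1456..>1684), comes back from a single call, so no
grouping step across rows is needed.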





-- 
Eric Pelletier, PhD       |   Genoscope - Centre National de Séquençage
Service Informatique      |   CNRS UMR-8030
Tel: (33) 0 160 872 519   |   CP 5706 91057 Evry cedex - France