[DAS] GFF version in Spec
Eric Pelletier
ericp@genoscope.cns.fr
Wed, 9 Oct 2002 17:58:32 +0200 (MET DST)
On Wed, 9 Oct 2002, Francis Ouellette wrote:
|We've come across this GFF duality (tria-reality?) problem, and it
|has basically made us shy away from using GFF. There is another
|"stable" format (table-wise) that is documented and will serve
|our purpose: the one used by Sequin:
|
|http://www.ncbi.nlm.nih.gov/Sequin/table.html
|
|which is what we are using:
|
|Column 1: Start location of feature
|Column 2: Stop location of feature
|Column 3: Feature key
|Column 4: Qualifier key
|Column 5: Qualifier value
|
|It is based on the feature table documentation shared by
|DDBJ/EMBL/GenBank, which supports essentially everything we know and use.
|
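To make the five-column layout concrete, here is a minimal Python sketch of a reader for it. It assumes the layout as described in the Sequin table documentation linked above: a feature line fills columns 1-3, a line with only columns 1-2 adds another interval to the preceding (multi-interval) feature, qualifier lines leave columns 1-3 empty, and a ">Feature ..." header line can be skipped. The names Feature and parse_feature_table are hypothetical and not part of Sequin.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str
    # Coordinates are kept as text so partial markers such as "<1" or ">1684" survive.
    intervals: list[tuple[str, str]] = field(default_factory=list)
    qualifiers: list[tuple[str, str]] = field(default_factory=list)


def parse_feature_table(lines):
    """Hypothetical reader for a Sequin-style 5-column, tab-separated feature table."""
    features = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and (assumed) ">Feature SeqId" header lines.
        if not line.strip() or line.startswith(">"):
            continue
        cols = line.split("\t")
        cols += [""] * (5 - len(cols))  # pad missing trailing columns
        start, stop, key, qkey, qval = cols[:5]
        if key:
            # Columns 1-3 filled: a new feature begins.
            features.append(Feature(key, [(start, stop)]))
        elif start and stop:
            # Columns 1-2 only: an extra interval of the previous feature (assumed).
            if not features:
                raise ValueError(f"interval before any feature: {line!r}")
            features[-1].intervals.append((start, stop))
        elif qkey:
            # Columns 4-5: a qualifier attached to the previous feature.
            if not features:
                raise ValueError(f"qualifier before any feature: {line!r}")
            features[-1].qualifiers.append((qkey, qval))
    return features


if __name__ == "__main__":
    table = [
        ">Feature SeqId",
        "18\t1404\tCDS",
        "1456\t1684",
        "\t".join(["", "", "", "gene", "est-6"]),
    ]
    for feat in parse_feature_table(table):
        print(feat)
```

Keeping coordinates as strings is a deliberate choice: the feature table allows "<" and ">" partial markers, which an integer conversion would silently drop.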
Another option I would suggest is the FFF format, as proposed by
the latest versions of ReadSeq. An extract from the ReadSeq2
documentation:
---
FFF - Flattened Feature Format (FlatFeat)
This is essentially the DDBJ/GenBank/EMBL Feature Table
specification with newlines and extra space removed, to produce
single-line feature entries which can more efficiently be indexed
and read by software. Basic FFF format is a tab-separated-value
file, with these columns: Key Location Qualifiers
# flatfeat-version 2
Key Location Qualifiers
source 1..1684 /organism="Drosophila melanogaster" ; /chromosome="3L" ; /map="69C" ; /ACCESSION="U57488" ...
gene 18..1684 /gene="est-6"
CDS join(18..1404,1456..>1684) /gene="est-6" ;...
exon 18..1404 /gene="est-6"
intron 1405..1455 /gene="est-6"
exon 1456..>1684 /gene="est-6"
FFF is similar to GFF but works better for me, as it contains all
information about a feature in one row. GFF splits gene, mRNA, and
other multi-location features among several rows, and currently
lacks a standard syntax for grouping these feature parts.
---
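A rough Python sketch of both points in the extract: reading one tab-separated FFF row (Key, Location, Qualifiers), and collapsing several GFF-style part rows into the single join(...) row that FFF uses. Assumptions, not taken from the ReadSeq docs: qualifiers are "/name=value" items separated by ";" with optional spaces, quoted values escape a double quote as "" (the GenBank convention), and the GFF side is simplified to hypothetical (group_id, key, start, end) tuples whose group_id is written out as /gene. Plus-strand features only; complement(...) is not handled. All function names are hypothetical.

```python
import re

# /name, optionally followed by =value; quoted values may contain "" escapes.
QUALIFIER_RE = re.compile(r'/(\w+)(?:=("(?:[^"]|"")*"|[^\s;]+))?')


def parse_qualifiers(text):
    """Return a list of (name, value) pairs; value is None for bare flags like /pseudo."""
    quals = []
    for name, raw in QUALIFIER_RE.findall(text):
        if raw.startswith('"'):
            value = raw[1:-1].replace('""', '"')
        else:
            value = raw or None
        quals.append((name, value))
    return quals


def parse_fff_line(line):
    """Parse one FFF row; returns None for comments, blank lines and the column header."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#") or line.startswith("Key\t"):
        return None
    parts = line.split("\t", 2)
    if len(parts) < 2:
        raise ValueError(f"not an FFF feature line: {line!r}")
    key, location = parts[0], parts[1]
    return key, location, parse_qualifiers(parts[2] if len(parts) == 3 else "")


def join_location(spans):
    """Build a feature-table location string from (start, end) pairs."""
    parts = [f"{start}..{end}" for start, end in spans]
    return parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"


def gff_like_to_fff(rows):
    """Group hypothetical (group_id, key, start, end) rows into one FFF row per group."""
    groups = {}
    for group_id, key, start, end in rows:
        groups.setdefault((group_id, key), []).append((start, end))
    lines = []
    for (group_id, key), spans in groups.items():
        # Sort by numeric start, ignoring "<"/">" partial markers.
        spans.sort(key=lambda s: int(str(s[0]).lstrip("<>")))
        lines.append(f'{key}\t{join_location(spans)}\t/gene="{group_id}"')
    return lines


if __name__ == "__main__":
    rows = [("est-6", "CDS", 1456, ">1684"), ("est-6", "CDS", 18, 1404)]
    for fff in gff_like_to_fff(rows):
        print(fff)
        print(parse_fff_line(fff))
```

The grouping step is where the explicit group key matters: without an agreed way to say which GFF rows belong together, the converter has nothing reliable to group on, which is the gap the extract points out.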
--
Eric Pelletier, PhD | Genoscope - Centre National de Séquençage
Service Informatique | CNRS UMR-8030
Tel: (33) 0 160 872 519 | CP 5706 91057 Evry cedex - France