[Biojava-l] GFF & feature creation

Peter Rice pmr@sanger.ac.uk
Tue, 22 Feb 2000 09:04:12 GMT


Matthew,

We are going through feature handling for EMBOSS too. Internally,
we are keeping something similar to GFF, but this has raised some issues.

We will treat proteins the same as DNA, but ignore the strand and
frame fields in GFF.
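As a rough illustration of "ignore the strand and frame fields", here is a minimal sketch of writing one GFF (version 2 style) line, assuming protein features simply put GFF's "." placeholder in both columns. The class and method names (ProteinGff, formatLine) are hypothetical, not EMBOSS or BioJava API, and the example feature name and coordinates are made up.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of EMBOSS or BioJava.
// Formats one tab-separated GFF line:
// seqname, source, feature, start, end, score, strand, frame.
class ProteinGff {
    static String formatLine(String seqName, String source, String feature,
                             int start, int end, String score,
                             boolean isProtein, char strand, String frame) {
        StringJoiner line = new StringJoiner("\t");
        line.add(seqName).add(source).add(feature)
            .add(Integer.toString(start)).add(Integer.toString(end))
            .add(score == null ? "." : score); // "." = no score
        if (isProtein) {
            // Strand and frame have no meaning for a protein sequence,
            // so both columns get the "." placeholder.
            line.add(".").add(".");
        } else {
            line.add(Character.toString(strand)).add(frame);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only: a made-up "domain" feature on CADD_HUMAN.
        System.out.println(formatLine("CADD_HUMAN", "example", "domain",
                                      10, 120, null, true, '+', "0"));
    }
}
```

The strand and frame arguments are still accepted for proteins so one code path serves both sequence types; they are just discarded.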

Joins across sequences are a problem. For example, in the following EMBL
entry all of the exons (and their flanking sequence) except one are in
separate entries.

ID   AB001103   standard; DNA; HUM; 1329 BP.
XX
AC   AB001103;
XX
SV   AB001103.1
XX
DT   21-AUG-1998 (Rel. 56, Created)
DT   20-JAN-1999 (Rel. 58, Last updated, Version 3)
XX
DE   Homo sapiens gene for H-cadherin, exon 14 and complete cds.
XX
KW   H-cadherin.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia; Eutheria;
OC   Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-1329
RA   Horii A.;
RT   ;
RL   Submitted (18-FEB-1997) to the EMBL/GenBank/DDBJ databases.
RL   Akira Horii, Tohoku University School of Medicine, Department of Molecular
RL   Pathology; 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
RL   (E-mail:horii@mail.cc.tohoku.ac.jp, Tel:81-22-717-8042, Fax:81-22-717-8047)
XX
RN   [2]
RA   Sato M., Mori Y., Sakurada A., Fujimura S., Horii A.;
RT   "The H-cadherin (CDH13) gene is inactivated in human lung cancer";
RL   Hum. Genet. 103:96-101(1998).
XX
DR   SWISS-PROT; P55290; CADD_HUMAN.
XX
CC   Sequence updated (14-Aug-1998)
XX
FH   Key             Location/Qualifiers
FH
FT   CDS             join(AB001090.1:1669..1713,AB001091.1:85..196,
FT                   AB001092.1:40..248,AB001093.1:96..212,AB001094.1:71..223,
FT                   AB001095.1:87..231,AB001096.1:33..211,AB001097.1:35..175,
FT                   AB001098.1:213..395,AB001099.1:56..309,AB001100.1:54..196,
FT                   AB001101.1:171..404,AB001102.1:160..378,210..217)
FT                   /codon_start=1
FT                   /db_xref="SWISS-PROT:P55290"
FT                   /product="H-cadherin"
FT                   /protein_id="BAA32411.1"
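To make the cross-entry problem concrete, here is a minimal sketch that splits a join like the CDS above into segments and flags which ones point at another entry (the "ACCESSION.VERSION:" prefix) and which are local to this entry. It assumes a flat join of plain start..end ranges only; complement(), order(), partial-end markers (< and >) and nested operators are deliberately not handled. JoinSegments and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse a simple EMBL "join(...)" location.
// Only plain ranges, optionally prefixed by "ACCESSION.VERSION:", are supported.
class JoinSegments {
    static final class Segment {
        final String accession; // null means the range is on this entry
        final int start;
        final int end;

        Segment(String accession, int start, int end) {
            this.accession = accession;
            this.start = start;
            this.end = end;
        }

        boolean isRemote() { return accession != null; }
    }

    // Optional "AB001090.1:" prefix, then "start..end".
    private static final Pattern RANGE =
        Pattern.compile("(?:([A-Za-z0-9_]+\\.\\d+):)?(\\d+)\\.\\.(\\d+)");

    static List<Segment> parseJoin(String location) {
        // Feature-table locations may wrap over several FT lines.
        String s = location.replaceAll("\\s+", "");
        if (!s.startsWith("join(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("not a simple join: " + location);
        }
        String body = s.substring("join(".length(), s.length() - 1);
        List<Segment> out = new ArrayList<>();
        for (String part : body.split(",")) {
            Matcher m = RANGE.matcher(part);
            if (!m.matches()) {
                throw new IllegalArgumentException("unsupported segment: " + part);
            }
            out.add(new Segment(m.group(1),
                                Integer.parseInt(m.group(2)),
                                Integer.parseInt(m.group(3))));
        }
        return out;
    }

    public static void main(String[] args) {
        String loc = "join(AB001090.1:1669..1713,AB001091.1:85..196,"
                   + "AB001092.1:40..248,AB001093.1:96..212,AB001094.1:71..223,"
                   + "AB001095.1:87..231,AB001096.1:33..211,AB001097.1:35..175,"
                   + "AB001098.1:213..395,AB001099.1:56..309,AB001100.1:54..196,"
                   + "AB001101.1:171..404,AB001102.1:160..378,210..217)";
        int remote = 0;
        int local = 0;
        for (Segment seg : parseJoin(loc)) {
            if (seg.isRemote()) remote++; else local++;
        }
        System.out.println(remote + " remote segments, " + local + " local");
    }
}
```

For the AB001103 CDS this reports 13 remote segments and 1 local one (exon 14 at 210..217), which is exactly the case where a GFF-like record for one sequence cannot hold the whole feature on its own.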


-- 
----------------------------------------------------------------------
Peter Rice                | Informatics Division, The Sanger Centre,
E-mail: pmr@sanger.ac.uk  | Wellcome Trust Genome Campus,
Tel: (44) 1223 494967     | Hinxton, Cambridge, CB10 1SA, England
Fax: (44) 1223 494919     | URL: http://www.sanger.ac.uk/Users/pmr/