[emboss-dev] Problems with EMBOSS seqret GenBank to GFF3

Peter Rice pmr at ebi.ac.uk
Wed Aug 24 14:45:33 UTC 2011


On 08/24/2011 11:36 AM, Peter Rice wrote:
> On 08/17/2011 11:37 AM, Peter Cock wrote:
>
>> ------------------------------------------
>>
>> Problem Seven - No parent/child relationships
>>
>> The EMBOSS 6.4.0 GFF3 file does use parent/child relationships
>> but not in the way I expected (and not in a way the validator likes).

As a first attempt, using the EMBL entry V00508 in the EMBOSS test set, 
I can change the type of the CDS "parent" feature to 
"biological_region" and add a featflags tag holding the true type. Code 
(not yet checked in) can reconstruct the EMBL feature table from this GFF.

However, the EMBL tags are all on the parent (now biological_region) 
feature.

Any suggestions where I should stick them for them to be useful in GFF3?

EMBL feature table:

FT   source          1..3919
FT                   /organism="Homo sapiens"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:9606"
FT   CDS             join(2079..2171,2294..2515,3371..3499)
FT                   /db_xref="GDB:119299"
FT                   /db_xref="GOA:P02100"
FT                   /db_xref="HGNC:4830"
FT                   /db_xref="InterPro:IPR000971"
FT                   /db_xref="InterPro:IPR002337"
FT                   /db_xref="InterPro:IPR009050"
FT                   /db_xref="InterPro:IPR012292"
FT                   /db_xref="PDB:1A9W"
FT                   /db_xref="UniProtKB/Swiss-Prot:P02100"
FT                   /protein_id="CAA23766.1"
FT                   /translation="MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDS
FT                   FGNLSSPSAILGNPKVKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENF
FT                   KLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAIALAHKYH"
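
For anyone following along, here is a minimal Python sketch of how the CDS join above breaks down into segments, an enclosing span, and per-segment phases. It is not the EMBOSS code. The function names parse_join and cds_phases are made up for illustration, and it only handles simple forward-strand locations (no complement(), no partial `<`/`>` markers, no remote references).

```python
import re

def parse_join(location):
    """Split an EMBL location such as join(a..b,c..d) into (start, end) pairs.

    Hypothetical helper: handles only plain forward-strand ranges.
    """
    match = re.fullmatch(r"join\((.*)\)", location)
    body = match.group(1) if match else location
    segments = []
    for part in body.split(","):
        start, end = part.split("..")
        segments.append((int(start), int(end)))
    return segments

def cds_phases(segments):
    """GFF3 phase per segment: bases to skip to reach the next codon start."""
    phases, consumed = [], 0
    for start, end in segments:
        phases.append((3 - consumed % 3) % 3)
        consumed += end - start + 1
    return phases

segments = parse_join("join(2079..2171,2294..2515,3371..3499)")
# The enclosing span is what a single parent feature would cover.
span = (min(s for s, _ in segments), max(e for _, e in segments))
phases = cds_phases(segments)
print(segments, span, phases)
```

The three segments are 93, 222 and 129 bases long, each a multiple of three, so every segment starts on a codon boundary.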

Proposed GFF3 version:

V00508	EMBL	databank_entry	1	3919	.	+	.	ID=V00508.1;organism=Homo sapiens;mol_type=genomic DNA;db_xref=taxon:9606
V00508	EMBL	biological_region	2079	3499	.	+	0	ID=V00508.2;featflags=type:CDS;db_xref=GDB:119299;db_xref=GOA:P02100;db_xref=HGNC:4830;db_xref=InterPro:IPR000971;db_xref=InterPro:IPR002337;db_xref=InterPro:IPR009050;db_xref=InterPro:IPR012292;db_xref=PDB:1A9W;db_xref=UniProtKB/Swiss-Prot:P02100;protein_id=CAA23766.1;translation=MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAIALAHKYH
V00508	EMBL	CDS	2079	2171	.	+	0	Parent=V00508.2
V00508	EMBL	CDS	2294	2515	.	+	0	Parent=V00508.2
V00508	EMBL	CDS	3371	3499	.	+	0	Parent=V00508.2
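
To illustrate the round trip, here is a rough Python sketch of how the EMBL location could be rebuilt from parent/child rows like these. It is not the unchecked-in EMBOSS code, only an assumption about one way to do it. The names parse_attributes and rebuild_locations are hypothetical, the sample rows carry a shortened attribute list, and the "type:" prefix in featflags is read exactly as it appears in the proposal.

```python
from collections import defaultdict

# Sample rows with a shortened attribute column (illustrative only).
GFF3 = """\
V00508\tEMBL\tbiological_region\t2079\t3499\t.\t+\t0\tID=V00508.2;featflags=type:CDS;protein_id=CAA23766.1
V00508\tEMBL\tCDS\t2079\t2171\t.\t+\t0\tParent=V00508.2
V00508\tEMBL\tCDS\t2294\t2515\t.\t+\t0\tParent=V00508.2
V00508\tEMBL\tCDS\t3371\t3499\t.\t+\t0\tParent=V00508.2
"""

def parse_attributes(column9):
    """Turn 'key=value;key=value' into a dict of lists (tags may repeat)."""
    attrs = defaultdict(list)
    for pair in column9.split(";"):
        key, _, value = pair.partition("=")
        attrs[key].append(value)
    return attrs

def rebuild_locations(gff_text):
    """Map each flagged parent ID to (true type, EMBL join location)."""
    parents = {}                  # parent ID -> true feature type
    children = defaultdict(list)  # parent ID -> list of (start, end)
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        start, end = int(cols[3]), int(cols[4])
        flags = attrs.get("featflags", [])
        if flags and flags[0].startswith("type:"):
            parents[attrs["ID"][0]] = flags[0][len("type:"):]
        elif "Parent" in attrs:
            children[attrs["Parent"][0]].append((start, end))
    result = {}
    for feature_id, true_type in parents.items():
        parts = ",".join(f"{s}..{e}" for s, e in sorted(children[feature_id]))
        result[feature_id] = (true_type, f"join({parts})")
    return result

rebuilt = rebuild_locations(GFF3)
print(rebuilt)
```

The sketch ignores strand and the qualifiers themselves; it only shows that the parent's featflags value plus the child coordinates are enough to recover the feature key and its join location.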



regards,

Peter Rice
EMBOSS Team


