[emboss-dev] Problems with EMBOSS seqret GenBank to GFF3
Peter Rice
pmr at ebi.ac.uk
Wed Aug 24 14:45:33 UTC 2011
On 08/24/2011 11:36 AM, Peter Rice wrote:
> On 08/17/2011 11:37 AM, Peter Cock wrote:
>
>> ------------------------------------------
>>
>> Problem Seven - No parent/child relationships
>>
>> The EMBOSS 6.4.0 GFF3 file does use parent/child relationships
>> but not in the way I expected (and not in a way the validator likes).
As a first attempt, using the EMBL entry V00508 in the EMBOSS test set,
I can change the type of the CDS "parent" feature to
"biological_region" and add a featflags tag that holds the true type. Code
(not yet checked in) can reconstruct the EMBL feature table from this GFF.

However, the EMBL tags all remain on the parent (now biological_region)
feature.
Any suggestions on where I should put them so that they are useful in GFF3?
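Wherever the EMBL qualifiers end up, GFF3 column 9 has its own encoding rules. The GFF3 specification reserves ; = & and , in column 9, so values containing them must be percent-encoded, along with % itself and control characters such as tab and newline. Multiple values for one tag are written comma-separated under a single tag, rather than by repeating the tag as the proposed GFF3 below does with db_xref. GFF3 also predefines a Dbxref attribute that db_xref could map onto. The following is a minimal Python sketch of that encoding. The helper names escape_value and column9 are hypothetical, not EMBOSS functions, and the sketch keeps the EMBL qualifier names unchanged.

```python
from __future__ import annotations


def escape_value(value: str) -> str:
    """Percent-encode characters that GFF3 reserves in column 9.

    Covers ; = & , (reserved in column 9) plus % and control characters
    (tab, newline, carriage return, 0x00-0x1F, 0x7F). Spaces are left alone.
    """
    out = []
    for ch in value:
        if ch in ";=&,%" or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def column9(qualifiers: list[tuple[str, str]]) -> str:
    """Build a GFF3 attribute string from (tag, value) pairs.

    Repeated tags (e.g. several db_xref qualifiers) are merged into one
    tag with comma-separated values, in first-seen order.
    """
    grouped: dict[str, list[str]] = {}
    for tag, value in qualifiers:
        grouped.setdefault(tag, []).append(escape_value(value))
    return ";".join(f"{tag}={','.join(values)}" for tag, values in grouped.items())


if __name__ == "__main__":
    quals = [
        ("db_xref", "GDB:119299"),
        ("db_xref", "GOA:P02100"),
        ("protein_id", "CAA23766.1"),
    ]
    print(column9(quals))
    # db_xref=GDB:119299,GOA:P02100;protein_id=CAA23766.1
```

Merging repeated tags keeps one key per attribute, which is how the GFF3 specification describes multi-valued attributes. It also lets a reader split the values on commas without special-casing repeated keys.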
EMBL feature table:
FT   source          1..3919
FT                   /organism="Homo sapiens"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:9606"
FT   CDS             join(2079..2171,2294..2515,3371..3499)
FT                   /db_xref="GDB:119299"
FT                   /db_xref="GOA:P02100"
FT                   /db_xref="HGNC:4830"
FT                   /db_xref="InterPro:IPR000971"
FT                   /db_xref="InterPro:IPR002337"
FT                   /db_xref="InterPro:IPR009050"
FT                   /db_xref="InterPro:IPR012292"
FT                   /db_xref="PDB:1A9W"
FT                   /db_xref="UniProtKB/Swiss-Prot:P02100"
FT                   /protein_id="CAA23766.1"
FT                   /translation="MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDS
FT                   FGNLSSPSAILGNPKVKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENF
FT                   KLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAIALAHKYH"
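The three CDS rows in the proposed GFF3 version all carry phase 0. That follows from the join() segment lengths: 93, 222 and 129 bases, each a multiple of three, totalling 444 bases (147 codons plus the stop codon). The following is a minimal Python sketch of how a converter could derive the phases and the child rows from an EMBL location. It assumes a forward-strand location, codon_start=1 (the entry above has no /codon_start qualifier) and plain start..end ranges only. The function names are hypothetical and not taken from the EMBOSS code.

```python
from __future__ import annotations

import re

# Matches one plain "start..end" range; fuzzy ends (<, >), complement()
# and remote references such as X12345.1:10..20 are deliberately rejected.
_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def parse_join(location: str) -> list[tuple[int, int]]:
    """Split a forward-strand location like join(2079..2171,2294..2515)."""
    inner = location
    if location.startswith("join(") and location.endswith(")"):
        inner = location[len("join("):-1]
    segments = []
    for part in inner.split(","):
        m = _RANGE.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"unsupported location part: {part!r}")
        segments.append((int(m.group(1)), int(m.group(2))))
    return segments


def cds_phases(segments: list[tuple[int, int]]) -> list[int]:
    """GFF3 phase of each + strand CDS segment, assuming codon_start=1."""
    phases = []
    consumed = 0  # coding bases already used by earlier segments
    for start, end in segments:
        # Bases to skip at the start of this segment to reach a codon start.
        phases.append((3 - consumed % 3) % 3)
        consumed += end - start + 1
    return phases


def cds_rows(seqid: str, source: str, parent_id: str, location: str) -> list[str]:
    """One tab-separated GFF3 CDS row per segment, linked by Parent."""
    segments = parse_join(location)
    return [
        "\t".join([seqid, source, "CDS", str(start), str(end), ".", "+",
                   str(phase), f"Parent={parent_id}"])
        for (start, end), phase in zip(segments, cds_phases(segments))
    ]


if __name__ == "__main__":
    for row in cds_rows("V00508", "EMBL", "V00508.2",
                        "join(2079..2171,2294..2515,3371..3499)"):
        print(row)
```

A segment whose preceding coding length is not a multiple of three gets a non-zero phase. For example, segments (1, 4) and (10, 20) give phases 0 and 2.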
Proposed GFF3 version:
V00508	EMBL	databank_entry	1	3919	.	+	.	ID=V00508.1;organism=Homo sapiens;mol_type=genomic DNA;db_xref=taxon:9606
V00508	EMBL	biological_region	2079	3499	.	+	0	ID=V00508.2;featflags=type:CDS;db_xref=GDB:119299;db_xref=GOA:P02100;db_xref=HGNC:4830;db_xref=InterPro:IPR000971;db_xref=InterPro:IPR002337;db_xref=InterPro:IPR009050;db_xref=InterPro:IPR012292;db_xref=PDB:1A9W;db_xref=UniProtKB/Swiss-Prot:P02100;protein_id=CAA23766.1;translation=MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAIALAHKYH
V00508	EMBL	CDS	2079	2171	.	+	0	Parent=V00508.2
V00508	EMBL	CDS	2294	2515	.	+	0	Parent=V00508.2
V00508	EMBL	CDS	3371	3499	.	+	0	Parent=V00508.2
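The reconstruction code mentioned above is not yet checked in, so the following is not that code. It is a minimal, hypothetical Python sketch of the reverse direction for the proposed layout. It restores the EMBL feature key from featflags=type:... on the parent and rebuilds the join() location from the children that name it as Parent. It assumes tab-separated columns, + strand features only, and that featflags holds comma-separated flags, one of which is type:<EMBL key>. It rebuilds only the key and location, not the qualifiers. The function names are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(col9: str) -> dict[str, list[str]]:
    """Split GFF3 column 9 into {tag: [values]}, percent-decoding values."""
    attrs: dict[str, list[str]] = defaultdict(list)
    for field in col9.split(";"):
        if not field:
            continue
        tag, _, value = field.partition("=")
        attrs[tag].extend(unquote(v) for v in value.split(","))
    return attrs


def rebuild_embl_locations(gff_lines: list[str]) -> dict[str, tuple[str, str]]:
    """Map each featflags-tagged feature ID to (EMBL key, EMBL location)."""
    parents = {}  # ID -> (EMBL feature key, start, end)
    children = defaultdict(list)  # parent ID -> [(start, end), ...]
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        start, end = int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        for flag in attrs.get("featflags", []):
            if flag.startswith("type:") and attrs.get("ID"):
                parents[attrs["ID"][0]] = (flag[len("type:"):], start, end)
        for parent_id in attrs.get("Parent", []):
            children[parent_id].append((start, end))

    result = {}
    for feat_id, (key, start, end) in parents.items():
        # Fall back to the parent's own span when it has no child rows.
        parts = sorted(children.get(feat_id, [])) or [(start, end)]
        ranges = ",".join(f"{s}..{e}" for s, e in parts)
        result[feat_id] = (key, f"join({ranges})" if len(parts) > 1 else ranges)
    return result
```

Applied to the five rows above, this should give {"V00508.2": ("CDS", "join(2079..2171,2294..2515,3371..3499)")}. The databank_entry row has no featflags and no Parent, so it is ignored.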
regards,
Peter Rice
EMBOSS Team