loading DDBJ data into EMBOSS

Peter Rice peter.rice at uk.lionbioscience.com
Tue Oct 8 16:08:47 UTC 2002


Joerg Schaber wrote:
> Hi,
> 
> I have problems creating an EMBOSS database from a DDBJ flatfile (e.g. 
> ftp://ftp.genome.ad.jp/pub/kegg/genomes/genes/Buchnera.ent) using 
> 'dbiflat -idformat gb'. I get a warning for every entry in the flatfile,
> "Warning: Duplicate ID skipped: '<null>' All hits will point to first ID 
> found", and I cannot retrieve any sequence. I think dbiflat only 
> recognizes the first entry.
> When I download the corresponding fasta flatfile I have no problems 
> creating an EMBOSS database using 'dbifasta'. However, I would like to 
> use the original DDBJ flatfile because it includes more information.
> Any idea what's the problem?

Yes ... that file is not in GenBank or DDBJ format!

It looks more like a CODATA format, but only the ENTRY line is recognized.
If you can find a name for it, we could probably implement a new 
input/output sequence format ... but it has some horrible features that 
will not be general.

Example entry:

ENTRY       BU002             CDS       Buchnera
NAME        atpB
DEFINITION  ATP synthase A chain [EC:3.6.3.14] [SP:ATP6_BUCAI]
CLASS       Metabolism; Energy Metabolism; Oxidative phosphorylation
             [PATH:buc00190]
             Metabolism; Energy Metabolism; ATP synthesis [PATH:buc00193]
             Metabolism; Energy Metabolism; Photosynthesis [PATH:buc00195]
POSITION    2278..3102
DBLINKS     RIKEN: BU002
             NCBI: 10038695
CODON_USAGE       T               C               A               G
           T  27   2  22   7  11   0   7   1   7   1   1   0   1   0   0   5
           C   4   0   3   2   6   1   4   2   5   1   8   2   1   0   2   0
           A  28   0   5  12   5   0   3   0   7   3  13   1   4   1   0   0
           G   4   1  12   3   5   1   5   0   8   0   7   1   7   2   4   0
AASEQ       274
             MILEKISDPQKYISHHLSHLQIDLRSFKIIQPGALSSDYWTVNVDSMFFSLVLGSFFLSI
             FYMVGKKITQGIPGKLQTAIELIFEFVNLNVKSMYQGKNALIAPLSLTVFIWVFLMNLMD
             LVPIDFFPFISEKVFELPAMRIVPSADINITLSMSLGVFFLILFYTVKIKGYVGFLKELI
             LQPFNHPVFSIFNFILEFVSLVSKPISLGLRLFGNMYAGEMIFILIAGLLPWWTQCFLNV
             PWAIFHILIISLQAFIFMVLTIVYLSMASQSHKD
NTSEQ       825
             atgattttagaaaagatatctgatcctcaaaaatatattagtcatcatttaagtcacttg
             cagatagatttgcgttcttttaaaattattcaaccaggtgcattgtcttctgattattgg
             actgtaaatgttgattcaatgtttttttctcttgtactgggtagtttttttttaagtatt
             ttttatatggtaggaaaaaaaattactcaaggtataccaggtaaattacaaactgcaatt
             gagttaatttttgaatttgtaaatttaaatgtaaaaagcatgtatcaaggtaaaaatgct
             cttattgcacctttatcattaacagtatttatttgggtttttttaatgaatctaatggat
             ttagttccgattgatttctttccatttatttctgaaaaagtgtttgaattacctgctatg
             cgaattgtaccttctgctgatattaatattacactatcaatgtcacttggcgtgtttttt
             ttaattttattttatactgttaaaattaaaggatatgtaggctttttaaaagaacttatt
             ttacaacctttcaaccatcctgtattttctatttttaattttatattagaatttgtgtca
             ttggtctcgaaacccatttctttgggattgcgattatttggaaacatgtacgcaggtgaa
             atgatttttattttaattgcaggtttgctgccatggtggacacaatgttttttaaacgta
             ccgtgggctatttttcatattttaataatttcactacaggcttttatttttatggtatta
             actattgtatatttatcaatggcctctcaatctcataaagattaa
///
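
Until EMBOSS has a reader for this format, one possible workaround is to convert the file to FASTA and index it with dbifasta, which already works for the FASTA version of the same data. Below is a minimal Python sketch, not an EMBOSS tool. It assumes the layout seen in the example entry: a keyword starting in column 1, indented continuation lines, the sequence length on the AASEQ/NTSEQ line, and '///' ending each record. The function names parse_entries and to_fasta are made up for illustration.

```python
def parse_entries(handle):
    """Yield one dict per record from an iterable of lines.

    Each dict maps a keyword (ENTRY, NAME, NTSEQ, ...) to a list of strings:
    the text after the keyword, followed by any indented continuation lines.
    Assumption: records are terminated by a line starting with '///'.
    """
    record, key = {}, None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("///"):
            if record:
                yield record
            record, key = {}, None
            continue
        if not line.strip():
            continue  # skip blank lines between records
        if line[0] != " ":
            # A keyword in column 1 starts a new field.
            parts = line.split(None, 1)
            key = parts[0]
            record.setdefault(key, []).append(parts[1] if len(parts) > 1 else "")
        elif key is not None:
            # Indented line: continuation of the current field.
            record[key].append(line.strip())
    if record:
        yield record  # tolerate a missing final '///'


def to_fasta(record, field="NTSEQ", width=60):
    """Format one record as FASTA, using NTSEQ (default) or AASEQ."""
    seq_id = record["ENTRY"][0].split()[0]
    description = " ".join(record.get("DEFINITION", []))
    # First item is the declared length; the rest are sequence lines.
    declared, *chunks = record[field]
    sequence = "".join(chunks).replace(" ", "")
    if len(sequence) != int(declared):
        raise ValueError(f"{seq_id}: expected {declared} residues, got {len(sequence)}")
    lines = [f">{seq_id} {description}".rstrip()]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Writing to_fasta(record) for every record from parse_entries(open("Buchnera.ent")) gives a FASTA file that dbifasta can index as before. Only the entry ID and the DEFINITION text are carried over, so the other annotation (CLASS, POSITION, DBLINKS, CODON_USAGE) is still lost.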



-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723



