loading DDBJ data into EMBOSS
Peter Rice
peter.rice at uk.lionbioscience.com
Tue Oct 8 16:08:47 UTC 2002
Joerg Schaber wrote:
> Hi,
>
> I have problems creating an EMBOSS database from a DDBJ flatfile (e.g.
> ftp://ftp.genome.ad.jp/pub/kegg/genomes/genes/Buchnera.ent) using
> 'dbiflat -idformat gb'. I get a warning for every entry in the flatfile,
> "Warning: Duplicate ID skipped: '<null>' All hits will point to first ID
> found", and I cannot retrieve any sequence. I think dbiflat only
> recognizes the first entry.
> When I download the corresponding fasta flatfile I have no problems
> creating an EMBOSS database using 'dbifasta'. However, I would like to
> use the original DDBJ flatfile because it includes more information.
> Any idea what's the problem?
Yes: that file is not in GenBank or DDBJ format.
It looks more like a CODATA format, but only the ENTRY line is recognized.
If you can find a name for it, we could probably implement a new
input/output sequence format, but it has some horrible features that
will not be general.
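A quick way to check a file before indexing it is to look at the keyword
that opens the first record. GenBank and DDBJ flatfile records begin with
a LOCUS line, while this file begins with ENTRY. The sketch below is only
an illustration; the function names are made up here and are not part of
EMBOSS.

```python
def first_keyword(lines):
    """Return the first whitespace-delimited token of the first non-blank line."""
    for line in lines:
        tokens = line.split()
        if tokens:
            return tokens[0]
    return None


def looks_like_genbank(lines):
    """True if the first record opens with LOCUS, as GenBank/DDBJ records do."""
    return first_keyword(lines) == "LOCUS"
```

Called on the lines of Buchnera.ent, looks_like_genbank would return False,
because the first keyword there is ENTRY.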
Example entry:
ENTRY BU002 CDS Buchnera
NAME atpB
DEFINITION ATP synthase A chain [EC:3.6.3.14] [SP:ATP6_BUCAI]
CLASS Metabolism; Energy Metabolism; Oxidative phosphorylation
[PATH:buc00190]
Metabolism; Energy Metabolism; ATP synthesis [PATH:buc00193]
Metabolism; Energy Metabolism; Photosynthesis [PATH:buc00195]
POSITION 2278..3102
DBLINKS RIKEN: BU002
NCBI: 10038695
CODON_USAGE T C A G
T 27 2 22 7 11 0 7 1 7 1 1 0 1 0 0 5
C 4 0 3 2 6 1 4 2 5 1 8 2 1 0 2 0
A 28 0 5 12 5 0 3 0 7 3 13 1 4 1 0 0
G 4 1 12 3 5 1 5 0 8 0 7 1 7 2 4 0
AASEQ 274
MILEKISDPQKYISHHLSHLQIDLRSFKIIQPGALSSDYWTVNVDSMFFSLVLGSFFLSI
FYMVGKKITQGIPGKLQTAIELIFEFVNLNVKSMYQGKNALIAPLSLTVFIWVFLMNLMD
LVPIDFFPFISEKVFELPAMRIVPSADINITLSMSLGVFFLILFYTVKIKGYVGFLKELI
LQPFNHPVFSIFNFILEFVSLVSKPISLGLRLFGNMYAGEMIFILIAGLLPWWTQCFLNV
PWAIFHILIISLQAFIFMVLTIVYLSMASQSHKD
NTSEQ 825
atgattttagaaaagatatctgatcctcaaaaatatattagtcatcatttaagtcacttg
cagatagatttgcgttcttttaaaattattcaaccaggtgcattgtcttctgattattgg
actgtaaatgttgattcaatgtttttttctcttgtactgggtagtttttttttaagtatt
ttttatatggtaggaaaaaaaattactcaaggtataccaggtaaattacaaactgcaatt
gagttaatttttgaatttgtaaatttaaatgtaaaaagcatgtatcaaggtaaaaatgct
cttattgcacctttatcattaacagtatttatttgggtttttttaatgaatctaatggat
ttagttccgattgatttctttccatttatttctgaaaaagtgtttgaattacctgctatg
cgaattgtaccttctgctgatattaatattacactatcaatgtcacttggcgtgtttttt
ttaattttattttatactgttaaaattaaaggatatgtaggctttttaaaagaacttatt
ttacaacctttcaaccatcctgtattttctatttttaattttatattagaatttgtgtca
ttggtctcgaaacccatttctttgggattgcgattatttggaaacatgtacgcaggtgaa
atgatttttattttaattgcaggtttgctgccatggtggacacaatgttttttaaacgta
ccgtgggctatttttcatattttaataatttcactacaggcttttatttttatggtatta
actattgtatatttatcaatggcctctcaatctcataaagattaa
///
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723