GenBank indexing Trouble
河合宏紀
hkawai at venus.dti.ne.jp
Tue Sep 10 03:33:53 UTC 2002
Hello
I'm using the EMBOSS package, and I appreciate the developers' efforts.
Unfortunately, I ran into a problem after indexing GenBank 130 and
querying it with entret/seqret.
First, I built an index for all GenBank 130 files (except
EST, GSS, and HTG) as shown below.
--------------------------------------
% /usr/local/EMBOSS/2.5.0/bin/dbiflat
Index a flat file database
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB : Genbank, DDBJ
Entry format [SWISS]: GB
Database directory [.]:
Wildcard database filename [*.dat]: *.seq
Database name: GB
Release number [0.0]:
Index date [00/00/00]:
Warning: Duplicate ID skipped: 'AY071141'
--------------------------------------
When I requested L11995 with "entret gb:L11995", I got the wrong entry,
whose accession is M20152. When I then requested gb:M20152, I got M20153.
These three entries are consecutive in the gbrod3.seq file. The
problem does not occur for entries whose 'LOCUS' and
'ACCESSION' fields are identical (e.g. BC003860). Because the problem
occurs with dbiflat in versions 2.4.1 and 2.5.0 but not in 2.3.1, I'm
now using EMBOSS 2.3.1 only for dbiflat/dbifasta, and 2.4.1 for the other
programs (entret/seqret and so on).
My hypothesis about the cause is as follows.
I focused on the duplicate ID AY071141 and removed one of the AY071141
entries (from the gbinv4.seq file).
After that, I could retrieve the correct entries.
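To find such duplicates before indexing, the LOCUS lines can be scanned directly. The following is only a minimal sketch in Python, not part of EMBOSS: the function name find_duplicate_locus and the sample file names are made up for illustration, and it assumes only that the entry name is the second whitespace-separated field of each LOCUS line.

```python
from collections import defaultdict


def find_duplicate_locus(files):
    """Return {locus_name: [file names]} for names seen more than once.

    `files` is an iterable of (file_name, iterable_of_text_lines) pairs.
    This helper is hypothetical; it is not an EMBOSS function.
    """
    seen = defaultdict(list)
    for name, lines in files:
        for line in lines:
            # In GenBank flat files the entry name follows the LOCUS keyword.
            if line.startswith("LOCUS"):
                fields = line.split()
                if len(fields) > 1:
                    seen[fields[1]].append(name)
    return {locus: names for locus, names in seen.items() if len(names) > 1}


# Tiny made-up sample; real input would be the *.seq files themselves.
sample = [
    ("a.seq", ["LOCUS       X1    100 bp", "ACCESSION   X1"]),
    ("b.seq", ["LOCUS       X1    100 bp", "LOCUS       Y2    5 bp"]),
]
duplicates = find_duplicate_locus(sample)
```

For real data, each pair could be built as (path, open(path)) for every file matched by *.seq.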
I guess that when dbiflat skips a duplicate ID, the index counters for
both LOCUS and ACCESSION should be increased (or decreased) together. In
the affected versions, however, it seems that ONLY the LOCUS counter is
adjusted and the ACCESSION counter is not, so the two fall out of step.
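The guess above can be illustrated with a toy model in Python. This is not dbiflat's actual code, which I have not checked. The function names, the LOCUS_A/B/C placeholder names, and the direction of the drift are all assumptions. The model only shows that if one counter accounts for a skipped record and the other does not, every later accession lookup lands on a neighbouring entry, while lookups by ID stay correct.

```python
def build_index(entries, buggy=False):
    """Toy index over (locus, accession) pairs; hypothetical, not EMBOSS code."""
    kept = []        # records actually stored, in order
    id_index = {}    # LOCUS name -> position in `kept`
    acc_index = {}   # accession  -> position in `kept` (if counters agree)
    id_counter = 0
    acc_counter = 0
    for locus, accession in entries:
        if locus in id_index:
            # Duplicate ID: the record is skipped, so neither counter should
            # advance. In the assumed buggy case only the ID counter is held
            # back and the accession counter still moves on.
            if buggy:
                acc_counter += 1
            continue
        id_index[locus] = id_counter
        acc_index[accession] = acc_counter
        kept.append((locus, accession))
        id_counter += 1
        acc_counter += 1
    return kept, id_index, acc_index


def lookup(kept, id_index, acc_index, key):
    """Try the ID index first, then fall back to the accession index."""
    position = id_index[key] if key in id_index else acc_index[key]
    return kept[position]


# Illustrative ordering only; LOCUS_A/B/C are placeholder entry names.
ENTRIES = [
    ("AY071141", "AY071141"),
    ("AY071141", "AY071141"),   # the duplicate that gets skipped
    ("LOCUS_A", "L11995"),
    ("LOCUS_B", "M20152"),
    ("LOCUS_C", "M20153"),
    ("BC003860", "BC003860"),   # LOCUS and ACCESSION identical
]
```

With buggy=True, looking up L11995 in this model returns the M20152 record and M20152 returns the M20153 record, while BC003860 is still found through its ID. With buggy=False every lookup returns its own record.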
I hope my report will be helpful to the developers.
Best regards
Kawai