GenBank indexing Trouble

河合宏紀 hkawai at venus.dti.ne.jp
Tue Sep 10 03:33:53 UTC 2002


Hello
 
I'm using the EMBOSS package, and I appreciate the developers' efforts.
Unfortunately, I ran into a problem when I indexed GenBank release 130
and retrieved entries from it with entret/seqret.

First, I indexed all files of GenBank 130 (except the EST, GSS and HTG
divisions) as shown below:
 --------------------------------------
 % /usr/local/EMBOSS/2.5.0/bin/dbiflat
 Index a flat file database
       EMBL : EMBL
      SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
         GB : Genbank, DDBJ
 Entry format [SWISS]: GB
 Database directory [.]: 
 Wildcard database filename [*.dat]: *.seq
 Database name: GB
 Release number [0.0]: 
 Index date [00/00/00]: 
 Warning: Duplicate ID skipped: 'AY071141'
 --------------------------------------

When I retrieved L11995 with "entret gb:L11995", I got the wrong entry,
whose accession is M20152. When I then tried gb:M20152, I got M20153.
These three entries appear one after another in the file gbrod3.seq.
The problem does not occur for entries whose LOCUS and ACCESSION fields
are identical (e.g. BC003860). It occurs with dbiflat from versions 2.4.1
and 2.5.0 but not from 2.3.1. For that reason I'm now using EMBOSS 2.3.1
only for dbiflat/dbifasta, and 2.4.1 for the other programs (entret/seqret
and so on).
 
My hypothesis about the cause is as follows.
I focused on the duplicate ID AY071141 and removed one of the two AY071141
entries (from the file gbinv4.seq). After that change, I got the correct
entries.
My guess is that when dbiflat finds a duplicate ID and skips it, the LOCUS
and ACCESSION index counters should both be adjusted (increased or
decreased) together. In this version, however, only the LOCUS counter seems
to be adjusted, while the ACCESSION counter is not.
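To make the hypothesis concrete, here is a toy model in Python. It is NOT EMBOSS source code. The record table, the two key indexes, the "look up by LOCUS name first, then by accession" rule and the locus names ROD_A/ROD_B/ROD_C are all assumptions made for illustration. The model shows one counter moving past a skipped duplicate while the other does not. Which counter actually drifts inside dbiflat cannot be told from this report. The drift is enough to make every later accession-only lookup return the next entry, while lookups whose key is also the LOCUS name still succeed.

```python
# Toy model of the hypothesised dbiflat off-by-one (NOT EMBOSS source code).
# Assumptions: stored records are numbered 0, 1, 2, ...; an ID index and an
# accession index map keys to those numbers; a skipped duplicate is not stored.

ENTRIES = [  # (LOCUS name, ACCESSION) in file order; ROD_* names are hypothetical
    ("AY071141", "AY071141"),
    ("AY071141", "AY071141"),  # duplicate ID -> skipped
    ("BC003860", "BC003860"),
    ("ROD_A", "L11995"),
    ("ROD_B", "M20152"),
    ("ROD_C", "M20153"),
]


def build_index(entries, buggy=True):
    records, id_index, acc_index = [], {}, {}
    acc_counter = 0  # separate counter used only for the accession index
    for locus, acc in entries:
        if locus in id_index:
            print(f"Warning: Duplicate ID skipped: '{locus}'")
            if buggy:
                acc_counter += 1  # advanced although nothing was stored
            continue
        id_index[locus] = len(records)  # always in step with stored records
        acc_index[acc] = acc_counter
        records.append((locus, acc))
        acc_counter += 1
    return records, id_index, acc_index


def fetch(key, records, id_index, acc_index):
    # Hypothetical lookup order: LOCUS name first, then accession.
    if key in id_index:
        return records[id_index[key]]
    return records[acc_index[key]]


if __name__ == "__main__":
    recs, ids, accs = build_index(ENTRIES)
    print(fetch("L11995", recs, ids, accs))    # ('ROD_B', 'M20152'): wrong entry
    print(fetch("BC003860", recs, ids, accs))  # ('BC003860', 'BC003860'): correct
```

With buggy=False the skipped duplicate leaves both counters untouched, and fetch("L11995", ...) returns ("ROD_A", "L11995"). In this toy, deriving both indexes from len(records) would remove the drift altogether.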
 
I hope my report will be helpful to the developers.

Best regards
 
Kawai



