problem with dbiflat (?)

axel klenk axel.klenk at morphochem.ch
Wed Jan 29 16:34:04 UTC 2003


Hi all,

I have a problem with dbiflat (I suppose) and the index I created
yesterday for SWISS-PROT 40.41. Some sequence IDs (9,401 to be
precise) cannot be retrieved by any EMBOSS program as single
sequences, but they are found when searching with wildcards (see
examples below). This happens only with SWISS-PROT; there are
no problems with TrEMBL, EMBL, or any of our other databases.
It happens with dbiflat indexes from both EMBOSS 2.4.1 and 2.6.0.
The package was built using gcc 2.95.3 on Solaris 8.
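
One way to enumerate the affected IDs is sketched below in Python. It is a rough sketch, not part of EMBOSS: the function names are made up for illustration, and it assumes that a failed exact lookup can be recognised by the "not found" message that infoseq prints in the examples further down. It reads the entry names from the ID lines of the flat file and tries each one as an exact query.

```python
import subprocess


def entry_names(flatfile_lines):
    """Yield entry names taken from the ID lines of a SWISS-PROT flat file."""
    for line in flatfile_lines:
        if line.startswith("ID   "):
            # The entry name is the first field after the "ID" line code.
            yield line.split()[1]


def exact_lookup_ok(name, db="sw"):
    """Return True if `infoseq db:name` does not report 'not found'.

    Assumption: a failed lookup is detected from the error text shown in
    the transcripts, not from the exit status.
    """
    result = subprocess.run(
        ["infoseq", "{}:{}".format(db, name.lower())],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return "not found" not in result.stdout


def failing_entries(flatfile_lines, lookup=exact_lookup_ok):
    """Return the entry names for which the exact lookup fails."""
    return [name for name in entry_names(flatfile_lines) if not lookup(name)]


# Possible usage (path taken from the dbiflat session below):
# with open("/data/bioinfo/db/swissprot/latest/sprot.dat") as handle:
#     print(len(failing_entries(handle)))
```

The lookup is passed in as a parameter so that the parsing can be checked without an EMBOSS installation.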

Is this a known problem, and is there a solution for it? I have
attached some odd examples and a debug file that might help.

Thanks in advance,

 - axel klenk
----------------------------------------
axel klenk
morphochem AG
wro-1055
schwarzwaldallee 215
4058 basel
tel. ++41-61-6952104
fax  ++41-61-6952122
axel.klenk at morphochem.ch
http://www.morphochem.ch



Details: dbiflat builds the index without any complaint:


mbsun01:/data/bioinfo/emboss/swissprot/40.41> dbiflat
Index a flat file database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
    REFSEQ : Refseq
Entry format [SWISS]:
Database directory [.]: /data/bioinfo/db/swissprot/latest
Wildcard database filename [*.dat]: sprot.dat
Database name: sw
Release number [0.0]: 40.41
Index date [00/00/00]: 01/29/03
mbsun01:/data/bioinfo/emboss/swissprot/40.41> ll
total 10018
-rw-r--r--   1 bioinfo  bioinfo   591864 Jan 29 16:36 acnum.hit
-rw-r--r--   1 bioinfo  bioinfo  2068590 Jan 29 16:36 acnum.trg
-rw-r--r--   1 bioinfo  bioinfo      346 Jan 29 16:36 division.lkp
-rw-r--r--   1 bioinfo  bioinfo  2430600 Jan 29 16:36 entrynam.idx


infoseq finds dyr*_ecoli and dyrf_ecoli, but not dyr_ecoli, dyra_ecoli,
or dyra*, and only some of the dyrb* entries:


mbsun01:/export/home/aklenk/tmp> infoseq sw:dyr\*_ecoli
Displays some simple information about sequences
# USA             Name        Accession Type Length     Description
sw-id:DYR1_ECOLI  DYR1_ECOLI    P00382  P    157        Dihydrofolate reductase type I (EC 1.5.1.3) (Trimethoprim resistance protein).
sw-id:DYR5_ECOLI  DYR5_ECOLI    P11731  P    157        Dihydrofolate reductase type V (EC 1.5.1.3).
sw-id:DYR7_ECOLI  DYR7_ECOLI    P27422  P    157        Dihydrofolate reductase type VII (EC 1.5.1.3).
sw-id:DYR8_ECOLI  DYR8_ECOLI    Q57452  P    169        Dihydrofolate reductase type VIII (EC 1.5.1.3) (DHFR type IIIC).
sw-id:DYR9_ECOLI  DYR9_ECOLI    Q59397  P    177        Dihydrofolate reductase type IX (EC 1.5.1.3).
sw-id:DYRA_ECOLI  DYRA_ECOLI    Q04515  P    187        Dihydrofolate reductase type X (EC 1.5.1.3).
sw-id:DYRC_ECOLI  DYRC_ECOLI    Q59408  P    165        Dihydrofolate reductase type XIII (EC 1.5.1.3).
sw-id:DYRF_ECOLI  DYRF_ECOLI    P78218  P    157        Dihydrofolate reductase type XV (EC 1.5.1.3).
sw-id:DYR_ECOLI   DYR_ECOLI     P00379  P    159        Dihydrofolate reductase (EC 1.5.1.3).

mbsun01:/export/home/aklenk/tmp> infoseq sw:dyrf_ecoli
Displays some simple information about sequences
# USA             Name        Accession Type Length     Description
sw-id:DYRF_ECOLI  DYRF_ECOLI    P78218  P    157        Dihydrofolate reductase type XV (EC 1.5.1.3).

mbsun01:/export/home/aklenk/tmp> infoseq sw:dyra_ecoli
Displays some simple information about sequences
Error: Database Entry 'dyra_ecoli' not found
Error: Unable to read sequence 'sw:dyra_ecoli'
Died: infoseq terminated: Bad value for option [sequence] and no prompt

mbsun01:/export/home/aklenk/tmp> infoseq sw:dyr_ecoli
Displays some simple information about sequences
Error: Database Entry 'dyr_ecoli' not found
Error: Unable to read sequence 'sw:dyr_ecoli'
Died: infoseq terminated: Bad value for option [sequence] and no prompt

mbsun01:/export/home/aklenk/tmp> infoseq sw:dyra\*
Displays some simple information about sequences
Error: Database Query 'dyra*' not found
Error: Unable to read sequence 'sw:dyra*'
Died: infoseq terminated: Bad value for option [sequence] and no prompt

mbsun01:/export/home/aklenk/tmp> infoseq sw:dyrb\*
Displays some simple information about sequences
# USA             Name        Accession Type Length     Description
sw-id:DYRB_MOUSE  DYRB_MOUSE    Q9Z188  P    589        Dual-specificity tyrosine-phosphorylation regulated kinase 1B (EC 2.7.1.-).
sw-id:DYRB_STAAM  DYRB_STAAM    P10167  P    158        Dihydrofolate reductase type I (EC 1.5.1.3).

mbsun01:/export/home/aklenk/tmp> infoseq sw:dyr\* | grep -i dyrb
Displays some simple information about sequences
sw-id:DYRB_HUMAN  DYRB_HUMAN    Q9Y463  P    629        Dual-specificity tyrosine-phosphorylation regulated kinase 1B (EC 2.7.1.-) (Mirk protein kinase).
sw-id:DYRB_MOUSE  DYRB_MOUSE    Q9Z188  P    589        Dual-specificity tyrosine-phosphorylation regulated kinase 1B (EC 2.7.1.-).
sw-id:DYRB_STAAM  DYRB_STAAM    P10167  P    158        Dihydrofolate reductase type I (EC 1.5.1.3).
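
The last two queries can also be compared mechanically. The following is a rough Python sketch, with function names invented for illustration. It assumes only the infoseq output layout shown above, in which each entry line starts with "sw-id:" and the entry name is the second column. It lists the entries that a broad wildcard query returns but a narrower one misses.

```python
def names_from_infoseq(output, db="sw"):
    """Extract entry names from infoseq's tabular output.

    Only lines whose first field starts with "<db>-id:" are used, so
    wrapped description lines and the header are skipped.
    """
    prefix = "{}-id:".format(db)
    names = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith(prefix):
            names.append(fields[1])
    return names


def missing_from_narrow(broad_output, narrow_output, name_prefix, db="sw"):
    """Return names with `name_prefix` found by the broad query only."""
    narrow = set(names_from_infoseq(narrow_output, db))
    wanted = name_prefix.upper()
    return [
        name
        for name in names_from_infoseq(broad_output, db)
        if name.startswith(wanted) and name not in narrow
    ]
```

Given the output of `infoseq sw:dyr\*` as the broad result and that of `infoseq sw:dyrb\*` as the narrow one, with the prefix "dyrb", this would report DYRB_HUMAN for the transcripts above.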

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: infoseq.dbg
Type: application/octet-stream
Size: 17537 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/emboss/attachments/20030129/2038efd0/attachment-0001.obj>

