[EMBOSS] db formatting (?) and parsing issue -- emboss version 5.0.0

Wed Jan 28 13:47:40 UTC 2009

Hi Peter, thanks for your reply

Certainly:

1) For the failed run (for seq in `cat bhits`; do seqret -debug -filter 
staphyl68-id:$seq; done), seqret.dbg contains:

Debug file seqret.dbg buffered:No
ajAcdInitP pgm 'seqret' package ''
ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd'
EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd
closing file '/site/share/EMBOSS/acd/seqret.acd'
ajFileNewIn '/site/share/EMBOSS/acd/codes.english'
EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english
closing file '/site/share/EMBOSS/acd/codes.english'
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard'
EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard
closing file '/site/share/EMBOSS/acd/knowntypes.standard'
Set acdprotein value '$(sequence.protein)'
ajSeqinClear called
' 0..0(N) '' 0  'staphyl68-id:FLTU7OB01AHJ67
'SA to test: 'staphyl68-id:FLTU7OB01AHJ67

format regexp: No list:No
no format specified in USA

...input format not set
dbname dbexp: Yes
'ound dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67
' Field 'id'ng 'FLTU7OB01AHJ67
' acc '' sv '' gi '' des '' org '' key ''
no wildcard in stored qry
database type: 'N' format 'embl'
use access method 'emboss'
Matched seqAccess[1] 'emboss'
seqAccessEmboss type 1
' acc '' hasacc:Yess/u4/tjonasse/mrsa/454/068_reads/' entry 'fltu7ob01ahj67
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
EOF ajFileGetsL file 
/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
' acc: '' hasacc:Yesahj67
B+tree Entry failed
' not foundtry id:'fltu7ob01ahj67
seqEmbossQryClose clean up qryd
Database 'staphyl68' : access method 'emboss' failed
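
Several lines in the trace above look overwritten from the left (for 
example "'ound dbname" where the standalone run prints "found dbname"). 
That is what text containing a carriage return looks like when displayed. 
A minimal sketch for checking this, assuming (not confirmed) that the IDs 
in bhits carry trailing carriage returns; the helper names count_cr_lines 
and strip_cr are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for inspecting an ID list such as "bhits".
# Assumption: the failed lookups may be caused by a trailing carriage
# return on each ID (e.g. a file saved with DOS line endings).

# Print the number of lines on stdin that contain a carriage return.
count_cr_lines() {
    cr=$(printf '\r')        # a literal CR character
    grep -c "$cr" || true    # grep exits non-zero when the count is 0
}

# Copy stdin to stdout with every carriage return removed.
strip_cr() {
    tr -d '\r'
}

# Usage (not run here; bhits.clean is a hypothetical output name):
#   count_cr_lines < bhits
#   strip_cr < bhits > bhits.clean
#   for seq in `cat bhits.clean`; do seqret -debug -filter staphyl68-id:$seq; done
```

If count_cr_lines reports 0 for bhits, this assumption is wrong and the 
cause lies elsewhere.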

2) For the successful standalone run (seqret -debug 
staphyl68-id:FLTU7OB01AHJ67), seqret.dbg contains:

Debug file seqret.dbg buffered:No
ajAcdInitP pgm 'seqret' package ''
ajFileNewIn '/site/share/EMBOSS/acd/seqret.acd'
EOF ajFileGetsL file /site/share/EMBOSS/acd/seqret.acd
closing file '/site/share/EMBOSS/acd/seqret.acd'
ajFileNewIn '/site/share/EMBOSS/acd/codes.english'
EOF ajFileGetsL file /site/share/EMBOSS/acd/codes.english
closing file '/site/share/EMBOSS/acd/codes.english'
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajTableNewFunctionLen hint 25 size 251
ajFileNewIn '/site/share/EMBOSS/acd/knowntypes.standard'
EOF ajFileGetsL file /site/share/EMBOSS/acd/knowntypes.standard
closing file '/site/share/EMBOSS/acd/knowntypes.standard'
Set acdprotein value '$(sequence.protein)'
ajSeqinClear called
++seqUsaProcess 'staphyl68-id:FLTU7OB01AHJ67' 0..0(N) '' 0
USA to test: 'staphyl68-id:FLTU7OB01AHJ67'

format regexp: No list:No
no format specified in USA

...input format not set
dbname dbexp: Yes
found dbname 'staphyl68' level: 'id' qry->QryString: 'FLTU7OB01AHJ67'
   db QryString 'FLTU7OB01AHJ67' Field 'id'
ajSeqQueryWild id 'FLTU7OB01AHJ67' acc '' sv '' gi '' des '' org '' key ''
no wildcard in stored qry
database type: 'N' format 'embl'
use access method 'emboss'
Matched seqAccess[1] 'emboss'
seqAccessEmboss type 1
directory '/div/dias/u4/tjonasse/mrsa/454/068_reads/' entry 
'fltu7ob01ahj67' acc '' hasacc:Yes
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
EOF ajFileGetsL file 
/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.pxid'
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
EOF ajFileGetsL file /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.ent'
entry id: 'fltu7ob01ahj67' acc: '' hasacc:Yes
ajFileNewIn '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
seqEmbossQryClose clean up qryd
seqRead: cleared
seqRead: seqin format 3 'embl'
seqRead: one format specified
ajFileBuffNobuff /div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat 
buffsize: 0
++seqRead known format 3
++seqReadFmt format 3 (embl) 'staphyl68-id:FLTU7OB01AHJ67' feat No
seqReadEmbl first line 'ID   FLTU7OB01AHJ67; SV 1; linear; unassigned 
DNA; STD; UNC; 184 BP.
'
seqReadEmbl ID line found
seqSetName word 'FLTU7OB01AHJ67'
seqSetName 'FLTU7OB01AHJ67' result: 'FLTU7OB01AHJ67'
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajTableNewFunctionLen hint 4 size 251
ajFileBuffClear (0) Nobuff: Yes
size 0: Lines: 0 Curr: 0  Prev: 0 Last: 0 Free: 0 Freelast: 0
ajFileBuffClear 
'/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (0 lines)
      Y size: 0 pos: 0 removed 0 lines add to free: 0
Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
              Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N
  Free: 0 Last: -1
seqReadFmt success with format 3 (embl)
seqQueryMatch 'FLTU7OB01AHJ67' id 'fltu7ob01ahj67' acc '' Sv '' Gi '' 
Des '' Key '' Org '' Case No Done Yes
seqTypeSet 'N'
ajSeqTypeCheckIn type 'gapany' found (any valid sequence with gaps)
Convert gaps to '-'
ajSeqTypeCheckIn: bad characters test passed, convert
Convert '?' to 'X'
ajSeqTypeCheckIn: OK - no badchars
seqDefine: thys->Db 'staphyl68', seqin->Db 'staphyl68'
seqDefine: thys->Name 'FLTU7OB01AHJ67' type: N
seqDefine: thys->Entryname 'FLTU7OB01AHJ67', seqin->Entryname ''
seqDefine: returns thys->Name 'FLTU7OB01AHJ67' type: N
++ajSeqallread set db: 'staphyl68' => 'staphyl68'
ajSeqallGetName ''
ajSeqIsNuc Type 'N'
ajSeqIsNuc Type 'N'
ajSeqIsProt Type 'N'
ajSeqallGetUsa 'staphyl68-id:FLTU7OB01AHJ67'
ajSeqallGetseqName 'FLTU7OB01AHJ67'
... output format not set, default to 'fasta'
ajSeqoutClear called
... output format not set, default to 'fasta'
ajSeqoutOpen dir '' qrydir ''
seqoutUsaProcess
output USA to test: 'fltu7ob01ahj67.fasta'

format regexp: No
no format specified in USA

file:id regexp: Yes
found filename fltu7ob01ahj67.fasta single: No dir: ''
ajFileNewOutD('' 'fltu7ob01ahj67.fasta')
ajFileNewOutD open name 'fltu7ob01ahj67.fasta'

ajSeqSetRange (len: 184 0..0 old 0..0) rev:No reversed:No
       result: (len: 184 0..0)
ajSeqoutWriteSeq 'FLTU7OB01AHJ67' len: 184
ajSeqoutWriteSeq 17 'fasta' single: No feat: No Save: No
seqClone out Setdb '' Db '' seq Setdb '<null>' Db 'staphyl68'
seqClone outseq->Type '' seq->Type 'N'
seqClone 0 .. 0 1 .. 184 len: 184 type: 'N'
   Db: 'staphyl68' Name: 'FLTU7OB01AHJ67' Entryname: 'FLTU7OB01AHJ67'
ajSeqTypeCheckS type 'gapany' found (any valid sequence with gaps)
Convert gaps to '-'
Convert '?' to 'X'
ajSeqoutSetNameDefaultS already has a name 'FLTU7OB01AHJ67'
seqWriteFasta outseq Db 'staphyl68' Setdb '' Setoutdb '' Name 
'FLTU7OB01AHJ67'
seqoutUfoLocal Features No Ufo 0 ''
ajSeqoutWriteSeq tests features No tabouitisopen No UfoLocal No ftlocal No
ajSeqRead: input file 
'/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' still there, 
try again
seqRead: cleared
seqRead: single access - count 1 - call access routine again
seqAccessEmboss type 1
seqEmbossQryReuse: query data all finished
seqRead: seqin->Query->Access->Access(seqin) *failed*
ajSeqRead: open buffer  usa: 'staphyl68-id:FLTU7OB01AHJ67' returns: No
ajSeqallNext failed
ajSeqinClear called
ajFileBuffClear (-1) Nobuff: Yes
size 0: Lines: 0 Curr: 0  Prev: 0 Last: 0 Free: 0 Freelast: 0
ajFileBuffClear 
'/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat' (-1 lines)
      Y size: 0 pos: 0 removed 0 lines add to free: 0
Trace buffer file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
              Pos: 0 Size: 0 FreeSize: 0 Fpos: 153477365 End: N
  Free: 0 Last: -1
closing file '/div/dias/u4/tjonasse/mrsa/454/068_reads//staphyl68.dat'
ajSeqoutClose 'fltu7ob01ahj67.fasta'
closing file 'fltu7ob01ahj67.fasta'
ajSeqinDel called usa:''
ajSeqQueryDel db:'' id:''

Final Summary
=============

Table usage : 11 opened, 0 closed, 251 maxsize, 40 maxmem
List usage : 27 opened, 27 closed, 1438 maxsize 2380 nodes
List iterator usage : 4 opened, 4 closed
File usage : 1 opened, 9 closed, 3 max, 10 total
ajNamExit done
Regexp usage (bytes): 168 allocated, 1008 freed, -840 in use (sizes change)
Regexp usage (number): 21 allocated, 21 freed 0 in use
Array usage (bytes): 0 allocated, 0 freed, 0 in use
Array usage (number): 0 allocated, 0 freed, 0 resized, 0 in use
Array usage 2D (bytes): 0 allocated, 0 freed, 0 in use
Array usage 2D (number): 0 allocated, 0 freed, 0 resized, 0 in use
Array usage 3D (bytes): 0 allocated, 0 freed, 0 in use
Array usage 3D (number): 0 allocated, 0 freed, 0 resized, 0 in use
String usage (bytes): 268013 allocated, 268270 freed, -257 in use
String usage (number): 4982 allocated, 4979 freed 3 in use
Memory usage (bytes): 535329 allocated, 640 reallocated 503881 zeroed
Memory usage (number): 14393 allocates, 14405 frees, 10 resizes, -12 in use
closing file 'seqret.dbg'


3) The staphyl68.pxid file contains:

Order     60
Fill      42
Pagesize  2048
Level     2
Cachesize 200
Order2    82
Fill2     99
Count     288506
Kwlimit   15


In addition, here are the database definition and the resource record for 
the staphyl68 database in my local .embossrc file (these should 
accommodate the length of the id field, shouldn't they?):

DB staphyl68 [
         type: N
         method: emboss
         format: embl
         fields: "id,des"
         file: staphyl68.dat
         indexdirectory: /div/dias/u4/tjonasse/mrsa/454/068_reads/
         comment: "mrsa staphyl68 reads"
]

RES staphyl68 [
    type: Index
    idlen:  20
    deslen: 50
]


Best regards,
GM


Peter Rice wrote:
> Hi George,
> 
>> Why does the filter mode seqret invoked inside the for loop fail while 
>> this one works, and why does the problem not exist for the 'afile' but 
>> only for the 'bfile'?
> 
> Can you add "-debug" to the seqret commandline and send me the
> seqret.dbg file (it will be for the last seqret run so you'll need some
> way to make sure the last run failed)
> 
> and also send the seqret.dbg file for running seqret standalone with the
> same ID that worked.
> 
> It would also be useful to see the .pxid file for the staphyl68 database
> (it includes the length of ID that was indexed - your IDs are quite long
> for dbxflat)
> 
> regards,
> 
> Peter
> 
