seqret doesn't count more than 99?

simon andrews (BI) simon.andrews at bbsrc.ac.uk
Mon Apr 29 09:49:29 UTC 2002


> -----Original Message-----
> From: Charles Plessy [mailto:charles at moulinette.dyndns.org]
> Sent: 26 April 2002 22:31
> To: emboss at hgmp.mrc.ac.uk
> Subject: seqret doesn't count more than 99?
> 
> 
> Hello,
> 
> I downloaded the draft of the fugu genome 

[snip]

> I'm not able to index a blast database correctly if the header doesn't
> look «ncbi compliant» and formatdb hadn't been run with the -o flag.

I'd not tried this before, but we see the same thing here.  Running dbiblast
on the indexed raw fugu data seems to work, but seqret fails on the
subsequent retrieval.

The problem seems to be in the accession numbers entered into the .trg file
created by dbiblast.  Running seqret with debug on shows the following
(edited) entries:

------------------------------------
USA to test: 'fugu_blasttest:Scaffold_1'
[snip]

found dbname fugu_blasttest
wild query 'Scaffold_1' 'Scaffold_1' '' 
database type: 'N' format 'ncbi'
use access method 'blast'
Matched seqAccess[12] 'blast'
seqAccessBlast type 1
[snip]

seqCdIdxSearch (entry 'Scaffold_1')
[several more of these]
idx test 59 'Scaffold_100' -1 (+/- 39)
idx test 49 'Contig_83248'  1 (+/- 18)
idx test 54 'Contig_9376'  1 (+/- 8)
idx test 56 'Scaffold_10' -1 (+/- 3)
idx test 55 'Scaffold_1' -1 (+/- 0)
 
ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.trg'
ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.trg'
seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.trg
  FileSize: 416800 NRecords: 20825 recsize: 20 idsize: 10
seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.trg' NRecords: 20825 RecSize: 20
ajFileNewIn '/data/Fugu/EMBOSS/TEST/acnum.hit'
ajNamResolve of '/data/Fugu/EMBOSS/TEST/acnum.hit'
seqCdReadHeader file /data/Fugu/EMBOSS/TEST/acnum.hit
  FileSize: 83600 NRecords: 20825 recsize: 4 idsize: -6
seqCdFileOpen '/data/Fugu/EMBOSS/TEST/acnum.hit' NRecords: 20825 RecSize: 4
seqCdTrgSearch 'Scaffold_1' recSize: 20
trg test 10412 'ZZ0010413' -1 (+/- 20825)
trg test 5206 'ZZ0005207' -1 (+/- 10412)
trg test 2603 'ZZ0002604' -1 (+/- 5206)
trg test 1301 'ZZ0001302' -1 (+/- 2603)
trg test 650 'ZZ0000651' -1 (+/- 1301)
trg test 325 'ZZ0000326' -1 (+/- 650)
trg test 162 'ZZ0000163' -1 (+/- 325)
trg test 81 'ZZ0000082' -1 (+/- 162)
trg test 40 'ZZ0000041' -1 (+/- 81)
trg test 20 'ZZ0000021' -1 (+/- 40)
trg test 10 'ZZ0000011' -1 (+/- 20)
trg test 5 'ZZ0000006' -1 (+/- 10)
trg test 2 'ZZ0000003' -1 (+/- 5)
trg test 1 'ZZ0000002' -1 (+/- 2)
trg test 0 'ZZ0000001' -1 (+/- 1)
'SCAFFOLD_1' not found found in .trg

------------------------------------------------

After this it cleans up after itself and exits.  Looking through the .trg
file, all the accessions are of the form ZZ0000XXX.  This format of accession
doesn't appear anywhere in my original data, so I don't know where it's
coming from (presumably either dbiblast or formatdb?).  The inability to
reconcile the Scaffold_1 with the ZZ00... accessions seems to be what causes
seqret to fail.
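
As a rough illustration of why the lookup can never succeed, here is a
minimal Python sketch of a binary search over a sorted list of ZZ-style
accessions.  This is not the EMBOSS code: the function name trg_search, the
in-memory list, and the upper-casing of the key (guessed from the
'SCAFFOLD_1' in the final debug message) are all assumptions made for the
example, and the probe positions need not match the real log line for line.

```python
def trg_search(records, key):
    """Binary search a sorted list of accessions (hypothetical helper).

    Returns (index, probes): index is -1 when the key is absent, and
    probes lists every (position, record, comparison) tested on the way.
    """
    key = key.upper()  # assumption: the query is upper-cased before lookup
    lo, hi = 0, len(records) - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2
        rec = records[mid]
        cmp = (key > rec) - (key < rec)  # -1, 0 or 1, like strcmp
        probes.append((mid, rec, cmp))
        if cmp == 0:
            return mid, probes
        if cmp < 0:
            hi = mid - 1  # key sorts before this record: search lower half
        else:
            lo = mid + 1  # key sorts after this record: search upper half
    return -1, probes


# Stand-in for the 20825 generated accessions seen in acnum.trg.
records = [f"ZZ{n:07d}" for n in range(1, 20826)]

index, probes = trg_search(records, "Scaffold_1")
for position, record, cmp in probes:
    print(f"trg test {position} '{record}' {cmp}")
print("found" if index >= 0 else "not found")
```

Every comparison comes back as -1 because 'SCAFFOLD_1' sorts before any
string starting with 'ZZ', so the search walks down to record 0 and gives
up, which is the same pattern as in the debug output above.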


> I created the blast database and indexed it with dbiblast. The reason
> for not formatting the fasta file itself is to save space. This also
> enforces a synchronicity between the blast hits names and the names
> that I can give to seqret.

The way we did this was to use the fasta files for both.  I take the point
about the space saving, but the assembled data wasn't all that big.  If you
use the raw fasta files for both formatdb (without header parsing) and
dbifasta, then you can still use the same accession codes as reference in
both.


> Here is now the problem:
> 
> charles at pc-1035-a:~$ seqret fugu:Scaffold_100
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'fugu:Scaffold_100'
> 
> ==> KO :((
> 
> seqret can't fetch sequence names like Scaffold_xyz, where 
> xyz >= 100.
> 
> Is it due to the length of the name?

It might be worth running seqret with the -debug flag on and looking at the
messages at the end of seqret.dbg.  This usually gives some more useful
information about what is going wrong in these cases.
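
If the debug file is large, something like the following Python sketch will
print just the last few lines of it.  The helper name tail_lines and the
default of 20 lines are my own choices for the example; the only thing taken
from above is the seqret.dbg file name, assumed to be in the current
directory.

```python
import os
from collections import deque


def tail_lines(path, n=20):
    """Return the last n lines of a text file (hypothetical helper)."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent n lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


if __name__ == "__main__":
    # Assumes seqret was run with -debug in this directory beforehand.
    if os.path.exists("seqret.dbg"):
        for line in tail_lines("seqret.dbg"):
            print(line)
```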

I'd be interested in seeing a resolution to this as well...

	TTFN

	Simon.




