GenBank indexing Trouble (fwd)

Shayanthan Parameswaran shay at bioinfo.sickkids.on.ca
Fri Sep 13 17:22:29 UTC 2002


To all,
We installed EMBOSS 2.5.1 and indexed GenBank 131 with the GB format option,
using the new dbiflat that corrects the incorrect entry retrieval reported
below. We also tried the new REFSEQ format option in dbiflat to index RefSeq.
However, the error that was fixed for the GB option does not seem to be fixed
for the REFSEQ option: when we request NM_066918, seqret retrieves the entry
NM_066922.1 instead.
Has anyone else experienced this error with the REFSEQ format option?
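
For anyone who wants to check their own index, here is a minimal Python
sketch of the kind of test we mean. It only inspects the text of a retrieved
GenBank-format entry and reports whether the requested ID appears on its
LOCUS, ACCESSION or VERSION line. The function names and the sample header
are made up for illustration; this is not part of EMBOSS.

```python
def entry_ids(entry_text):
    """Collect identifiers from the LOCUS, ACCESSION and VERSION lines."""
    ids = set()
    for line in entry_text.splitlines():
        fields = line.split()
        if line.startswith("LOCUS") and len(fields) > 1:
            ids.add(fields[1].upper())
        elif line.startswith("ACCESSION"):
            # Only the first ACCESSION line is read; continuation lines
            # with secondary accessions are ignored in this sketch.
            ids.update(f.upper() for f in fields[1:])
        elif line.startswith("VERSION") and len(fields) > 1:
            ids.add(fields[1].upper())                # with version suffix
            ids.add(fields[1].split(".")[0].upper())  # without version suffix
        elif line.startswith("ORIGIN"):
            break  # sequence data follows; no more header lines
    return ids


def is_requested_entry(requested, entry_text):
    """True if the requested ID matches any identifier in the entry header."""
    return requested.upper() in entry_ids(entry_text)


# Made-up, heavily abbreviated header for illustration (not the real record).
sample = """LOCUS       NM_066922    100 bp    mRNA    linear   INV 01-JAN-2002
ACCESSION   NM_066922
VERSION     NM_066922.1
ORIGIN
"""

print(is_requested_entry("NM_066918", sample))  # the ID that was asked for
print(is_requested_entry("NM_066922", sample))  # the ID that came back
```

Feeding the output of a retrieval into is_requested_entry() together with the
ID that was requested should flag mismatches like the one described above.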

Shay


>
> Date: Tue, 10 Sep 2002 13:03:42 +0100 (BST)
> From: ableasby at hgmp.mrc.ac.uk
> To: emboss at hgmp.mrc.ac.uk
> Subject: EMBOSS 2.5.1 released
>
> This release fixes problems associated with non-unique identifiers
> in some databases (e.g. REFSEQ). Note that there is now a specific
> indexing option for that database in dbiflat.
>
> Alan
>
>
> Date: Tue, 10 Sep 2002 12:33:53 +0900
> From: "河合宏紀" <hkawai at venus.dti.ne.jp>
> To: emboss at embnet.org
> Subject: GenBank indexing Trouble
>
> Hello
>
>  I'm using the EMBOSS package, and I appreciate the developers' efforts.
>  Unfortunately, I ran into a problem when I indexed GenBank 130 and
> queried it with entret/seqret.
>
>  First, I built an index for all files of GenBank 130 (except
>  EST, GSS, HTG) as shown below.
>  --------------------------------------
>  % /usr/local/EMBOSS/2.5.0/bin/dbiflat
>  Index a flat file database
>        EMBL : EMBL
>       SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
>          GB : Genbank, DDBJ
>  Entry format [SWISS]: GB
>  Database directory [.]:
>  Wildcard database filename [*.dat]: *.seq
>  Database name: GB
>  Release number [0.0]:
>  Index date [00/00/00]:
>  Warning: Duplicate ID skipped: 'AY071141'
>  --------------------------------------
>
>  When I requested L11995 with "entret gb:L11995", I got the wrong entry,
> whose accession is M20152. When I then requested gb:M20152, I got M20153.
> These three entries appear consecutively in the gbrod3.seq file. The
> problem does not occur for entries whose 'LOCUS' and 'ACCESSION' fields
> are identical (e.g. BC003860). Because the problem occurs with dbiflat in
> versions 2.4.1 and 2.5.0 but not in 2.3.1, I'm now using EMBOSS 2.3.1
> only for dbiflat/dbifasta, and 2.4.1 for the other programs
> (entret/seqret and so on).
>
>  My hypothesis about the cause is as follows.
>  I focused on the duplicate ID AY071141 and removed one AY071141 entry
>  (from the gbinv4.seq file).
>  After that, I could retrieve the correct entries.
>  My guess is that when dbiflat finds a duplicate ID and skips it, the index
>  counters for both LOCUS and ACCESSION should be increased (or decreased).
>  In this version, however, it seems that ONLY the LOCUS counter is
>  increased (or decreased) and the ACCESSION counter is not.
>
> I hope my report will be helpful to the developers.
>
> Best regards
>
> Kawai
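
Kawai's hypothesis above can be illustrated with a toy Python sketch. It is
not dbiflat's code and makes no claim about which counter actually drifts in
dbiflat; it only shows how a counter that gets out of step with the entry
table after a skipped duplicate would return the following entry for every
later accession. All names are hypothetical, and the LOCUS names are
placeholders.

```python
def build_index(entries, buggy):
    """entries: list of (locus, accession) pairs in file order."""
    entry_table = []   # indexed entries, one slot each
    acc_index = {}     # accession -> slot number in entry_table
    seen = set()
    counter = 0        # slot number recorded for the next accession
    for locus, acc in entries:
        if locus in seen:
            # Duplicate ID: the entry is skipped and gets no slot.
            if buggy:
                counter += 1   # assumed bug: counter advances anyway
            continue
        seen.add(locus)
        entry_table.append((locus, acc))
        acc_index[acc] = counter
        counter += 1
    return entry_table, acc_index


def fetch_by_accession(index, acc):
    """Return the accession of the entry the index points at, or None."""
    entry_table, acc_index = index
    slot = acc_index[acc]
    if slot >= len(entry_table):
        return None
    return entry_table[slot][1]


toy = [
    ("AY071141", "AY071141"),
    ("AY071141", "AY071141"),   # duplicate ID, skipped
    ("LOCUS_A", "L11995"),      # placeholder LOCUS names
    ("LOCUS_B", "M20152"),
    ("LOCUS_C", "M20153"),
]

for buggy in (True, False):
    index = build_index(toy, buggy)
    print(buggy, [fetch_by_accession(index, a) for a in ("L11995", "M20152")])
```

With buggy=True the toy index returns M20152 for L11995 and M20153 for
M20152, the same shift as in the report; with buggy=False each accession
returns its own entry.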

--
Shayanthan Parameswaran                Bioinformatics Supercomputing Centre
Programmer (416) 813-8030              555 University Avenue
email: shay at bioinfo.sickkids.on.ca     The Hospital for Sick Children
http:  www.bioinfo.sickkids.on.ca      Toronto, ON, M5G 1X8, CANADA.

