problems with indexing refseq

Hironori Kawai hkawai at
Wed Mar 12 11:13:13 UTC 2003


David Martin <d.m.a.martin at> wrote:

> I am getting strange behaviour with refseq.
> When indexing the genbank format cumulative files (rscu.gbff) with dbiflat
> -idformat GB I get an index that returns the wrong sequences.
> eg attempting to retrieve NM_060207 instead retrieves NM_131801 which is a
> totally different sequence entry.
> Attempting to retrieve NM131801 gives NM_165909.

This problem has been discussed on this mailing list before.
If you are using version 2.5.0 or earlier (which I suspect you are),
I recommend updating to 2.5.1 or later.

In addition, you have to reformat the rscu.gbff file so that
the word following LOCUS in each entry is replaced with
that entry's accession number.

A Perl script that does this reformatting was posted to this list
(Message-ID <2DC41140A89ED411989D00508BDCD9ED01E28754@>).
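
For readers without access to that message, here is a rough sketch of the
same idea in Python. It is not the Perl script posted to the list, only an
illustration under some assumptions: each entry ends with a line starting
with "//", the accession is the first word after ACCESSION, and the
function names (reformat_locus, _rewrite_entry) are made up for this
example. It tries to keep the column layout of the LOCUS line when the
accession fits in the space of the old name.

```python
import re
from typing import Iterable, Iterator, List

# The LOCUS name is assumed to be the first word after the keyword.
LOCUS_RE = re.compile(r"^(LOCUS\s+)(\S+)([ \t]*)")
# The primary accession is assumed to be the first word after ACCESSION.
ACCESSION_RE = re.compile(r"^ACCESSION\s+(\S+)")


def _rewrite_entry(entry: List[str]) -> List[str]:
    """Return the entry with its LOCUS name replaced by its accession."""
    accession = None
    for line in entry:
        match = ACCESSION_RE.match(line)
        if match:
            accession = match.group(1)
            break
    if accession is None:
        return entry  # no ACCESSION line found: leave the entry untouched

    out = []
    replaced = False
    for line in entry:
        match = None if replaced else LOCUS_RE.match(line)
        if match:
            name, pad = match.group(2), match.group(3)
            if pad:
                # Pad to the old field width so later columns stay aligned;
                # if the accession is too long, fall back to a single space.
                width = len(name) + len(pad)
                if len(accession) < width:
                    field = accession.ljust(width)
                else:
                    field = accession + " "
            else:
                field = accession
            line = match.group(1) + field + line[match.end():]
            replaced = True
        out.append(line)
    return out


def reformat_locus(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines of a GenBank flat file with LOCUS names rewritten."""
    entry: List[str] = []
    for line in lines:
        entry.append(line)
        if line.startswith("//"):  # end-of-entry marker
            yield from _rewrite_entry(entry)
            entry = []
    # Pass through any trailing lines that are not closed by "//".
    yield from _rewrite_entry(entry)
```

To use it, open rscu.gbff for reading and a new file for writing, then pass
the input file object to reformat_locus and write the resulting lines to
the output file. Index the new file rather than overwriting the original.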

Hironori Kawai

More information about the EMBOSS mailing list