problems with indexing refseq
hkawai at venus.dti.ne.jp
Wed Mar 12 11:13:13 UTC 2003
David Martin <d.m.a.martin at dundee.ac.uk> wrote:
> I am getting strange behaviour with refseq.
> When indexing the genbank format cumulative files (rscu.gbff) with dbiflat
> -idformat GB I get an index that returns the wrong sequences.
> eg attempting to retrieve NM_060207 instead retrieves NM_131801 which is a
> totally different sequence entry.
> Attempting to retrieve NM131801 gives NM_165909.
This problem has already been described on this mailing list.
If you are using version 2.5.0 or earlier (I guess you are),
I recommend updating to 2.5.1 or later.
In addition, you have to reformat the rscu.gbff file so
that the word following LOCUS in each entry is replaced with
that entry's accession number.
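
The reformatting step can be sketched roughly as follows. This is only
a minimal Python illustration, not the Perl script posted to the list.
It assumes that each entry ends with a "//" line and that the first
word after ACCESSION is the identifier you want to index on. The
function names (fix_entry, reformat) are made up for this example.

```python
import re

# LOCUS keyword, the locus name, and the blanks that follow the name.
LOCUS_RE = re.compile(r"^(LOCUS[ \t]+)(\S+)([ \t]*)")


def _swap(match, accession):
    """Build the start of the LOCUS line with the accession as the name."""
    prefix, name, gap = match.groups()
    if not gap:
        return prefix + accession
    # Pad to the original width so later columns do not shift, but always
    # keep at least one blank after the accession.
    width = max(len(name) + len(gap), len(accession) + 1)
    return prefix + accession.ljust(width)


def fix_entry(lines):
    """Return one entry with its LOCUS name replaced by its accession."""
    accession = None
    for line in lines:
        if line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]
            break
    if accession is None:
        return lines  # nothing to substitute; leave the entry untouched
    fixed = []
    replaced = False
    for line in lines:
        if not replaced and line.startswith("LOCUS"):
            line = LOCUS_RE.sub(lambda m: _swap(m, accession), line, count=1)
            replaced = True
        fixed.append(line)
    return fixed


def reformat(infile, outfile):
    """Copy a GenBank flat file, fixing the LOCUS line of every entry."""
    entry = []
    for line in infile:
        entry.append(line)
        if line.startswith("//"):  # end of the current entry
            outfile.writelines(fix_entry(entry))
            entry = []
    if entry:  # trailing lines without a closing "//"
        outfile.writelines(fix_entry(entry))
```

For example, reformat(open("rscu.gbff"), open("rscu.fixed.gbff", "w"))
would write a corrected copy (the output file name is arbitrary), which
can then be indexed in place of the original.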
A Perl script that reformats the file in this way has been
posted to this mailing list (Message-ID <2DC41140A89ED411989D00508BDCD9ED01E28754@