[EMBOSS] Problems with Refseq
Enrique de Andres Saiz
enrique.deandres at pcm.uam.es
Wed Nov 2 10:29:35 UTC 2005
Hi,
I am having some problems working with the RefSeq database.
I am using EMBOSS 3.0.0, and when I index the database (.g?ff
files) with the dbiflat command I get many warnings such as:
Warning: Duplicate ID skipped: 'NP_857944' All hits will point to first ID found
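To check whether these IDs really occur more than once in the flat files, something like the following could be used. It is only a sketch: it assumes the ID that dbiflat indexes is the second field of each LOCUS line, and the function names are made up for illustration.

```python
import sys
from collections import Counter


def locus_ids(lines):
    """Yield the ID (second field) of every LOCUS line."""
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                yield fields[1]


def duplicate_ids(lines):
    """Return {id: count} for IDs found on more than one LOCUS line."""
    counts = Counter(locus_ids(lines))
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical usage: python find_dups.py *.gbff *.gpff
    # IDs are counted across all files given, since they are indexed together.
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path) as handle:
            totals.update(locus_ids(handle))
    for name, n in sorted(totals.items()):
        if n > 1:
            print(name, n)
```

If an ID is reported here, the duplicate is in the data itself and not something introduced by the indexing step.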
Another problem is that when I try to retrieve an entry with the seqret
command, I get a different sequence under the accession I selected. When
I try to retrieve the entry with entret, I get several entries.
I have also tried indexing only one file of the database and then
accessing it with seqret and entret, and I get the same behaviour. For
example, I have the following definition in the emboss.default file:
DB rs_test [
   type: N
   method: emblcd
   format: genbank
   dir: $emboss_data/refseq
   file: vertebrate_mammalian2.genomic.gbff
   indexdir: /usr/users/bioadmin/opt/prueba
   comment: "RefSeq test"
]
If I open the file vertebrate_mammalian2.genomic.gbff, I can see the following entry:
LOCUS       NW_113053               1059 bp    DNA     linear   CON 09-NOV-2004
DEFINITION  Pan troglodytes chromosome 10 genomic contig, whole genome shotgun
            sequence.
ACCESSION   NW_113053
VERSION     NW_113053.1  GI:52318716
KEYWORDS    WGS.
SOURCE      Pan troglodytes (chimpanzee)
  ORGANISM  Pan troglodytes
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
            Hominidae; Pan.
COMMENT     GENOME ANNOTATION REFSEQ: Features on this sequence have been
            produced for build 1 version 1 of the NCBI's genome annotation [see
            documentation].
            The DNA sequence for this assembly was produced by the Chimpanzee
            Genome Sequencing Consortium. This assembly was produced by the
            Arachne assembler and made available in Nov. 2003.
FEATURES             Location/Qualifiers
     source          1..1059
                     /organism="Pan troglodytes"
                     /mol_type="genomic DNA"
                     /isolate="Yerkes chimp pedigree #C0471 (Clint)"
                     /db_xref="taxon:9598"
                     /chromosome="10"
CONTIG      join(AADA01324841.1:1..1059)
If I run: seqret rs_test:NW_113053, I get:
$> seqret rs_test:NW_113053
Reads and writes (returns) sequences
Output sequence [nw_113053.fasta]: stdout
>NW_113053 NW_113053.1 Pan troglodytes olfactory receptor pseudogene PTOR3A5P (PTOR3A5P) onchromosome 17.
ggaacgtactgcagcccatccgttttgctgtcttccgctttgcctacatcatcatagttg
ggggcaacctcagcatcctggctgccatctttgtggaccccaaactccatactcccatgt
attacttcctggggaacttgtctctgctggacatcgggtgcatcagtcactgttcctccg
atgctggcgtgtctcctggcccaccagtgcagagttccctatgctgcctgcatttcacaa
ctcttctttttccacctcctggctggggtggactgtcacctcttaatagccacggcctat
gactgctacctggctatctgtcagcttctcaccaacagcactcgcatgagctgtgaagtc
cagggtgccctggtgggaatttgctgcactgtctccttcatcaatgctctgactcacaca
gtggctgtgtctgtgcttgacttctgtggccctaatgtggtcaaccacttctgctgtgac
ctcccacctcttttccagctctcttgctccagcatccacctcaatgggcagctgctgctt
gtgggggccaccttcataggagtgctccccatgatctttatctcagtgtcctatgcccac
gtcacagccgcaatattacgaatccgctcagctgaggggaggaagaaggctttctccacg
tgtggctcccacctcaccgtggtctgaatcttttatggaactggcttcttcagttacatg
tgtctgggctcagtctcagcctcagacaaagataaggggattgggatcctcaacactatc
ctcagtcccatgctgaacccagtcatttacagcctccagaaccctgatgtgcagggcacc
ctgaaaagggtgctgacagggaagaggcccccagcttga
If I run: entret rs_test:NW_113053, I get several entries (the first one
is the correct one).
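The entry shown above has a CONTIG line and no sequence lines in the part I pasted, so one thing that may be worth checking (an assumption on my side, not a confirmed cause) is how many records in the file have no ORIGIN section at all. A minimal sketch, assuming every record ends with "//" and that sequence data, when present, follows an ORIGIN line; the function name is hypothetical:

```python
import sys


def records_without_sequence(lines):
    """Return LOCUS IDs of records that reach '//' without an ORIGIN line."""
    missing = []
    current = None       # ID of the record currently being read
    has_origin = False   # whether an ORIGIN line was seen in this record
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            current = fields[1] if len(fields) > 1 else "?"
            has_origin = False
        elif line.startswith("ORIGIN"):
            has_origin = True
        elif line.startswith("//"):
            if current is not None and not has_origin:
                missing.append(current)
            current = None
    return missing


if __name__ == "__main__":
    # Hypothetical usage: python no_origin.py vertebrate_mammalian2.genomic.gbff
    for path in sys.argv[1:]:
        with open(path) as handle:
            for name in records_without_sequence(handle):
                print(path, name)
```

If NW_113053 is listed, the sequence returned by seqret cannot have come from that record, which would at least narrow down where the wrong sequence is being read from.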
Any idea what is happening and how I can solve it?
Thanks in advance,
Enrique.