[EMBOSS] Problems with Refseq
Carlos Quijano
cquijano at iib.uam.es
Wed Nov 2 12:34:26 UTC 2005
On Wed, 02-11-2005 at 11:29 +0100, Enrique de Andres Saiz wrote:
> Hi,
> I have some problems working with the RefSeq database.
> I am using EMBOSS 3.0.0 and when I index the database (.g?ff
> files) using the dbiflat command I get many warnings such as:
>
> Warning: Duplicate ID skipped: 'NP_857944' All hits will point to first
> ID found
>
This warning most likely points to the root of the problem. Try to
eliminate the duplicates somehow. Probably dbiflat truncates IDs to a
fixed length when it processes your flat files, so if there is no way
of forcing dbiflat to accept your IDs as they are, the best solution
is to modify the flat files themselves...
The rest of the symptoms are consistent with this explanation.
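To see which IDs actually collide before editing anything, a small script along these lines might help. It is only a sketch: it assumes the entry ID used for indexing is the name on each LOCUS line, and the `max_len` parameter is a hypothetical knob for testing the truncation guess above (it is not a dbiflat option). The function names are my own.

```python
from collections import defaultdict


def locus_ids(lines):
    """Yield (entry_id, line_number) for every LOCUS line in a GenBank flat file."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                yield fields[1], lineno


def find_duplicates(named_streams, max_len=None):
    """Return {entry_id: [(source_name, line_number), ...]} for repeated IDs.

    named_streams is an iterable of (name, iterable_of_lines) pairs.
    max_len is hypothetical: if given, IDs are cut to that many characters
    before comparison, to test whether truncation would cause collisions.
    """
    seen = defaultdict(list)
    for name, lines in named_streams:
        for entry_id, lineno in locus_ids(lines):
            seen[entry_id[:max_len]].append((name, lineno))
    return {k: v for k, v in seen.items() if len(v) > 1}


def open_all(paths):
    """Yield (path, open file handle) pairs, one file at a time."""
    for path in paths:
        with open(path) as handle:
            yield path, handle
```

Calling `find_duplicates(open_all(list_of_flat_files))` lists IDs that really occur more than once. Running it again with a small `max_len` shows whether any IDs would only collide after truncation.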
I hope it helps you in some way.
> Another problem is that when I try to get an entry using seqret command,
> I get another sequence with the accession I have selected. When I try
> to get the entry using entret, I get several sequences.
>
> I have tried indexing only one file of the database and then accessing
> it with seqret and entret. I get the same behaviour. For example, I have
> the following definition in my emboss.default file:
>
> DB rs_test [
> type: N
> method: emblcd
> format: genbank
> dir: $emboss_data/refseq
> file: vertebrate_mammalian2.genomic.gbff
> indexdir: /usr/users/bioadmin/opt/prueba
> comment: "RefSeq test"
> ]
>
> If I open the file vertebrate_mammalian2.genomic.gbff, I can see the following entry:
>
> LOCUS NW_113053 1059 bp DNA linear CON
> 09-NOV-2004
> DEFINITION Pan troglodytes chromosome 10 genomic contig, whole genome
> shotgun
> sequence.
> ACCESSION NW_113053
> VERSION NW_113053.1 GI:52318716
> KEYWORDS WGS.
> SOURCE Pan troglodytes (chimpanzee)
> ORGANISM Pan troglodytes
> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
> Euteleostomi;
> Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
> Hominidae; Pan.
> COMMENT GENOME ANNOTATION REFSEQ: Features on this sequence have been
> produced for build 1 version 1 of the NCBI's genome
> annotation [see
> documentation].
> The DNA sequence for this assembly was produced by the
> Chimpanzee
> Genome Sequencing Consortium. This assembly was produced by the
> Arachne assembler and made available in Nov. 2003.
> FEATURES Location/Qualifiers
> source 1..1059
> /organism="Pan troglodytes"
> /mol_type="genomic DNA"
> /isolate="Yerkes chimp pedigree #C0471 (Clint)"
> /db_xref="taxon:9598"
> /chromosome="10"
> CONTIG join(AADA01324841.1:1..1059)
>
> If I run: seqret rs_test:NW_113053, I get:
>
> $> seqret rs_test:NW_113053
> Reads and writes (returns) sequences
> Output sequence [nw_113053.fasta]: stdout
> >NW_113053 NW_113053.1 Pan troglodytes olfactory receptor pseudogene
> PTOR3A5P (PTOR3A5P) onchromosome 17.
> ggaacgtactgcagcccatccgttttgctgtcttccgctttgcctacatcatcatagttg
> ggggcaacctcagcatcctggctgccatctttgtggaccccaaactccatactcccatgt
> attacttcctggggaacttgtctctgctggacatcgggtgcatcagtcactgttcctccg
> atgctggcgtgtctcctggcccaccagtgcagagttccctatgctgcctgcatttcacaa
> ctcttctttttccacctcctggctggggtggactgtcacctcttaatagccacggcctat
> gactgctacctggctatctgtcagcttctcaccaacagcactcgcatgagctgtgaagtc
> cagggtgccctggtgggaatttgctgcactgtctccttcatcaatgctctgactcacaca
> gtggctgtgtctgtgcttgacttctgtggccctaatgtggtcaaccacttctgctgtgac
> ctcccacctcttttccagctctcttgctccagcatccacctcaatgggcagctgctgctt
> gtgggggccaccttcataggagtgctccccatgatctttatctcagtgtcctatgcccac
> gtcacagccgcaatattacgaatccgctcagctgaggggaggaagaaggctttctccacg
> tgtggctcccacctcaccgtggtctgaatcttttatggaactggcttcttcagttacatg
> tgtctgggctcagtctcagcctcagacaaagataaggggattgggatcctcaacactatc
> ctcagtcccatgctgaacccagtcatttacagcctccagaaccctgatgtgcagggcacc
> ctgaaaagggtgctgacagggaagaggcccccagcttga
>
> If I run: entret rs_test:NW_113053, I get several entries (the first one
> is the correct one).
>
> Any idea what is happening and how I can solve it?
>
> Thanks in advance,
> Enrique.