[EMBOSS] Problems with Refseq

Carlos Quijano cquijano at iib.uam.es
Wed Nov 2 12:34:26 UTC 2005


On Wed, 02-11-2005 at 11:29 +0100, Enrique de Andres Saiz wrote:

> Hi,
> I have some problems working with the RefSeq database.
> I am using EMBOSS 3.0.0, and when I index the database (.g?ff
> files) with the dbiflat command I get many warnings such as:
> 
> Warning: Duplicate ID skipped: 'NP_857944' All hits will point to first ID found
> 


This warning points to the root of the problem. Try to eliminate the
duplicate IDs. dbiflat probably truncates IDs beyond a certain length
when it parses your flat files, so if there is no way to force dbiflat
to accept your IDs as they are, the best solution is to modify the flat
files themselves.

The rest of the symptoms are consistent with this observation.
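
To see which IDs actually clash before editing anything, a small script
along these lines might help. It is only a rough sketch: it assumes the
ID that gets indexed is the entry name on the LOCUS line (I have not
checked what dbiflat really uses), the function names are made up for
this example, and max_len is a hypothetical truncation length, not a
documented dbiflat limit.

```python
import re
from collections import defaultdict

# Matches the entry name on a GenBank-style LOCUS line.
LOCUS_RE = re.compile(r"^LOCUS\s+(\S+)")


def locus_ids(lines):
    """Yield the entry name from every LOCUS line in an iterable of lines."""
    for line in lines:
        match = LOCUS_RE.match(line)
        if match:
            yield match.group(1)


def find_collisions(ids, max_len=None):
    """Group IDs by their (optionally truncated) key and return the clashes.

    max_len is a hypothetical truncation length (an assumption for this
    sketch); None compares the full IDs.
    """
    groups = defaultdict(list)
    for entry_id in ids:
        key = entry_id if max_len is None else entry_id[:max_len]
        groups[key].append(entry_id)
    # Keep only keys shared by more than one entry.
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    import sys

    # Collect LOCUS names from every flat file given on the command line.
    ids = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            ids.extend(locus_ids(handle))
    for key, members in sorted(find_collisions(ids).items()):
        print(f"{key}: {len(members)} entries")
```

Saved under some name of your choosing and run with the flat files as
arguments, it lists every LOCUS name that occurs more than once. Calling
find_collisions with a max_len value shows which IDs would collide if
they were cut to that length.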

I hope it helps you in some way.


> Another problem is that when I try to get an entry using the seqret
> command, I get a different sequence under the accession I selected.
> When I try to get the entry using entret, I get several sequences.
> 
> I have tried indexing only one file of the database and then accessing
> it with seqret and entret, and I get the same behaviour. For example, I
> have the following definition in my emboss.default file:
> 
> DB rs_test [
>     type: N
>     method: emblcd
>     format: genbank
>     dir: $emboss_data/refseq
>     file: vertebrate_mammalian2.genomic.gbff
>     indexdir: /usr/users/bioadmin/opt/prueba
>     comment: "RefSeq test"
> ]
> 
> If I open the file vertebrate_mammalian2.genomic.gbff, I can see the following entry:
> 
> LOCUS       NW_113053               1059 bp    DNA     linear   CON 09-NOV-2004
> DEFINITION  Pan troglodytes chromosome 10 genomic contig, whole genome shotgun
>             sequence.
> ACCESSION   NW_113053
> VERSION     NW_113053.1  GI:52318716
> KEYWORDS    WGS.
> SOURCE      Pan troglodytes (chimpanzee)
>   ORGANISM  Pan troglodytes
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
>             Hominidae; Pan.
> COMMENT     GENOME ANNOTATION REFSEQ:  Features on this sequence have been
>             produced for build 1 version 1 of the NCBI's genome annotation [see
>             documentation].
>             The DNA sequence for this assembly was produced by the Chimpanzee
>             Genome Sequencing Consortium. This assembly was produced by the
>             Arachne assembler and made available in Nov. 2003.
> FEATURES             Location/Qualifiers
>      source          1..1059
>                      /organism="Pan troglodytes"
>                      /mol_type="genomic DNA"
>                      /isolate="Yerkes chimp pedigree #C0471 (Clint)"
>                      /db_xref="taxon:9598"
>                      /chromosome="10"
> CONTIG      join(AADA01324841.1:1..1059)
> 
> If I run: seqret rs_test:NW_113053, I get:
> 
> $> seqret rs_test:NW_113053
> Reads and writes (returns) sequences
> Output sequence [nw_113053.fasta]: stdout
>  >NW_113053 NW_113053.1 Pan troglodytes olfactory receptor pseudogene PTOR3A5P (PTOR3A5P) onchromosome 17.
> ggaacgtactgcagcccatccgttttgctgtcttccgctttgcctacatcatcatagttg
> ggggcaacctcagcatcctggctgccatctttgtggaccccaaactccatactcccatgt
> attacttcctggggaacttgtctctgctggacatcgggtgcatcagtcactgttcctccg
> atgctggcgtgtctcctggcccaccagtgcagagttccctatgctgcctgcatttcacaa
> ctcttctttttccacctcctggctggggtggactgtcacctcttaatagccacggcctat
> gactgctacctggctatctgtcagcttctcaccaacagcactcgcatgagctgtgaagtc
> cagggtgccctggtgggaatttgctgcactgtctccttcatcaatgctctgactcacaca
> gtggctgtgtctgtgcttgacttctgtggccctaatgtggtcaaccacttctgctgtgac
> ctcccacctcttttccagctctcttgctccagcatccacctcaatgggcagctgctgctt
> gtgggggccaccttcataggagtgctccccatgatctttatctcagtgtcctatgcccac
> gtcacagccgcaatattacgaatccgctcagctgaggggaggaagaaggctttctccacg
> tgtggctcccacctcaccgtggtctgaatcttttatggaactggcttcttcagttacatg
> tgtctgggctcagtctcagcctcagacaaagataaggggattgggatcctcaacactatc
> ctcagtcccatgctgaacccagtcatttacagcctccagaaccctgatgtgcagggcacc
> ctgaaaagggtgctgacagggaagaggcccccagcttga
> 
> If I run: entret rs_test:NW_113053, I get several entries (the first one 
> is the correct one).
> 
> Any idea what is happening and how I can solve it?
> 
> Thanks in advance,
> Enrique.
