[EMBOSS] Getting headers from Seqret

simon andrews (BI) simon.andrews at bbsrc.ac.uk
Wed Aug 1 12:56:08 UTC 2001


[sent to Emboss mailing list]

Dear All,

I'm having trouble getting header information back through seqret, from a
database indexed using dbiflat against a GenBank flat file (RefSeq,
actually).  I'm sure plenty of people must have done this before, but I've
read through the documentation, and I can't see where I'm going wrong!

The database formatted OK, and I can fetch sequences back from it, but at
some point I will need to retrieve the entire header from the original file
to get at some of the extra information in there (feature tables, cross
references, authors etc).  I've tried several different output USAs with
seqret, but the most I can seem to get back is the name, accession number
and description.

I can't believe that this information is thrown away by seqret (it's still
there in the flat file after all), so how can I retrieve it?

	Thanks for any help

	Simon

[Potentially useful details follow]

----
Simon Andrews PhD
Bioinformatics Dept
The Babraham Institute

simon.andrews at bbsrc.ac.uk
+44 (0)1223 496463 


##########################################################################


	Emboss version = 2.0.0

	Platform = DEC alpha (OSF1 v4.0)


My emboss.default entry for the database looks like this:

	DB refseq [
	        type: N
	        method: emblcd
	        format: gb
	        dir: /usr/users/andrewss/Refseq/Genbank
	        file: "*.gbff"
	        release: "1.0"
	        comment: "Refseq Hum Mus Rat"
	]

and an example of the output of seqret with a debug USA follows (note that
the Documentation field is suspiciously blank!):

Sequence output trace
=====================

  Name: 'NM_031360'
  Accession: 'NM_031360'
  Description: 'Rattus norvegicus neutral sphingomyelinase (Smpd2), mRNA.'
  Type: 'N'
  Database: 'refseq'
  Full name: ''
  Date: ''
  Usa: 'debug::test.seq'
  Ufo: ''
  Input format: 'gb'
  Output format: 'debug'
  Filename: 'test.seq'
  Entryname: 'NM_031360'
  File name: 'test.seq'
  Extension: 'fasta'
  Single: 'No'
  Features: 'No'
  Count: 'No'
  Documentation:...

    1  atgaagcaca acttttctct gcggctgagg gttttcaacc tcaactgctg    50
   51  ggacatcccc tacctaagca agcatagggc cgaccgcatg aagcgcttgg   100 

       etc.


The extra stuff I'm after is this sort of thing:

LOCUS       NM_031360    1269 bp    mRNA            ROD       12-JUN-2001
DEFINITION  Rattus norvegicus neutral sphingomyelinase (Smpd2), mRNA.
ACCESSION   NM_031360
VERSION     NM_031360.1  GI:14389300
KEYWORDS    .
SOURCE      Norway rat.
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
            Rattus.
REFERENCE   1  (sites)
  AUTHORS   Mizutani,Y., Tamiya-Koizumi,K., Irie,F., Hirabayashi,Y., Miwa,M.
            and Yoshida,S.
  TITLE     Cloning and expression of rat neutral sphingomyelinase:
            enzymological characterization and identification of essential
            histidine residues
  JOURNAL   Biochim. Biophys. Acta 1485 (2-3), 236-246 (2000)
  MEDLINE   20292884
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AB047002.1.
FEATURES             Location/Qualifiers
     source          1..1269
                     /organism="Rattus norvegicus"
                     /strain="Sprague-Dawley"
                     /db_xref="taxon:10116"
                     /chromosome="X"
                     /chromosome="14"
                     /chromosome="2"
                     /chromosome="3"
                     /chromosome="17"
                     /map="Xq28"
                     /map="14q"
                     /map="2 36.0 cM"
                     /map="Xq11.1"
                     /map="3"
                     /map="17q12-q21"
                     /sex="male"
                     /tissue_type="liver"
                     /clone_lib="rat liver lambda cDNA library
                     (STRATAGENE,#936513)"
     gene            1..1269
                     /gene="Smpd2"
                     /note="EBS3; EBS4; K14; CK; MAGE5; MAGE10; Tdo; Araf"
                     /db_xref="LocusID:83537"
                     /db_xref="MGD:MGI:98246"
                     /db_xref="MIM:148066"
                     /db_xref="MIM:300340"
                     /db_xref="MIM:300343"
                     /db_xref="MIM:601443"
                     /db_xref="RATMAP:36372"
                     /db_xref="RGD:36372"
     CDS             1..1269
                     /gene="Smpd2"
                     /note="lyso-platelet activating factor-phospholipase C;
                     cytokeratin 14; Raf related protein;
                     Synaptosomal-associated protein"
                     /codon_start=1
                     /db_xref="LocusID:83537"
                     /db_xref="MGD:MGI:98246"
                     /db_xref="MIM:148066"
                     /db_xref="MIM:300340"
                     /db_xref="MIM:300343"
                     /db_xref="MIM:601443"
                     /db_xref="RATMAP:36372"
                     /db_xref="RGD:36372"
                     /product="neutral sphingomyelinase"
                     /protein_id="NP_112650.1"
                     /db_xref="GI:14389301"
                     /translation="MKHNFSLRLRVFNLNCWDIPYLSKHRADRMKRLGDFLNLESFDL
                     ALLEEVWSEQDFQYLKQKLSLTYPDAHYFRSGIIGSGLCVFSRHPIQEIVQHVYTLNG
                     YPYKFYHGDWFCGKAVGLLVLHLSGLVLNAYVTHLHAEYSRQKDIYFAHRVAQAWELA
                     QFIHHTSKKANVVLLCGDLNMHPKDLGCCLLKEWTGLRDAFVETEDFKGSEDGCTMVP
                     KNCYVSQQDLGPFPFGVRIDYVLYKAVSGFHICCKTLKTTTGCDPHNGTPFSDHEALM
                     ATLCVKHSPPQEDPCSAHGSAERSALISALREARTELGRGIAQARWWAALFGYVMILG
                     LSLLVLLCVLAAGEEAREVAIMLWTPSVGLVLGAGAVYLFHKQEAKSLCRAQAEIQHV
                     LTRTTETQDLGSEPHPTHCRQQEADRAEEK"
     misc_feature    91..837
                     /note="AP_endonucleas1; Region: AP endonuclease family 1"
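
[Possible stop-gap]

Since the annotation is still in the flat file, one workaround while the
seqret question is open would be to read the header straight from the
.gbff file. The following is only a minimal sketch in Python and is not
an EMBOSS feature: the function name genbank_header is made up for
illustration, and it assumes that the second field of each LOCUS line is
the accession (as in the record above) and that the sequence block starts
at a line beginning with ORIGIN, as in standard GenBank flat files.

```python
import io


def genbank_header(handle, accession):
    """Return the annotation lines (LOCUS up to, but not including, ORIGIN)
    of the first record whose LOCUS name equals `accession`, or None.

    Hypothetical helper for illustration; assumes the LOCUS name is the
    accession, which holds for the RefSeq record shown above.
    """
    header = []
    in_record = False
    for line in handle:
        if line.startswith("LOCUS"):
            # A new record starts here; check whether it is the one we want.
            fields = line.split()
            in_record = len(fields) > 1 and fields[1] == accession
            header = [line] if in_record else []
        elif in_record:
            # ORIGIN opens the sequence block, // closes the record.
            if line.startswith("ORIGIN") or line.startswith("//"):
                return "".join(header)
            header.append(line)
    # Reached end of input: return what was collected, if anything.
    return "".join(header) if in_record else None


# Tiny made-up two-record file, cut down from the record above.
sample = (
    "LOCUS       NM_000001    10 bp    mRNA            ROD       01-JAN-2001\n"
    "DEFINITION  Placeholder record.\n"
    "ORIGIN\n"
    "        1 acgtacgtac\n"
    "//\n"
    "LOCUS       NM_031360    1269 bp    mRNA            ROD       12-JUN-2001\n"
    "DEFINITION  Rattus norvegicus neutral sphingomyelinase (Smpd2), mRNA.\n"
    "ACCESSION   NM_031360\n"
    "ORIGIN\n"
    "        1 atgaagcaca acttttctct\n"
    "//\n"
)

header = genbank_header(io.StringIO(sample), "NM_031360")
print(header)
```

On a real file the handle would come from open() on one of the *.gbff
files rather than io.StringIO. This scans the file from the top for every
lookup, so it does not use the dbiflat index and will be slow on a large
release.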




