[Biojava-l] RefSeq bioJava parser problem

wanner.de@pg.com wanner.de@pg.com
Tue, 14 May 2002 11:40:33 -0400


Hi,

I appreciate the responses to the RefSeq question. We've been able to put together
a reliable parser using the example in TestRefSeqPrt.

I have an additional question now. Are there any utility methods within bioJava
that can be used to handle parsed values that bioJava returns in list
form?

For example the following value was returned from bioJava for a sequence
annotation with key MEDLINE:

     [98127055, 99357812]


Another example is the value that was returned from bioJava for a feature
annotation with key db_xref:

     [LocusID:946, MIM:604405]

bioJava does good work in accumulating the information and placing it under a
specific annotation. Does anyone know whether methods to extract list members or
parameter/value pairs are already available in bioJava?
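
Whether bioJava already ships such a utility is not settled in this thread.
In the meantime, here is a minimal sketch in plain Java of one way to
post-process values like the two above. It assumes the value looked up from
the annotation arrives either as a java.util.Collection (which would print
as [98127055, 99357812]) or as a single object when only one value was seen;
that is an assumption, not something confirmed here. The class name
AnnotationValues and its methods asList and asPairs are hypothetical helpers,
not part of bioJava.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a bioJava class.
final class AnnotationValues {

    private AnnotationValues() {
    }

    // Normalise an annotation value to a list of strings.
    // Assumption: the value is a Collection, a single object, or null.
    static List<String> asList(Object value) {
        List<String> out = new ArrayList<>();
        if (value == null) {
            return out;
        }
        if (value instanceof Collection) {
            for (Object item : (Collection<?>) value) {
                out.add(String.valueOf(item));
            }
        } else {
            out.add(String.valueOf(value));
        }
        return out;
    }

    // Split entries such as "LocusID:946" at the first colon and group the
    // values by name, keeping the order in which names first appear.
    // An entry without a colon is stored under its own text with an empty value.
    static Map<String, List<String>> asPairs(Object value) {
        Map<String, List<String>> pairs = new LinkedHashMap<>();
        for (String entry : asList(value)) {
            int colon = entry.indexOf(':');
            String name = colon < 0 ? entry : entry.substring(0, colon);
            String val = colon < 0 ? "" : entry.substring(colon + 1);
            pairs.computeIfAbsent(name.trim(), k -> new ArrayList<>()).add(val.trim());
        }
        return pairs;
    }
}
```

With the db_xref value shown above, asPairs would yield a map that prints as
{LocusID=[946], MIM=[604405]}. Grouping values per name keeps repeated
cross-references to the same database from overwriting each other.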

thx,
Dave


With one catch, the Genbank reader is the right thing to use.  The catch is
that the Genbank parser only reads nucleotide sequences, and you've got an
amino acid sequence here.  So the biojava sequence is being built with the
wrong alphabet, and parsing breaks when it reaches the sequence data.
TestRefSeqPrt will handle files like these.  The sample code should point you
in the right direction.
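
The nucleotide/protein distinction above is visible in the record itself:
the LOCUS line of the quoted NP_000221 entry reports its length as "167 aa",
whereas nucleotide records report a length in "bp". A minimal sketch in plain
Java of checking this before choosing a reader follows. The class LocusLine
and the method isProtein are hypothetical names for illustration, not bioJava
API, and the sketch assumes the unit token follows the locus name and length
as in the record quoted below.

```java
// Hypothetical helper, not a bioJava class.
final class LocusLine {

    private LocusLine() {
    }

    // Returns true when a GenBank-style LOCUS line gives its length in
    // amino acids ("aa"), false when it gives it in base pairs ("bp").
    static boolean isProtein(String locusLine) {
        String[] tokens = locusLine.trim().split("\\s+");
        if (!tokens[0].equals("LOCUS")) {
            throw new IllegalArgumentException("Not a LOCUS line: " + locusLine);
        }
        // Skip the keyword and the locus name; the unit follows the length.
        for (int i = 2; i < tokens.length; i++) {
            if (tokens[i].equals("aa")) {
                return true;
            }
            if (tokens[i].equals("bp")) {
                return false;
            }
        }
        throw new IllegalArgumentException("No length unit found: " + locusLine);
    }
}
```

A caller could use the result to pick a protein-aware reader for "aa" records
and keep the existing Genbank reader for "bp" records.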

Greg


> -----Original Message-----
> From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
> Sent: Thursday, May 09, 2002 8:58 AM
> To: wanner.de@pg.com
> Cc: biojava-l@biojava.org
> Subject: Re: [Biojava-l] RefSeq bioJava parser problem
>
>
> Hi,
>
> The genbank parser may be fairly paranoid about the exact format.
> I don't think that these parsers were written in a modular manner, so it
> may not be easy to plug together your own customized version without
> access to the source code. The source code can be obtained from
> anonymous CVS, or as a download from the biojava web site.
> Alternatively, the org.biojava.bio.program.tagvalue package provides a
> modularly extensible, but poorly documented, API for
> processing tag-value files such as these. You may be able to knock up a
> complete parser in a couple of hours that way, depending on what you
> want to turn these files into.
>
> Could someone with genbank parsing expertise say what the differences
> between the two formats are, and how easy it would be to get the genbank
> parser to accept refseq documents?
>
> Matthew
>
> wanner.de@pg.com wrote:
> > bioJava members,
> >
> > Our Genbank parser using bioJava has been working great. We've now been asked
> > to parse RefSeq accession numbers, which seem to have only minor differences
> > from the Genbank format; however, bioJava cannot read the sequence. I get the
> > "org.biojava.bio.BioException: Could not read sequence" exception. Below is
> > the sequence I am trying to parse (downloaded from NCBI). Do you have any
> > ideas? Should we be using something other than a Genbank reader to parse
> > this?
> >
> > Thanks Much,
> >
> > LOCUS       NP_000221                167 aa            linear   PRI 29-JAN-2002
> > DEFINITION  leptin precursor; leptin (murine obesity homolog); obesity; obesity
> >             (murine homolog, leptin) [Homo sapiens].
> > ACCESSION   NP_000221
> > PID         g4557715
> > VERSION     NP_000221.1  GI:4557715
> > DBSOURCE    REFSEQ: accession NM_000230.1
> > KEYWORDS    .
> > SOURCE      human.
> >   ORGANISM  Homo sapiens
> >             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> >             Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
> > REFERENCE   1  (residues 1 to 167)
> >   AUTHORS   Friedman JM, Leibel RL, Siegel DS, Walsh J and Bahary N.
> >   TITLE     Molecular mapping of the mouse ob mutation
> >   JOURNAL   Genomics 11 (4), 1054-1062 (1991)
> >   MEDLINE   92147101
> >    PUBMED   1686014
> > REFERENCE   2  (residues 1 to 167)
> >   AUTHORS   Zhang Y, Proenca R, Maffei M, Barone M, Leopold L and Friedman JM.
> >   TITLE     Positional cloning of the mouse obese gene and its human homologue
> >   JOURNAL   Nature 372 (6505), 425-432 (1994)
> >   MEDLINE   95075453
> >    PUBMED   7984236
> >   REMARK    Erratum:[[published erratum appears in Nature 1995 Mar
> >             30;374(6521):479]]
> > REFERENCE   3  (residues 1 to 167)
> >   AUTHORS   Masuzaki H, Ogawa Y, Isse N, Satoh N, Okazaki T, Shigemoto M, Mori
> >             K, Tamura N, Hosoda K, Yoshimasa Y et al.
> >   TITLE     Human obese gene expression. Adipocyte-specific expression and
> >             regional differences in the adipose tissue
> >   JOURNAL   Diabetes 44 (7), 855-858 (1995)
> >   MEDLINE   95309556
> >    PUBMED   7789654
> > REFERENCE   4  (residues 1 to 167)
> >   AUTHORS   Green ED, Maffei M, Braden VV, Proenca R, DeSilva U, Zhang Y, Chua
> >             SC Jr, Leibel RL, Weissenbach J and Friedman JM.
> >   TITLE     The human obese (OB) gene: RNA expression pattern and mapping on
> >             the physical, cytogenetic, and genetic maps of chromosome 7
> >   JOURNAL   Genome Res. 5 (1), 5-12 (1995)
> >   MEDLINE   96352898
> >    PUBMED   8717050
> > REFERENCE   5  (residues 1 to 167)
> >   AUTHORS   Isse N, Ogawa Y, Tamura N, Masuzaki H, Mori K, Okazaki T, Satoh N,
> >             Shigemoto M, Yoshimasa Y, Nishi S et al.
> >   TITLE     Structural organization and chromosomal assignment of the human
> >             obese gene
> >   JOURNAL   J. Biol. Chem. 270 (46), 27728-27733 (1995)
> >   MEDLINE   96070903
> >    PUBMED   7499240
> > REFERENCE   6  (residues 1 to 167)
> >   AUTHORS   Gong,D.W., Bi,S., Pratley,R.E. and Weintraub,B.D.
> >   TITLE     Genomic structure and promoter analysis of the human obese gene
> >   JOURNAL   J. Biol. Chem. 271 (8), 3971-3974 (1996)
> >   MEDLINE   96223958
> > REFERENCE   7  (residues 1 to 167)
> >   AUTHORS   Niki T, Mori H, Tamori Y, Kishimoto-Hashirmoto M, Ueno H, Araki S,
> >             Masugi J, Sawant N, Majithia HR, Rais N et al.
> >   TITLE     Human obese gene: molecular screening in Japanese and Asian Indian
> >             NIDDM patients associated with obesity
> >   JOURNAL   Diabetes 45 (5), 675-678 (1996)
> >   MEDLINE   96198511
> >    PUBMED   8621021
> > REFERENCE   8  (residues 1 to 167)
> >   AUTHORS   Comuzzie,A.G., Hixson,J.E., Almasy,L., Mitchell,B.D., Mahaney,M.C.,
> >             Dyer,T.D., Stern,M.P., MacCluer,J.W. and Blangero,J.
> >   TITLE     A major quantitative trait locus determining serum leptin levels
> >             and fat mass is located on human chromosome 2
> >   JOURNAL   Nat. Genet. 15 (3), 273-276 (1997)
> >   MEDLINE   97207647
> >    PUBMED   9054940
> > REFERENCE   9  (residues 1 to 167)
> >   AUTHORS   Clement,K., Vaisse,C., Lahlou,N., Cabrol,S., Pelloux,V.,
> >             Cassuto,D., Gourmelen,M., Dina,C., Chambaz,J., Lacorte,J.M.,
> >             Basdevant,A., Bougneres,P., Lebouc,Y., Froguel,P. and Guy-Grand,B.
> >   TITLE     A mutation in the human leptin receptor gene causes obesity and
> >             pituitary dysfunction
> >   JOURNAL   Nature 392 (6674), 398-401 (1998)
> >   MEDLINE   98196670
> >    PUBMED   9537324
> > REFERENCE   10 (residues 1 to 167)
> >   AUTHORS   Friedman,J.M. and Halaas,J.L.
> >   TITLE     Leptin and the regulation of body weight in mammals
> >   JOURNAL   Nature 395 (6704), 763-770 (1998)
> >   MEDLINE   99010835
> > COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
> >             reference sequence was derived from U43653.1.
> >             Summary: This gene is similar to the mouse obesity gene (ob). The
> >             protein encoded by this gene is secreted by white adipocytes. In
> >             the mouse study, mutations in this gene are linked to severe and
> >             morbid obesity.
> > FEATURES             Location/Qualifiers
> >      source          1..167
> >                      /organism="Homo sapiens"
> >                      /db_xref="taxon:9606"
> >                      /chromosome="7"
> >                      /map="7q31.3"
> >      Protein         1..167
> >                      /product="leptin precursor"
> >                      /note="leptin (murine obesity homolog); obesity (murine
> >                      homolog, leptin)"
> >      sig_peptide     1..21
> >      Region          22..167
> >                      /region_name="Leptin"
> >                      /note="Leptin"
> >                      /db_xref="CDD:pfam02024"
> >      mat_peptide     22..167
> >                      /product="leptin"
> >      CDS             1..167
> >                      /gene="LEP"
> >                      /coded_by="NM_000230.1:57..560"
> >                      /db_xref="LocusID:3952"
> >                      /db_xref="MIM:164160"
> > ORIGIN
> >         1 mhwgtlcgfl wlwpylfyvq avpiqkvqdd tktliktivt rindishtqs vsskqkvtgl
> >        61 dfipglhpil tlskmdqtla vyqqiltsmp srnviqisnd lenlrdllhv lafskschlp
> >       121 wasgletlds lggvleasgy stevvalsrl qgslqdmlwq ldlspgc
> > //
> >
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l@biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l
> >