[Biojava-l] RefSeq bioJava parser problem

Matthew Pocock matthew_pocock@yahoo.co.uk
Thu, 09 May 2002 13:57:53 +0100


Hi,

The GenBank parser is probably being fairly strict about the exact format. 
I don't think that these parsers were written in a modular manner, so it 
may not be easy to plug together your own customized version without 
access to the source code. The source code can be obtained from 
anonymous CVS, or as a download from the biojava web site. 
Alternatively, the org.biojava.bio.program.tagvalue package provides a 
modular and extensible, but poorly documented, API for processing 
tag-value files such as these. You may be able to knock up a 
complete parser in a couple of hours that way, depending on what you 
want to turn these files into.
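
To give a feel for what tag-value processing of such a file involves, here 
is a minimal sketch in plain Java. It does not use the 
org.biojava.bio.program.tagvalue API; the class and method names 
(TagValueSketch, FlatRecord, parse) are made up for illustration. It 
assumes the usual flat-file layout in which the tag sits in the first 12 
columns and the value starts at column 13, and it simply stores any tag it 
meets (PID, DBSOURCE and so on) instead of rejecting the record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of BioJava: a tiny tag-value reader for
// GenBank-style flat files.
class TagValueSketch {
    // Assumption: tags occupy the first 12 columns, values start at column 13.
    static final int VALUE_COLUMN = 12;

    // Holds the tags in file order plus the residues found after ORIGIN.
    static final class FlatRecord {
        final Map<String, List<String>> tags = new LinkedHashMap<>();
        final StringBuilder sequence = new StringBuilder();
    }

    static FlatRecord parse(List<String> lines) {
        FlatRecord record = new FlatRecord();
        List<String> current = null;   // values of the most recently seen tag
        boolean inSequence = false;
        for (String line : lines) {
            if (line.startsWith("//")) {
                break;                 // end-of-record marker
            }
            if (line.trim().isEmpty()) {
                continue;
            }
            if (inSequence) {
                // Sequence lines: drop the position numbers and spaces.
                for (char c : line.toCharArray()) {
                    if (Character.isLetter(c)) {
                        record.sequence.append(c);
                    }
                }
                continue;
            }
            String tag = line.substring(0, Math.min(VALUE_COLUMN, line.length())).trim();
            String value = line.length() > VALUE_COLUMN
                    ? line.substring(VALUE_COLUMN).trim() : "";
            if (tag.equals("ORIGIN")) {
                inSequence = true;
            } else if (!tag.isEmpty()) {
                // Any tag is accepted, known or not; repeated tags keep all values.
                current = record.tags.computeIfAbsent(tag, k -> new ArrayList<>());
                current.add(value);
            } else if (current != null) {
                // Blank tag columns: continuation of the previous value.
                int last = current.size() - 1;
                String joined = current.get(last).isEmpty()
                        ? value : current.get(last) + " " + value;
                current.set(last, joined);
            }
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "LOCUS       NP_000221                167 aa            linear   PRI 29-JAN-2002",
            "ACCESSION   NP_000221",
            "DBSOURCE    REFSEQ: accession NM_000230.1",
            "ORIGIN",
            "        1 mhwgtlcgfl wlwpylfyvq",
            "//");
        FlatRecord record = parse(lines);
        System.out.println(record.tags.keySet());
        System.out.println(record.sequence);
    }
}
```

Sub-tags such as ORGANISM or AUTHORS, and the keys in the feature table, 
come out as ordinary tags here because they also fall inside the first 12 
columns. A real parser would want to nest them under their parent tag.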

Could someone with GenBank parsing expertise say what the differences 
between the two formats are, and how easy it would be to get the GenBank 
parser to accept RefSeq documents?

Matthew

wanner.de@pg.com wrote:
> bioJava members,
> 
> Our Genbank parser using bioJava has been working great.   We've now been asked
> to parse RefSeq accession numbers..... which seem to have only minor differences
> in the Genbank format,  however,  bioJava cannot read the sequence.   I get the
> " org.biojava.bio.BioException: Could not read sequence"   exception.   Below is
> the sequence I am trying to parse (downloaded from NCBI).   Do you have any
> ideas?  Should we be using something other than a Genbank reader to parse this?
> 
> Thanks Much,
> 
> LOCUS       NP_000221                167 aa            linear   PRI 29-JAN-2002
> DEFINITION  leptin precursor; leptin (murine obesity homolog); obesity; obesity
>             (murine homolog, leptin) [Homo sapiens].
> ACCESSION   NP_000221
> PID         g4557715
> VERSION     NP_000221.1  GI:4557715
> DBSOURCE    REFSEQ: accession NM_000230.1
> KEYWORDS    .
> SOURCE      human.
>   ORGANISM  Homo sapiens
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
> REFERENCE   1  (residues 1 to 167)
>   AUTHORS   Friedman JM, Leibel RL, Siegel DS, Walsh J and Bahary N.
>   TITLE     Molecular mapping of the mouse ob mutation
>   JOURNAL   Genomics 11 (4), 1054-1062 (1991)
>   MEDLINE   92147101
>    PUBMED   1686014
> REFERENCE   2  (residues 1 to 167)
>   AUTHORS   Zhang Y, Proenca R, Maffei M, Barone M, Leopold L and Friedman JM.
>   TITLE     Positional cloning of the mouse obese gene and its human homologue
>   JOURNAL   Nature 372 (6505), 425-432 (1994)
>   MEDLINE   95075453
>    PUBMED   7984236
>   REMARK    Erratum:[[published erratum appears in Nature 1995 Mar
>             30;374(6521):479]]
> REFERENCE   3  (residues 1 to 167)
>   AUTHORS   Masuzaki H, Ogawa Y, Isse N, Satoh N, Okazaki T, Shigemoto M, Mori
>             K, Tamura N, Hosoda K, Yoshimasa Y et al.
>   TITLE     Human obese gene expression. Adipocyte-specific expression and
>             regional differences in the adipose tissue
>   JOURNAL   Diabetes 44 (7), 855-858 (1995)
>   MEDLINE   95309556
>    PUBMED   7789654
> REFERENCE   4  (residues 1 to 167)
>   AUTHORS   Green ED, Maffei M, Braden VV, Proenca R, DeSilva U, Zhang Y, Chua
>             SC Jr, Leibel RL, Weissenbach J and Friedman JM.
>   TITLE     The human obese (OB) gene: RNA expression pattern and mapping on
>             the physical, cytogenetic, and genetic maps of chromosome 7
>   JOURNAL   Genome Res. 5 (1), 5-12 (1995)
>   MEDLINE   96352898
>    PUBMED   8717050
> REFERENCE   5  (residues 1 to 167)
>   AUTHORS   Isse N, Ogawa Y, Tamura N, Masuzaki H, Mori K, Okazaki T, Satoh N,
>             Shigemoto M, Yoshimasa Y, Nishi S et al.
>   TITLE     Structural organization and chromosomal assignment of the human
>             obese gene
>   JOURNAL   J. Biol. Chem. 270 (46), 27728-27733 (1995)
>   MEDLINE   96070903
>    PUBMED   7499240
> REFERENCE   6  (residues 1 to 167)
>   AUTHORS   Gong,D.W., Bi,S., Pratley,R.E. and Weintraub,B.D.
>   TITLE     Genomic structure and promoter analysis of the human obese gene
>   JOURNAL   J. Biol. Chem. 271 (8), 3971-3974 (1996)
>   MEDLINE   96223958
> REFERENCE   7  (residues 1 to 167)
>   AUTHORS   Niki T, Mori H, Tamori Y, Kishimoto-Hashirmoto M, Ueno H, Araki S,
>             Masugi J, Sawant N, Majithia HR, Rais N et al.
>   TITLE     Human obese gene: molecular screening in Japanese and Asian Indian
>             NIDDM patients associated with obesity
>   JOURNAL   Diabetes 45 (5), 675-678 (1996)
>   MEDLINE   96198511
>    PUBMED   8621021
> REFERENCE   8  (residues 1 to 167)
>   AUTHORS   Comuzzie,A.G., Hixson,J.E., Almasy,L., Mitchell,B.D., Mahaney,M.C.,
>             Dyer,T.D., Stern,M.P., MacCluer,J.W. and Blangero,J.
>   TITLE     A major quantitative trait locus determining serum leptin levels
>             and fat mass is located on human chromosome 2
>   JOURNAL   Nat. Genet. 15 (3), 273-276 (1997)
>   MEDLINE   97207647
>    PUBMED   9054940
> REFERENCE   9  (residues 1 to 167)
>   AUTHORS   Clement,K., Vaisse,C., Lahlou,N., Cabrol,S., Pelloux,V.,
>             Cassuto,D., Gourmelen,M., Dina,C., Chambaz,J., Lacorte,J.M.,
>             Basdevant,A., Bougneres,P., Lebouc,Y., Froguel,P. and Guy-Grand,B.
>   TITLE     A mutation in the human leptin receptor gene causes obesity and
>             pituitary dysfunction
>   JOURNAL   Nature 392 (6674), 398-401 (1998)
>   MEDLINE   98196670
>    PUBMED   9537324
> REFERENCE   10 (residues 1 to 167)
>   AUTHORS   Friedman,J.M. and Halaas,J.L.
>   TITLE     Leptin and the regulation of body weight in mammals
>   JOURNAL   Nature 395 (6704), 763-770 (1998)
>   MEDLINE   99010835
> COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
>             reference sequence was derived from U43653.1.
>             Summary: This gene is similar to the mouse obesity gene (ob). The
>             protein encoded by this gene is secreted by white adipocytes. In
>             the mouse study, mutations in this gene are linked to severe and
>             morbid obesity.
> FEATURES             Location/Qualifiers
>      source          1..167
>                      /organism="Homo sapiens"
>                      /db_xref="taxon:9606"
>                      /chromosome="7"
>                      /map="7q31.3"
>      Protein         1..167
>                      /product="leptin precursor"
>                      /note="leptin (murine obesity homolog); obesity (murine
>                      homolog, leptin)"
>      sig_peptide     1..21
>      Region          22..167
>                      /region_name="Leptin"
>                      /note="Leptin"
>                      /db_xref="CDD:pfam02024"
>      mat_peptide     22..167
>                      /product="leptin"
>      CDS             1..167
>                      /gene="LEP"
>                      /coded_by="NM_000230.1:57..560"
>                      /db_xref="LocusID:3952"
>                      /db_xref="MIM:164160"
> ORIGIN
>         1 mhwgtlcgfl wlwpylfyvq avpiqkvqdd tktliktivt rindishtqs vsskqkvtgl
>        61 dfipglhpil tlskmdqtla vyqqiltsmp srnviqisnd lenlrdllhv lafskschlp
>       121 wasgletlds lggvleasgy stevvalsrl qgslqdmlwq ldlspgc
> //