[Biojava-l] RefSeq bioJava parser problem
Matthew Pocock
matthew_pocock@yahoo.co.uk
Thu, 09 May 2002 13:57:53 +0100
Hi,
The GenBank parser is probably being fairly strict about the exact format.
I don't think that these parsers were written in a modular manner, so it
may not be easy to plug together your own customized version without
access to the source code. The source code can be obtained from
anonymous CVS, or as a download from the biojava web site.
Alternatively, the org.biojava.bio.program.tagvalue package provides a
modularly extensible, but poorly documented, API for processing
tag-value files such as these. You may be able to knock up a
complete parser in a couple of hours that way, depending on what you
want to turn these files into.
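
To give a rough idea of what tag-value processing of such a record involves, here is a minimal sketch in plain Java. It deliberately does not use the org.biojava.bio.program.tagvalue API; the class name FlatFileTags and its Entry type are hypothetical, made up for illustration. It assumes the usual flat-file layout, where a top-level tag (LOCUS, DEFINITION, ...) starts in the first column, indented lines continue the current tag, and "//" ends the record. Sub-tags such as ORGANISM or AUTHORS are simply folded into their parent's value here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: groups a GenBank-style
// flat-file record into top-level tag/value entries.
class FlatFileTags {

    /** One top-level tag (for example LOCUS) with its collected value text. */
    static final class Entry {
        final String tag;
        final String value;

        Entry(String tag, String value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Parses a single record and stops at the "//" terminator line. */
    static List<Entry> parse(String record) {
        List<Entry> entries = new ArrayList<>();
        String tag = null;
        StringBuilder value = new StringBuilder();

        for (String line : record.split("\\r?\\n")) {
            if (line.startsWith("//")) {
                break; // end of record
            }
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            if (!Character.isWhitespace(line.charAt(0))) {
                // A new top-level tag starts in column 0: flush the previous one.
                if (tag != null) {
                    entries.add(new Entry(tag, value.toString()));
                }
                String[] parts = line.split("\\s+", 2);
                tag = parts[0];
                value = new StringBuilder(parts.length > 1 ? parts[1].trim() : "");
            } else if (tag != null) {
                // Indented line: continuation (or sub-tag) of the current tag.
                if (value.length() > 0) {
                    value.append(' ');
                }
                value.append(line.trim());
            }
        }
        if (tag != null) {
            entries.add(new Entry(tag, value.toString()));
        }
        return entries;
    }

    public static void main(String[] args) {
        String record = String.join("\n",
                "LOCUS       NP_000221 167 aa linear PRI 29-JAN-2002",
                "DEFINITION  leptin precursor; leptin (murine obesity homolog);",
                "            obesity [Homo sapiens].",
                "ACCESSION   NP_000221",
                "//");
        for (Entry e : parse(record)) {
            System.out.println(e.tag + " -> " + e.value);
        }
    }
}
```

Because the sketch only cares about which column a line starts in, it does not mind unfamiliar tags or a protein-style LOCUS line. Whether that is enough depends on what you want to turn the records into; the FEATURES table and the ORIGIN block would still need their own handling.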
Could someone with GenBank parsing expertise say what the differences
between the two formats are, and how easy it would be to get the GenBank
parser to accept RefSeq documents?
Matthew
wanner.de@pg.com wrote:
> bioJava members,
>
> Our Genbank parser using bioJava has been working great. We've now been asked
> to parse RefSeq accession numbers, which seem to have only minor differences
> from the Genbank format; however, bioJava cannot read the sequence. I get the
> "org.biojava.bio.BioException: Could not read sequence" exception. Below is
> the sequence I am trying to parse (downloaded from NCBI). Do you have any
> ideas? Should we be using something other than a Genbank reader to parse
> this?
>
> Thanks Much,
>
> LOCUS NP_000221 167 aa linear PRI 29-JAN-2002
> DEFINITION leptin precursor; leptin (murine obesity homolog); obesity; obesity
> (murine homolog, leptin) [Homo sapiens].
> ACCESSION NP_000221
> PID g4557715
> VERSION NP_000221.1 GI:4557715
> DBSOURCE REFSEQ: accession NM_000230.1
> KEYWORDS .
> SOURCE human.
> ORGANISM Homo sapiens
> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
> REFERENCE 1 (residues 1 to 167)
> AUTHORS Friedman JM, Leibel RL, Siegel DS, Walsh J and Bahary N.
> TITLE Molecular mapping of the mouse ob mutation
> JOURNAL Genomics 11 (4), 1054-1062 (1991)
> MEDLINE 92147101
> PUBMED 1686014
> REFERENCE 2 (residues 1 to 167)
> AUTHORS Zhang Y, Proenca R, Maffei M, Barone M, Leopold L and Friedman JM.
> TITLE Positional cloning of the mouse obese gene and its human homologue
> JOURNAL Nature 372 (6505), 425-432 (1994)
> MEDLINE 95075453
> PUBMED 7984236
> REMARK Erratum:[[published erratum appears in Nature 1995 Mar
> 30;374(6521):479]]
> REFERENCE 3 (residues 1 to 167)
> AUTHORS Masuzaki H, Ogawa Y, Isse N, Satoh N, Okazaki T, Shigemoto M, Mori
> K, Tamura N, Hosoda K, Yoshimasa Y et al.
> TITLE Human obese gene expression. Adipocyte-specific expression and
> regional differences in the adipose tissue
> JOURNAL Diabetes 44 (7), 855-858 (1995)
> MEDLINE 95309556
> PUBMED 7789654
> REFERENCE 4 (residues 1 to 167)
> AUTHORS Green ED, Maffei M, Braden VV, Proenca R, DeSilva U, Zhang Y, Chua
> SC Jr, Leibel RL, Weissenbach J and Friedman JM.
> TITLE The human obese (OB) gene: RNA expression pattern and mapping on
> the physical, cytogenetic, and genetic maps of chromosome 7
> JOURNAL Genome Res. 5 (1), 5-12 (1995)
> MEDLINE 96352898
> PUBMED 8717050
> REFERENCE 5 (residues 1 to 167)
> AUTHORS Isse N, Ogawa Y, Tamura N, Masuzaki H, Mori K, Okazaki T, Satoh N,
> Shigemoto M, Yoshimasa Y, Nishi S et al.
> TITLE Structural organization and chromosomal assignment of the human
> obese gene
> JOURNAL J. Biol. Chem. 270 (46), 27728-27733 (1995)
> MEDLINE 96070903
> PUBMED 7499240
> REFERENCE 6 (residues 1 to 167)
> AUTHORS Gong,D.W., Bi,S., Pratley,R.E. and Weintraub,B.D.
> TITLE Genomic structure and promoter analysis of the human obese gene
> JOURNAL J. Biol. Chem. 271 (8), 3971-3974 (1996)
> MEDLINE 96223958
> REFERENCE 7 (residues 1 to 167)
> AUTHORS Niki T, Mori H, Tamori Y, Kishimoto-Hashirmoto M, Ueno H, Araki S,
> Masugi J, Sawant N, Majithia HR, Rais N et al.
> TITLE Human obese gene: molecular screening in Japanese and Asian Indian
> NIDDM patients associated with obesity
> JOURNAL Diabetes 45 (5), 675-678 (1996)
> MEDLINE 96198511
> PUBMED 8621021
> REFERENCE 8 (residues 1 to 167)
> AUTHORS Comuzzie,A.G., Hixson,J.E., Almasy,L., Mitchell,B.D., Mahaney,M.C.,
> Dyer,T.D., Stern,M.P., MacCluer,J.W. and Blangero,J.
> TITLE A major quantitative trait locus determining serum leptin levels
> and fat mass is located on human chromosome 2
> JOURNAL Nat. Genet. 15 (3), 273-276 (1997)
> MEDLINE 97207647
> PUBMED 9054940
> REFERENCE 9 (residues 1 to 167)
> AUTHORS Clement,K., Vaisse,C., Lahlou,N., Cabrol,S., Pelloux,V.,
> Cassuto,D., Gourmelen,M., Dina,C., Chambaz,J., Lacorte,J.M.,
> Basdevant,A., Bougneres,P., Lebouc,Y., Froguel,P. and Guy-Grand,B.
> TITLE A mutation in the human leptin receptor gene causes obesity and
> pituitary dysfunction
> JOURNAL Nature 392 (6674), 398-401 (1998)
> MEDLINE 98196670
> PUBMED 9537324
> REFERENCE 10 (residues 1 to 167)
> AUTHORS Friedman,J.M. and Halaas,J.L.
> TITLE Leptin and the regulation of body weight in mammals
> JOURNAL Nature 395 (6704), 763-770 (1998)
> MEDLINE 99010835
> COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
> reference sequence was derived from U43653.1.
> Summary: This gene is similar to the mouse obesity gene (ob). The
> protein encoded by this gene is secreted by white adipocytes. In
> the mouse study, mutations in this gene are linked to severe and
> morbid obesity.
> FEATURES Location/Qualifiers
> source 1..167
> /organism="Homo sapiens"
> /db_xref="taxon:9606"
> /chromosome="7"
> /map="7q31.3"
> Protein 1..167
> /product="leptin precursor"
> /note="leptin (murine obesity homolog); obesity (murine
> homolog, leptin)"
> sig_peptide 1..21
> Region 22..167
> /region_name="Leptin"
> /note="Leptin"
> /db_xref="CDD:pfam02024"
> mat_peptide 22..167
> /product="leptin"
> CDS 1..167
> /gene="LEP"
> /coded_by="NM_000230.1:57..560"
> /db_xref="LocusID:3952"
> /db_xref="MIM:164160"
> ORIGIN
> 1 mhwgtlcgfl wlwpylfyvq avpiqkvqdd tktliktivt rindishtqs vsskqkvtgl
> 61 dfipglhpil tlskmdqtla vyqqiltsmp srnviqisnd lenlrdllhv lafskschlp
> 121 wasgletlds lggvleasgy stevvalsrl qgslqdmlwq ldlspgc
> //