[Biojava-l] Uniprot files
Sofia Burvall
Sofia.Burvall at hgen.slu.se
Fri Mar 16 16:31:23 UTC 2007
Hi!
I have just started getting to know BioJava. I have written a small
program that reads a file using the BioJavaX method
RichSequence.IOTools.readFile(filen, ns);
and then tries to write the sequences out in UniProt format using
RichSequence.IOTools.writeUniProt(System.out, seqit, ns);
This works nicely when I read a FASTA file. But when I try to read a
UniProt file I get this error message:
org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92)
    at org.biojavax.bio.seq.io.RichStreamWriter.writeStream(RichStreamWriter.java:66)
    at org.biojavax.bio.seq.RichSequence$IOTools.writeUniProt(RichSequence.java:1426)
    at bc_biojava.GeneralReader.main(GeneralReader.java:81)
Caused by: org.biojava.bio.seq.io.ParseException: Bad date line found: 01-JAN-1990 (Rel. 13, Created)
    at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:349)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 4 more
When I try other UniProt files I get the same error. Each time it
complains about "Bad date line ...".
What could be the reason for this? Is the file in the wrong format?
Cheers
/Sofia
***
Here is the UniProt flat file:
***
ID FOSB_MOUSE STANDARD; PRT; 338 AA.
AC P13346;
DT 01-JAN-1990 (Rel. 13, Created)
DT 01-JAN-1990 (Rel. 13, Last sequence update)
DT 15-JUN-2002 (Rel. 41, Last annotation update)
DE Protein fosB.
GN FOSB.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_Taxid=10090;
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE=89251612; PubMed=2498083;
RA Zerial M., Toschi L., Ryseck R.-P., Schuermann M., Mueller R.,
RA Bravo R.;
RT "The product of a novel growth factor activated gene, fos B, interacts
RT with JUN proteins enhancing their DNA binding activity.";
RL EMBO J. 8:805-813(1989).
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE=92158623; PubMed=1741260;
RA Lazo P.S., Dorfman K., Noguchi T., Mattei M.-G., Bravo R.;
RT "Structure and mapping of the fosB gene. FosB downregulates the
RT activity of the fosB promoter.";
RL Nucleic Acids Res. 20:343-350(1992).
CC -!- FUNCTION: FOSB INTERACTS WITH JUN PROTEINS ENHANCING THEIR DNA
CC BINDING ACTIVITY.
CC -!- SUBUNIT: HETERODIMER (BY SIMILARITY).
CC -!- SUBCELLULAR LOCATION: NUCLEAR.
CC -!- INDUCTION: BY GROWTH FACTORS.
CC -!- SIMILARITY: BELONGS TO THE BZIP FAMILY. FOS SUBFAMILY.
CC --------------------------------------------------------------------------
CC This Swiss-Prot entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license at isb-sib.ch).
CC --------------------------------------------------------------------------
DR EMBL; X14897; CAA33026.1; -.
DR EMBL; AF093624; AAD13196.1; -.
DR PIR; S04108; TVMSFB.
DR PIR; S35477; S35477.
DR HSSP; P01100; 1FOS.
DR TRANSFAC; T00291; -.
DR MGD; MGI:95575; Fosb.
DR InterPro; IPR000837; Leuzip_Fos.
DR InterPro; IPR004827; TF_bZIP.
DR Pfam; PF00170; bZIP; 1.
DR PRINTS; PR00042; LEUZIPPRFOS.
DR SMART; SM00338; BRLZ; 1.
DR PROSITE; PS00036; BZIP_BASIC; 1.
KW Nuclear protein; DNA-binding.
FT DNA_BIND 161 179 BASIC MOTIF.
FT DOMAIN 183 211 LEUCINE-ZIPPER.
SQ SEQUENCE 338 AA; 35976 MW; E9D031A4BEAE48EC CRC64;
MFQAFPGDYD SGSRCSSSPS AESQYLSSVD SFGSPPTAAA SQECAGLGEM PGSFVPTVTA
ITTSQDLQWL VQPTLISSMA QSQGQPLASQ PPAVDPYDMP GTSYSTPGLS AYSTGGASGS
GGPSTSTTTS GPVSARPARA RPRRPREETL TPEEEEKRRV RRERNKLAAA KCRNRRRELT
DRLQAETDQL EEEKAELESE IAELQKEKER LEFVLVAHKP GCKIPYEEGP GPGPLAEVRD
LPGSTSAKED GFGWLLPPPP PPPLPFQSSR DAPPNLTASL FTHSEVQVLG DPFPVVSPSY
TSSFVLTCPE VSAFAGAQRT SGSEQPSDPL NSPSLLAL
//
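
A possible way to narrow down the "Bad date line" error, offered as a sketch rather than a confirmed diagnosis: the DT lines in the entry above use the older Swiss-Prot layout ("01-JAN-1990 (Rel. 13, Created)"), while more recent UniProtKB releases write DT lines such as "01-JAN-1990, integrated into UniProtKB/Swiss-Prot.". It is an assumption here, not something the stack trace proves, that the parser accepts only the newer layout. The class DtLineCheck and its two regular expressions below are hypothetical helpers written for illustration; they are not part of BioJava and only classify a DT line by its layout.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: classifies a UniProt DT line by layout.
final class DtLineCheck {

    // Older Swiss-Prot layout, e.g. "DT 01-JAN-1990 (Rel. 13, Created)"
    private static final Pattern OLD_STYLE = Pattern.compile(
        "^DT\\s+\\d{2}-[A-Z]{3}-\\d{4} \\(Rel\\. \\d+, .+\\)$");

    // Newer UniProtKB layout, e.g. "DT 01-JAN-1990, integrated into UniProtKB/Swiss-Prot."
    private static final Pattern NEW_STYLE = Pattern.compile(
        "^DT\\s+\\d{2}-[A-Z]{3}-\\d{4}, "
        + "(integrated into UniProtKB/.+|sequence version \\d+|entry version \\d+)\\.$");

    // Returns "old", "new", or "unknown" depending on which layout the line matches.
    static String classify(String line) {
        String trimmed = line.trim();
        if (OLD_STYLE.matcher(trimmed).matches()) {
            return "old";
        }
        if (NEW_STYLE.matcher(trimmed).matches()) {
            return "new";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String[] samples = {
            "DT 01-JAN-1990 (Rel. 13, Created)",
            "DT 01-JAN-1990 (Rel. 13, Last sequence update)",
            "DT 15-JUN-2002 (Rel. 41, Last annotation update)",
            "DT 01-JAN-1990, integrated into UniProtKB/Swiss-Prot."
        };
        // Print the detected layout next to each sample line.
        for (String sample : samples) {
            System.out.println(classify(sample) + "\t" + sample);
        }
    }
}
```

All three DT lines from the entry above classify as "old" with this check. If the assumption holds, downloading the same entry in the current UniProtKB flat-file layout would be one way to test whether the date lines are what the parser rejects.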