[Biojava-l] Uniprot files

Sofia Burvall Sofia.Burvall at hgen.slu.se
Fri Mar 16 16:31:23 UTC 2007


Hi!

I have just started getting to know BioJava. I have written a small
program that reads a file using the BioJavaX method

	RichSequence.IOTools.readFile(filen, ns);

and then tries to write the sequences out in UniProt format using

	RichSequence.IOTools.writeUniProt(System.out, seqit, ns);

This works nicely when I read a FASTA file. But when I try to read a
UniProt file I get this error message:

org.biojava.bio.BioException: Could not read sequence
	at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
	at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92)
	at org.biojavax.bio.seq.io.RichStreamWriter.writeStream(RichStreamWriter.java:66)
	at org.biojavax.bio.seq.RichSequence$IOTools.writeUniProt(RichSequence.java:1426)
	at bc_biojava.GeneralReader.main(GeneralReader.java:81)
Caused by: org.biojava.bio.seq.io.ParseException: Bad date line found: 01-JAN-1990 (Rel. 13, Created)
	at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:349)
	at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
	... 4 more


When I try other UniProt files I get the same error. It complains
about "Bad date line...".
What could be the reason for this? Is the file in the wrong format?


Cheers
/Sofia

***
Here is the UniProt flat file:
***
ID   FOSB_MOUSE     STANDARD;      PRT;   338 AA.
AC   P13346;
DT   01-JAN-1990 (Rel. 13, Created)
DT   01-JAN-1990 (Rel. 13, Last sequence update)
DT   15-JUN-2002 (Rel. 41, Last annotation update)
DE   Protein fosB.
GN   FOSB.
OS   Mus musculus (Mouse).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX   NCBI_Taxid=10090;
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=89251612; PubMed=2498083;
RA   Zerial M., Toschi L., Ryseck R.-P., Schuermann M., Mueller R.,
RA   Bravo R.;
RT   "The product of a novel growth factor activated gene, fos B, interacts
RT   with JUN proteins enhancing their DNA binding activity.";
RL   EMBO J. 8:805-813(1989).
RN   [2]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=92158623; PubMed=1741260;
RA   Lazo P.S., Dorfman K., Noguchi T., Mattei M.-G., Bravo R.;
RT   "Structure and mapping of the fosB gene. FosB downregulates the
RT   activity of the fosB promoter.";
RL   Nucleic Acids Res. 20:343-350(1992).
CC   -!- FUNCTION: FOSB INTERACTS WITH JUN PROTEINS ENHANCING THEIR DNA
CC       BINDING ACTIVITY.
CC   -!- SUBUNIT: HETERODIMER (BY SIMILARITY).
CC   -!- SUBCELLULAR LOCATION: NUCLEAR.
CC   -!- INDUCTION: BY GROWTH FACTORS.
CC   -!- SIMILARITY: BELONGS TO THE BZIP FAMILY. FOS SUBFAMILY.
CC   --------------------------------------------------------------------------
CC   This Swiss-Prot entry is copyright. It is produced through a collaboration
CC   between  the Swiss Institute of Bioinformatics  and the  EMBL outstation -
CC   the European Bioinformatics Institute.  There are no  restrictions on  its
CC   use  by  non-profit  institutions as long  as its content  is  in  no  way
CC   modified and this statement is not removed.  Usage  by  and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license at isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; X14897; CAA33026.1; -.
DR   EMBL; AF093624; AAD13196.1; -.
DR   PIR; S04108; TVMSFB.
DR   PIR; S35477; S35477.
DR   HSSP; P01100; 1FOS.
DR   TRANSFAC; T00291; -.
DR   MGD; MGI:95575; Fosb.
DR   InterPro; IPR000837; Leuzip_Fos.
DR   InterPro; IPR004827; TF_bZIP.
DR   Pfam; PF00170; bZIP; 1.
DR   PRINTS; PR00042; LEUZIPPRFOS.
DR   SMART; SM00338; BRLZ; 1.
DR   PROSITE; PS00036; BZIP_BASIC; 1.
KW   Nuclear protein; DNA-binding.
FT   DNA_BIND    161    179       BASIC MOTIF.
FT   DOMAIN      183    211       LEUCINE-ZIPPER.
SQ   SEQUENCE   338 AA;  35976 MW;  E9D031A4BEAE48EC CRC64;
      MFQAFPGDYD SGSRCSSSPS AESQYLSSVD SFGSPPTAAA SQECAGLGEM PGSFVPTVTA
      ITTSQDLQWL VQPTLISSMA QSQGQPLASQ PPAVDPYDMP GTSYSTPGLS AYSTGGASGS
      GGPSTSTTTS GPVSARPARA RPRRPREETL TPEEEEKRRV RRERNKLAAA KCRNRRRELT
      DRLQAETDQL EEEKAELESE IAELQKEKER LEFVLVAHKP GCKIPYEEGP GPGPLAEVRD
      LPGSTSAKED GFGWLLPPPP PPPLPFQSSR DAPPNLTASL FTHSEVQVLG DPFPVVSPSY
      TSSFVLTCPE VSAFAGAQRT SGSEQPSDPL NSPSLLAL
//
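
A possible lead, stated as an assumption rather than a confirmed diagnosis: the DT lines in the entry above use the older Swiss-Prot layout, "(Rel. 13, Created)", whereas newer UniProtKB releases write DT lines such as "01-JAN-1990, integrated into UniProtKB/Swiss-Prot.". If the parser only accepts the newer layout, that could explain a "Bad date line" error. The sketch below uses only the Java standard library to tell the two layouts apart. DtLineCheck and classify are hypothetical names, not part of BioJava, and the regular expressions are only an approximation of the two layouts.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: reports which DT line layout a line uses.
final class DtLineCheck {
    // Older Swiss-Prot layout, e.g. "DT   01-JAN-1990 (Rel. 13, Created)"
    private static final Pattern OLD_DT =
        Pattern.compile("^DT   \\d{2}-[A-Z]{3}-\\d{4} \\(Rel\\. \\d+, .+\\)$");

    // Newer UniProtKB layout, e.g. "DT   01-JAN-1990, sequence version 1."
    private static final Pattern NEW_DT =
        Pattern.compile("^DT   \\d{2}-[A-Z]{3}-\\d{4}, .+\\.$");

    // Returns "old", "new", or "unknown" for a single flat-file line.
    static String classify(String line) {
        if (OLD_DT.matcher(line).matches()) return "old";
        if (NEW_DT.matcher(line).matches()) return "new";
        return "unknown";
    }

    public static void main(String[] args) {
        String[] lines = {
            "DT   01-JAN-1990 (Rel. 13, Created)",
            "DT   01-JAN-1990 (Rel. 13, Last sequence update)",
            "DT   01-JAN-1990, integrated into UniProtKB/Swiss-Prot.",
            "DE   Protein fosB."
        };
        for (String line : lines) {
            System.out.println(classify(line) + "  " + line);
        }
    }
}
```

Running a check like this over the DT lines of a file before handing it to the reader would show which layout the file uses. Whether that layout is what the parser rejects still has to be confirmed against the BioJava source.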





