[Biojava-l] UniprotParser

Saif Ur-Rehman su24 at st-andrews.ac.uk
Mon Sep 19 10:09:46 UTC 2011


Dear all,

I am having issues with the BioJava UniProt parser as detailed below:

Code:

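import java.io.BufferedReader;
import java.io.FileReader;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

// files[index] is the UniProt flat file currently being read
BufferedReader br = new BufferedReader(new FileReader(files[index]));
Namespace ns = RichObjectFactory.getDefaultNamespace();
RichSequenceIterator iterator = RichSequence.IOTools.readUniProt(br, ns);
while (iterator.hasNext())
{
    try
    {
        RichSequence rs = iterator.nextRichSequence();
    }
    catch (NoSuchElementException e)
    {
        // nothing left to read; ignored
    }
    catch (BioException e)
    {
        // the entry could not be parsed; print the trace shown below
        e.printStackTrace();
    }
}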




The file I am using was downloaded from:

ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_fungi.dat.gz


The problem is that the parser reads some of the entries in the file
correctly but throws an exception on others.

Sample Exception stack trace:

 *** Start of trace *************************

at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
at uniprot.mp.main(mp.java:161)
Caused by: org.biojava.bio.seq.io.ParseException:

A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a bug
report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.UniProtFormat
Accession=P53031
Id=
Comments=
Parse_block=RN   [1]RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].RC   STRAIN=NCYC
2512;RX   MEDLINE=97082501; PubMed=8923737;
DOI=10.1002/(SICI)1097-0061(199610)12:13<1321::AID-YEA27>3.0.CO;2-6;RA
Rodriguez P.L., Ali R., Serrano R.;RT   "CtCdc55p and CtHa13p: two putative
regulatory proteins from Candida
tropicalis with long acidic domains.";RL   Yeast 12:1321-1329(1996).
Stack trace follows ....


at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:615)
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
... 1 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:486)
... 2 more
org.biojava.bio.BioException: Could not read sequence
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
at uniprot.mp.main(mp.java:161)
Caused by: org.biojava.bio.seq.io.ParseException: Name has not been supplied

********End of trace**********************************
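
One possible clue, offered as a guess rather than a confirmed diagnosis: the DOI on the RX line of the failing block itself contains a semicolon ("CO;2-6"). If the RX line were split on ';' and each piece then split on '=', the piece "2-6" would have no second element, which would be consistent with the ArrayIndexOutOfBoundsException: 1 above. The sketch below only illustrates that hypothesis in plain Java. It is not the BioJava UniProtFormat code, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the BioJava UniProtFormat code.
class RxSplitSketch {

    // Splits an RX line body on ';' and returns the non-empty pieces that
    // contain no '=', i.e. the pieces for which piece.split("=")[1] would
    // throw ArrayIndexOutOfBoundsException.
    static List<String> piecesWithoutEquals(String rxBody) {
        List<String> bad = new ArrayList<>();
        for (String piece : rxBody.split(";")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty() && trimmed.indexOf('=') < 0) {
                bad.add(trimmed);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // RX content copied from the Parse_block in the trace above
        String rx = "MEDLINE=97082501; PubMed=8923737; "
                + "DOI=10.1002/(SICI)1097-0061(199610)12:13<1321::AID-YEA27>3.0.CO;2-6;";
        System.out.println(piecesWithoutEquals(rx));
    }
}
```

If this guess is right, only entries whose references carry a DOI with an embedded semicolon would be affected, which would fit the parser working for some entries and not others.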

An example of an ID that worked is:

ZYM1_SCHPO

while an ID that didn't work is:

ZUO1_YEAST
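
In case it helps anyone reproduce this with a small input, here is a minimal sketch for cutting a single record out of the decompressed flat file. It relies only on each record starting with an "ID" line and ending with a "//" line. EntryExtractor and extract are hypothetical names, not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper, not part of BioJava: pulls one record out of a
// UniProt flat file so it can be fed to the parser on its own.
class EntryExtractor {

    // Returns the text of the first complete record whose ID line names
    // entryName, including the terminating "//" line, or null if no such
    // complete record is found.
    static String extract(BufferedReader in, String entryName) throws IOException {
        StringBuilder record = new StringBuilder();
        boolean wanted = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("ID ")) {
                // The entry name is the second whitespace-separated field.
                String[] fields = line.trim().split("\\s+");
                wanted = fields.length > 1 && fields[1].equals(entryName);
                record.setLength(0);
            }
            if (wanted) {
                record.append(line).append('\n');
                if (line.startsWith("//")) {
                    return record.toString();
                }
            }
        }
        return null;
    }
}
```

Calling EntryExtractor.extract(br, "ZUO1_YEAST") on a BufferedReader over the unzipped file should return just that record, which can then be written to a file of its own and passed to readUniProt.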

Thanks a lot in advance.

Cheers,
Saif


-- 
Saif Ur-Rehman

Centre for Evolution, Genes and Genomics
Harold Mitchell Building
University of St Andrews
St Andrews
Fife
KY16 9TH
UK

Tel: +44 131 5572556
Fax: +44 1334 463366


