[Biojava-dev] Error parsing XEMBL

Sicotte, Hugues (NIH/NCI) sicotteh at mail.nih.gov
Mon Aug 11 11:15:52 EDT 2003


You're not going to like my answer.

The only easy fix is to have the parser give
a more explicit error message, or use another
error category.

The feature parser does not know how to handle
segmented sets (i.e. when the annotation refers
to sequence outside the current record).
Segmented sets are a horror that has plagued bioinformatics
for years.


   mRNA            join(<1..197,Y13288.1:41..148,Y13289.1:41..140,
                     Y13290.1:41..175,Y13291.1:41..239,Y13292.1:41..172,
                     Y13293.1:41..140,Y13294.1:41..212,Y13295.1:41..185,
                     Y13296.1:41..95,Y13297.1:41..>971)

There is no easy fix to this for a parser (e.g. how is the
parser supposed to figure out where the missing files are?),
but there should be some logic in the parser to reject such
features and exit more gracefully.
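
As a rough illustration of that kind of check (this is not BioJava code;
the class and method names are made up for the example), one could scan
the raw location string for segments prefixed with an accession before
trying to realize the feature. It assumes remote segments always look
like "ACCESSION.VERSION:start..end", as in the mRNA line above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
public class RemoteLocationCheck {
    // A remote segment such as "Y13288.1:41..148": an accession with an
    // optional version, a colon, then a position or range that may carry
    // the fuzzy-end markers '<' and '>'.
    private static final Pattern REMOTE_SEGMENT = Pattern.compile(
        "([A-Za-z_][A-Za-z0-9_]*(?:\\.\\d+)?):<?\\d+(?:\\.\\.>?\\d+)?");

    /** Returns the accessions of all segments that point outside the current record. */
    public static List<String> remoteAccessions(String location) {
        // Drop whitespace so locations wrapped over several lines still match.
        String compact = location.replaceAll("\\s+", "");
        List<String> found = new ArrayList<>();
        Matcher m = REMOTE_SEGMENT.matcher(compact);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String loc = "join(<1..197,Y13288.1:41..148,Y13289.1:41..140)";
        List<String> remote = remoteAccessions(loc);
        if (remote.isEmpty()) {
            System.out.println("local feature, safe to realize");
        } else {
            // Report and skip instead of failing inside the feature constructor.
            System.out.println("skipping feature, remote segments: " + remote);
        }
    }
}
```

A parser could use the returned list either to skip the feature or to
throw an exception that names the records it could not find.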

------------------------------------------
Here is the long-term NCBI-type solution. To deal with this at NCBI we
had something called the seg-set. I wrote network code that would package
all such records together in a single ASN.1 file (one could do the same
thing in XML), and then the sequence information would be "local" to the
feature parser.
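
To make the "local" idea concrete, here is a small sketch (again not
BioJava or NCBI code; the names are invented). If the bundle carries the
length of every record, each segment can be checked against the record
it actually belongs to. The 237 comes from the error message in the
original post; the length for Y13291.1 is made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of BioJava or the NCBI toolkit.
public class SegSetCheck {
    /**
     * True if start..end fits inside the record named by accession.
     * A null accession means the segment is on the current record.
     */
    public static boolean fits(Map<String, Integer> lengths, String current,
                               String accession, int start, int end) {
        String key = (accession == null) ? current : accession;
        Integer length = lengths.get(key);
        if (length == null) {
            return false; // record is not in the bundle, so it cannot be checked
        }
        return start >= 1 && start <= end && end <= length;
    }

    public static void main(String[] args) {
        Map<String, Integer> lengths = new HashMap<>();
        lengths.put("Y13287", 237);   // length reported in the error message
        lengths.put("Y13291.1", 300); // made-up length, for illustration only

        // Checked against the current record, 41..239 does not fit.
        System.out.println(fits(lengths, "Y13287", null, 41, 239));
        // Checked against the record it names, it does (given the assumed length).
        System.out.println(fits(lengths, "Y13287", "Y13291.1", 41, 239));
    }
}
```

The first call mirrors what seems to happen in the reported error: the
range 41..239 is tested against the 237-base current record rather than
against the record the segment names.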


Hugues


-----Original Message-----
From: Carlisia P. Campos [mailto:carlisia at bu.edu]
Sent: Monday, August 11, 2003 8:47 AM
To: biojava-dev at biojava.org
Subject: [Biojava-dev] Error parsing XEMBL


Hello there,

I have encountered what seems to be a problem with biojava. I adapted a
piece of code offered by Mark Schreiber to parse the XEMBL xml string
that corresponds to the accession number "Y13287" and got the error
below. The code is also below. If anyone knows of a workaround,
please let me know.

Best,
--Carlisia



   public static String convert(String agaveString) {
     String seq_S = "";

     try {
       AGAVEHandler handler = new AGAVEHandler();

       SeqIOListener siol = new SeqIOAdapter();
       handler.setFeatureListener(siol);

       SAX2StAXAdaptor adaptor = new SAX2StAXAdaptor(handler);
       //XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser();
       SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
       XMLReader xmlReader = saxParser.getXMLReader();
      // XMLReader xmlReader = new SAXParser();
       //FileReader fr = new FileReader(args[0]);
       InputSource is = new InputSource(new StringReader(agaveString));

       xmlReader.setContentHandler(adaptor);
       xmlReader.parse(is);

       for(Iterator i = handler.getSequences();i.hasNext();){
         Sequence s = (Sequence)i.next();
         seq_S = s.seqString();
       }
     }
     catch(SAXException se) {
       se.printStackTrace();
     }
     catch(ParserConfigurationException pce) {
       pce.printStackTrace();
     }
     catch(IOException ioe) {
       ioe.printStackTrace();
     }
     return seq_S;
   }



This happens once the program reaches the line that contains:  
xmlReader.parse(is):

java.lang.IllegalArgumentException: Location [41,239] is outside 1..237
	at org.biojava.bio.seq.impl.SimpleFeature.<init>(SimpleFeature.java:306)
	at org.biojava.bio.seq.impl.SimpleStrandedFeature.<init>(SimpleStrandedFeature.java:74)
	at java.lang.reflect.Constructor.newInstance(Native Method)
	at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:138)
rethrown as org.biojava.bio.BioException: Couldn't realize feature
	at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:144)
	at org.biojava.bio.seq.SimpleFeatureRealizer.realizeFeature(SimpleFeatureRealizer.java:94)
	at org.biojava.bio.seq.impl.SimpleSequence.realizeFeature(SimpleSequence.java:198)
	at org.biojava.bio.seq.impl.SimpleFeature.realizeFeature(SimpleFeature.java:328)
	at org.biojava.bio.seq.impl.SimpleFeature.createFeature(SimpleFeature.java:337)
	at org.biojava.bio.seq.io.agave.StAXFeatureHandler.realizeSubFeatures(StAXFeatureHandler.java:349)
	at org.biojava.bio.seq.io.agave.StAXFeatureHandler.addFeatureToSequence(StAXFeatureHandler.java:377)
	at org.biojava.bio.seq.io.agave.AGAVEBioSeqHandler.endElementHandler(AGAVEBioSeqHandler.java:192)
	at org.biojava.bio.seq.io.agave.StAXFeatureHandler.endElement(StAXFeatureHandler.java:807)
	at org.biojava.bio.seq.io.agave.SAX2StAXAdaptor.endElement(SAX2StAXAdaptor.java:161)
	at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1203)
	at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:294)
	at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:261)
	at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:147)
	at com.carlisia.ws.bio.XEMBL2Sequence.convert(XEMBL2Sequence.java:78)
	at com.carlisia.ws.bio.XEMBL2Sequence.getSequence(XEMBL2Sequence.java:42)
	at com.carlisia.ws.bio.TranscribeDNAtoRNA.transcribeDNAtoRNAEMBL(TranscribeDNAtoRNA.java:63)
	at com.carlisia.ws.bio.TranscribeDNAtoRNA.main(TranscribeDNAtoRNA.java:71)
Debugger disconnected from local process.
ggcggtcggtctcgccttgtcgccagctccattttcctctctttctcttcccctttccttcgcgcccaagag 
cgcctcccagcctcgtagggtggtcacggagcccctgcgccttttccttgctcgggtcctgcgtccgcgcct 
gccccgccatgaatgaggagtacgacgtgatcgtgctgggcaccggcctgacggtgggcgccagggctgagg 
ggccggggctgagcagccggg
Process exited with exit code 0.
