[Biojava-l] A Exception Has Occurred During Parsing. (Ilhami Visne)

Richard Holland holland at eaglegenomics.com
Mon Dec 14 23:18:51 UTC 2009


Thanks for noticing this - so the problem was not random then, but was predictable based on specific sequences! I have patched the current version of GenbankFormat in subversion trunk to behave better with blank lines in comment sections. Can you independently test it to see if it works for this sequence now?
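
For anyone hitting the same NullPointerException, here is a minimal sketch of the general idea behind tolerating blank lines. It is not the actual GenbankFormat patch; the class and method names (SectionLines, readContinuation) are hypothetical, and it assumes that section continuation lines are indented, that a new section keyword starts in column 0, and that no line is longer than 4096 characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: gathers the lines of one
// GenBank-style section (e.g. COMMENT) without failing on blank lines.
final class SectionLines {

    // Reads lines until the next line that starts in column 0 (a new section
    // keyword) or until end of input. Blank lines are kept as empty strings.
    static List<String> readContinuation(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        while (true) {
            in.mark(4096);                  // remember the position so a line can be un-read
            String line = in.readLine();
            if (line == null) {
                break;                      // end of input: never dereference null
            }
            if (line.trim().isEmpty()) {
                lines.add("");              // blank line inside the section
                continue;
            }
            if (!Character.isWhitespace(line.charAt(0))) {
                in.reset();                 // next section keyword: leave it for the caller
                break;
            }
            lines.add(line.trim());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "            first comment line\n"
                + "\n"
                + "            second comment line\n"
                + "FEATURES             Location/Qualifiers\n";
        BufferedReader in = new BufferedReader(new StringReader(text));
        System.out.println(readContinuation(in));
        System.out.println(in.readLine());
    }
}
```

The key points are the explicit null check after readLine() and treating an empty line as data instead of assuming every line has a first character.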


On 13 Dec 2009, at 22:11, Deepak Sheoran wrote:

> Hi,
> I am attaching a quick fix to solve your problem with this record. Please have a look at the attached .jpeg file to see the solution.
> 
> Thanks
> Deepak Sheoran
> 
>> Hi,
>> 
>> I'm following the suggestion "Please submit the details that follow to
>> biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/".
>> 
>> The sequence concerned is at
>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=39645757&rettype=gb&seq_start=1353&seq_stop=2128&strand=1
>> 
>> Format_object=org.biojavax.bio.seq.io.GenbankFormat
>> Accession=null
>> Id=null
>> Comments=Bad section
>> Parse_block=
>> Stack trace follows ....
>> 
>> 
>>        at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:603)
>>        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:278)
>>        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>        ... 8 more
>> Caused by: java.lang.NullPointerException
>>        at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:593)
>>        ... 10 more
>> org.biojava.bio.BioException: IO failure whilst reading from Genbank
>> 
>> Is there any quick fix or patch?
>> 
>> thanks.
>> Ilhami Visne
>> 
>>  
>> ------------------------------------------------------------------------
>> 
>> Subject: Re: [Biojava-l] A Exception Has Occurred During Parsing.
>> From: Richard Holland <holland at eaglegenomics.com>
>> Date: Fri, 11 Dec 2009 09:59:34 +0000
>> To: Ilhami Visne <ilhami.visne at gmail.com>
>> CC: biojava-l at lists.open-bio.org
>> 
>> 
>> Hello. Could you also post the relevant parts of your code that you are running when this exception happens?
>> 
>> cheers,
>> Richard
>> 
>> 
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>> 
>> 
>> 
>>  
>> ------------------------------------------------------------------------
>> 
>> Subject: Re: [Biojava-l] Sequences as strings to RichSequence iterator
>> From: Oliver Stolpe <oliver.stolpe at fu-berlin.de>
>> Date: Fri, 11 Dec 2009 11:44:26 +0100
>> To: Richard Holland <holland at eaglegenomics.com>
>> CC: biojava-l at biojava.org
>> 
>> 
>> Dear Richard,
>> 
>> thank you for your answer. What you stated in your PPS is exactly what I want. It worked well with the ByteArrayOutputStream. Nevertheless, I don't use it; instead I build my own string by concatenating the names and sequences using String s = ">" + name + "\r\n" + sequence + "\r\n". I know it's not the best way, but the FASTA formatter modified the input names so much that I would have had too much trouble getting the right information out of them.
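
The concatenation approach described above can be sketched with a StringBuilder, which avoids creating a new String on every loop iteration. This is only an illustration: the class and method names (FastaText, toFasta) are hypothetical, and the "\r\n" line ending is taken from the snippet in the message.

```java
// Hypothetical helper: builds FASTA text by hand from parallel arrays of
// names and sequences, leaving the names exactly as supplied.
final class FastaText {

    static String toFasta(String[] names, String[] sequences) {
        if (names.length != sequences.length) {
            throw new IllegalArgumentException("names and sequences must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            // One record: a '>' header line followed by the sequence line.
            sb.append('>').append(names[i]).append("\r\n")
              .append(sequences[i]).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toFasta(new String[] {"seq1", "seq2"},
                                 new String[] {"ACGT", "TTGA"}));
    }
}
```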
>> 
>> Best regards,
>> Oliver
>> Richard Holland wrote:
>>> PS. Spot the deliberate mistake in the hasNext() function... that should be <, not <=!
>>> 
>>> PPS. In your original email you stated you wanted to read your sequences as Fasta. In Biojava, all sequences are RichSequences - they have no format other than the object model of RichSequence itself. Fasta only gets involved when you're reading from a Fasta file, or writing to one. If you need to show the sequences as Fasta in your user interface, you should consider using the FastaWriter writeSequence() method with the PrintStream parameter and wrapping a ByteArrayOutputStream in the PrintStream (a PrintStream takes an OutputStream, not a Writer) so that you can get a String representation of a Fasta record.
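
A minimal sketch of capturing PrintStream output as a String, using only the Java standard library. A PrintStream wraps an OutputStream, so the bytes are buffered in a ByteArrayOutputStream and decoded afterwards. The RecordWriter interface is a hypothetical stand-in for whichever API writes a record to a PrintStream; no BioJava call is shown here, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class PrintStreamCapture {

    // Hypothetical stand-in for any API that writes a record to a PrintStream.
    interface RecordWriter {
        void write(PrintStream out);
    }

    // Runs the writer against an in-memory PrintStream and returns what it wrote.
    static String capture(RecordWriter writer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, "UTF-8");
            writer.write(out);
            out.flush();                      // make sure everything reached the buffer
            return buffer.toString("UTF-8");  // decode with the same encoding
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String record = capture(out -> out.print(">MySeq0\nACGT\n"));
        System.out.print(record);
    }
}
```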
>>> 
>>> On 6 Dec 2009, at 15:41, Richard Holland wrote:
>>> 
>>> 
>>>> I'm not sure what you're trying to do here - are you trying to represent your string array of sequences as a RichSequenceIterator, or are you trying to convert them into FASTA? I'll answer both anyway...:
>>>> 
>>>> To convert your String[] of sequences into a RichSequenceIterator you need to create a new class that implements the RichSequenceIterator interface. You would probably write something like this (which I have not checked or compiled - so if it has bugs, sorry!):
>>>> 
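>>>>  import org.biojava.bio.BioException;
>>>>  import org.biojava.bio.seq.DNATools;
>>>>  import org.biojava.bio.seq.Sequence;
>>>>  import org.biojavax.bio.BioEntry;
>>>>  import org.biojavax.bio.seq.RichSequence;
>>>>  import org.biojavax.bio.seq.RichSequenceIterator;
>>>> 
>>>>  public class MyDNASeqIterator implements RichSequenceIterator {
>>>>     private final String[] sequences;
>>>>     private int counter; // not final: advanced on every call to nextRichSequence()
>>>> 
>>>>     public MyDNASeqIterator(String[] sequences) { this.sequences = sequences; this.counter = 0; }
>>>> 
>>>>     // '<', not '<=' (the correction given in the PS): '<=' would run one past the end of the array
>>>>     public boolean hasNext() {
>>>>        return this.counter < this.sequences.length;
>>>>     }
>>>> 
>>>>     public Sequence nextSequence() throws BioException { return nextRichSequence(); }
>>>> 
>>>>     public BioEntry nextBioEntry() throws BioException { return nextRichSequence(); }
>>>> 
>>>>     // Wraps the next string in a RichSequence named MySeq0, MySeq1, ...
>>>>     public RichSequence nextRichSequence() throws BioException {
>>>>        String seqName = "MySeq"+this.counter;
>>>>        return RichSequence.Tools.createRichSequence(seqName, this.sequences[this.counter++], DNATools.getDNA());
>>>>     }
>>>>  }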
>>>> 
>>>> You can then instantiate an object using MyDNASeqIterator's constructor to give it your string array, and iterate over it to get corresponding RichSequence instances.
>>>> 
>>>> To convert your sequences to FASTA, use the above iterator to generate sequences to pass to FastaFormat in the same way that you would write a normal FASTA file.
>>>> 
>>>> cheers,
>>>> Richard
>>>> 
>>>> On 6 Dec 2009, at 14:04, Oliver Stolpe wrote:
>>>> 
>>>>   
>>>>> Hello *,
>>>>> 
>>>>> I have a set of sequences as strings in an array. I now want to turn them into an iterator over RichSequences in FASTA format. I have read the cookbook and looked up the examples in biojavax-doc, but I don't get it. I have tried a lot, but I have no good starting point. How does the RichSequenceBuilder work? What about FastaFormat?
>>>>> I thought about putting the sequences in a FASTA file and then reading the file, but there must be a much more straightforward way!
>>>>> 
>>>>> Thanks in advance for any hints,
>>>>> Oliver
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>      
>>>> -- 
>>>> Richard Holland, BSc MBCS
>>>> Operations and Delivery Director, Eagle Genomics Ltd
>>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>>> http://www.eaglegenomics.com/
>>>> 
>>>> 
>>> 
>>> -- 
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>> 
>>>  
>> 
>> 
>> With best regards,
>> Oliver
>> 
>> 
>> ------------------------------------------------------------------------
>> 
>> Subject: Re: [Biojava-l] A Exception Has Occurred During Parsing.
>> From: ilhami visne <ilhami.visne at gmail.com>
>> Date: Fri, 11 Dec 2009 12:36:46 +0100
>> To: Richard Holland <holland at eaglegenomics.com>
>> CC: biojava-l at lists.open-bio.org
>> 
>> 
>> I created a new class, NCBIGenbankSequenceFetcher.java, which extends GenbankRichSequenceDB and overrides getAddress(String id) to limit the sequence returned for an id (via seq_start and seq_stop).
>> 
>> import java.net.MalformedURLException;
>> import java.net.URL;
>> 
>> import org.biojava.bio.seq.db.FetchURL;
>> import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;
>> 
>> // Fetches GenBank records from NCBI, optionally restricted to a sub-range of the sequence.
>> public class NCBIGenbankSequenceFetcher extends GenbankRichSequenceDB{
>> 
>>    private String seq_start;
>>    private String seq_stop;
>>    private String strand="1";//1=plus, 2=minus
>> 
>>    public NCBIGenbankSequenceFetcher() {
>>    }
>> 
>>    public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop) {
>>        this.seq_start = seq_start;
>>        this.seq_stop = seq_stop;
>>    }
>> 
>>    public NCBIGenbankSequenceFetcher(String seq_start, String seq_stop,String strand) {
>>        this.seq_start = seq_start;
>>        this.seq_stop = seq_stop;
>>        this.strand = strand;
>>    }
>> 
>>    // Builds the efetch URL; the range parameters are appended only when both ends are set.
>>    @Override
>>    protected URL getAddress(String id) throws MalformedURLException {
>>        FetchURL seqURL = new FetchURL("Genbank", "text");
>>        String baseurl = seqURL.getbaseURL();
>>        String db = seqURL.getDB();
>>        String url = baseurl+db+"&id="+id+"&rettype=gb";
>>        if(seq_start != null && seq_stop != null){
>>            url +="&seq_start="+seq_start+"&seq_stop="+seq_stop+"&strand="+strand;
>>        }
>>        return new URL(url);
>>    }
>> }
>> 
>> From another class, I create an instance of this class and then call its getRichSequence(id) method (not exactly this code, but similar):
>> 
>> for(String gi:ids){ // ids is a List<String>
>>    seq = new NCBIGenbankSequenceFetcher(seq_start, seq_stop,strand).getRichSequence(gi);
>> }
>> 
>> What I found later is that the exception is thrown at random, not for any particular sequence. So my guess is an I/O error that arises while the data is being streamed from the server.
>> 
>> ilhami visne.
>> 
>> 
>> 
>> 
>> ------------------------------------------------------------------------
>> 
>> Subject: Re: [Biojava-l] A Exception Has Occurred During Parsing.
>> From: Richard Holland <holland at eaglegenomics.com>
>> Date: Fri, 11 Dec 2009 14:17:21 +0000
>> To: ilhami visne <ilhami.visne at gmail.com>
>> CC: biojava-l at lists.open-bio.org
>> 
>> 
>> If the problem is random, it's almost certainly due to problems with the NCBI server feeding you data. There are restrictions on usage - e.g. NCBI only allows a certain number of requests - so you might be running into those.
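
If request limits are the suspect, one option is to space the requests out on the client side. The sketch below is a hypothetical helper (RequestThrottle is not a BioJava class), and the 350 ms interval is an assumed placeholder; the actual limit should be taken from NCBI's current usage guidelines.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of BioJava: spaces out calls so that a remote
// server sees at most one request per minimum interval.
final class RequestThrottle {
    private final long intervalNanos;
    private long nextAllowedNanos;
    private boolean started = false;

    RequestThrottle(long minIntervalMillis) {
        this.intervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
    }

    // Blocks until the minimum interval has elapsed since the previous call returned.
    synchronized void acquire() throws InterruptedException {
        if (started) {
            long waitNanos = nextAllowedNanos - System.nanoTime();
            if (waitNanos > 0) {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            }
        }
        started = true;
        nextAllowedNanos = System.nanoTime() + intervalNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        RequestThrottle throttle = new RequestThrottle(350); // assumed interval
        for (int i = 0; i < 3; i++) {
            throttle.acquire();
            System.out.println("request " + i + " may be sent now");
        }
    }
}
```

In a fetch loop, acquire() would be called once before each getRichSequence(gi) call, so that consecutive requests are at least the chosen interval apart.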
>> 
>> cheers,
>> Richard
>> 
>> 
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>> 
>> 
>> 
>>  ------------------------------------------------------------------------
>> 
>>  
> 
> <LocationOfError.jpeg>_______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
