[Biojava-l] Parsing exception reading sequence from GenbankRichSequenceDB

George Waldon gwaldon at geneinfinity.org
Sun Feb 19 01:53:24 UTC 2012


Hi Scott,

Sorry for the delay in responding. It was a little more complicated than
expected, and I hope the fix does not break any existing code. I added a
test for GenbankRichSequenceDB. Everything should be committed by now,
and you can check out the result on the BioJava server. Let me know if
you run into any problems or unexpected issues.

Thank you,
George

Quoting Scott Frees <sfrees at ramapo.edu>:

> George -
>
> That looks right to me.  I don't have a developer account, so it would
> be great if you could check that in.
>
> Thanks!
> Scott
>
> On Fri, Feb 17, 2012 at 12:56 PM, George Waldon
> <gwaldon at geneinfinity.org> wrote:
>> Hi Scott,
>>
>> Yes, well done. You need to fix rettype too. So, if I have it correct, we
>> should uncomment and have:
>>
>> rettype = "gb"
>> retmode = "txt"
>>
>> and existing code should not be broken. What do you think? I can commit if
>> you do not have a developer account.
>>
>> Thanks,
>> - George
>>
>> Quoting Scott Frees <sfrees at ramapo.edu>:
>>
>>> George - Thanks for your response.
>>>
>>> I think I tracked down the problem.  When building the FetchURL,
>>> GenbankRichSequenceDB uses "genbank" as the db.  In the
>>> org.biojava.bio.seq.db.FetchURL constructor, rettype and retmode are
>>> specifically not set when given "genbank" - see lines 54-55 commented
>>> out.
>>>
>>>         //     rettype = format;
>>>         //     retmode = format;
>>>
>>> Entrez recently updated their API
>>> (http://www.ncbi.nlm.nih.gov/books/NBK25501/) on Wednesday and in the
>>> release notes they say they've set defaults on each database for
>>> retmode.  I'm new to biojava and entrez, but I can only assume that
>>> the "genbank" db used to return sequences as text always, which is why
>>> FetchURL doesn't include the parameter in the URL it builds.  It looks
>>> like the default now is XML - which breaks the GenbankRichSequenceDB
>>> parser.
>>>
>>> I proved it out by subclassing GenbankRichSequenceDB to set the
>>> retmode parameter as text, and the problem is resolved.
>>>
>>> @Override
>>> protected URL getAddress(String id) throws MalformedURLException {
>>>     FetchURL seqURL = new FetchURL("Genbank", "text");
>>>     String baseurl = seqURL.getbaseURL();
>>>     String db = seqURL.getDB();
>>>     // added retmode=text
>>>     String url = baseurl + db + "&id=" + id
>>>             + "&rettype=gb&retmode=text&tool=" + getTool()
>>>             + "&email=" + getEmail();
>>>     return new URL(url);
>>> }
>>>
>>> I think a more elegant solution would be to simply fix FetchURL to use
>>> the retmode parameter.
>>>
>>> Regards -
>>> Scott
>>>
>>> On Thu, Feb 16, 2012 at 8:53 PM, George Waldon <gwaldon at geneinfinity.org>
>>> wrote:
>>>>
>>>> Hello Scott,
>>>>
>>>> This appears to be an exception thrown by the parser. Is there a way you
>>>> can fetch the sequence(s) as a text file before the exception occurs? It
>>>> would be interesting to see if you can reproduce the exception; you can
>>>> send me the file if you want.
>>>>
>>>> Regards,
>>>> George
>>>>
>>>> Quoting Scott Frees <sfrees at ramapo.edu>:
>>>>
>>>>> Hello -
>>>>>
>>>>> I have developed an application that searches and compares
>>>>> g-quadruplexes within mRNA.  The web application has been running
>>>>> without any problems on several different web servers for over a year.
>>>>> Suddenly, just this week, it is unable to download sequence data
>>>>> using GenbankRichSequenceDB - has anyone else had this problem?
>>>>>
>>>>> We are using BioJava 1.8.1.
>>>>>
>>>>> Below is the exception trace, and the code that follows is a small
>>>>> test app that generates the exception.  This code worked without any
>>>>> problems prior to Tuesday this week, and we haven't made any
>>>>> modifications to our application.
>>>>> ------------------------------------------------------
>>>>> org.biojava.bio.BioException: Failed to read Genbank sequence
>>>>>         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:163)
>>>>>         at Tester.main(Tester.java:11)
>>>>>
>>>>> Caused by: org.biojava.bio.BioException: Could not read sequence
>>>>>         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>>>>>         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:159)
>>>>>         ... 1 more
>>>>>
>>>>> Caused by: org.biojava.bio.seq.io.ParseException:
>>>>>
>>>>> A Exception Has Occurred During Parsing.
>>>>> Please submit the details that follow to biojava-l at biojava.org or post
>>>>> a bug report to http://bugzilla.open-bio.org/
>>>>>
>>>>> Format_object=org.biojavax.bio.seq.io.GenbankFormat
>>>>> Accession=null
>>>>> Id=null
>>>>> Comments=Bad section
>>>>> Parse_block=<?xml version="1.0"?>
>>>>> Stack trace follows ....
>>>>>
>>>>>         at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:620)
>>>>>         at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:279)
>>>>>         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>>>>         ... 2 more
>>>>>
>>>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -4
>>>>>         at java.lang.String.substring(Unknown Source)
>>>>>         at java.lang.String.substring(Unknown Source)
>>>>>         at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:610)
>>>>>         ... 4 more
>>>>>
>>>>> -----------------------------
>>>>>
>>>>>
>>>>> import org.biojava.bio.BioException;
>>>>> import org.biojava.bio.seq.db.IllegalIDException;
>>>>> import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;
>>>>> import org.biojavax.bio.seq.RichSequence;
>>>>>
>>>>> public class Tester {
>>>>>     public static void main(String args[]) {
>>>>>         String id = "NM_001110.2";  // Issue occurs with any ID
>>>>>         GenbankRichSequenceDB ncbi = new GenbankRichSequenceDB();
>>>>>         try {
>>>>>             RichSequence rs = ncbi.getRichSequence(id);
>>>>>             System.out.println(rs.seqString());
>>>>>         } catch (IllegalIDException e) {
>>>>>             e.printStackTrace();
>>>>>         } catch (BioException e) {
>>>>>             e.printStackTrace();
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>
>>>>
>>>>
>>>>
>>>> --------------------------------
>>>> George Waldon
>>>>
>>>>
>>>
>>
>>
>>
>> --------------------------------
>> George Waldon
>>
>>
>



--------------------------------
George Waldon
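
The fix discussed in this thread comes down to always sending rettype and
retmode explicitly, so that a change in the server-side defaults cannot
switch the response from GenBank flat file to XML. The following is only a
minimal sketch of that idea and is not the code that was committed to
BioJava. The class EfetchUrlSketch, its method names, the base URL constant,
the "nuccore" database name, and the tool and email values are all
assumptions made for illustration. The looksLikeXml helper is a hypothetical
guard for spotting an XML response before it reaches a flat-file parser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative sketch only; not part of BioJava.
class EfetchUrlSketch {

    // Assumed E-utilities efetch endpoint (not taken from the BioJava source).
    static final String BASE =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // Builds an efetch URL that always states rettype and retmode,
    // instead of relying on per-database server defaults.
    static String buildUrl(String db, String id, String rettype,
                           String retmode, String tool, String email) {
        return BASE
                + "?db=" + enc(db)
                + "&id=" + enc(id)
                + "&rettype=" + enc(rettype)
                + "&retmode=" + enc(retmode)
                + "&tool=" + enc(tool)
                + "&email=" + enc(email);
    }

    // Hypothetical guard: true if the response body starts with an XML
    // declaration, which a GenBank flat-file parser cannot handle.
    static boolean looksLikeXml(String body) {
        return body != null && body.trim().startsWith("<?xml");
    }

    // URL-encodes a single query parameter value as UTF-8.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder tool name and email address, chosen for the example.
        System.out.println(buildUrl("nuccore", "NM_001110.2", "gb", "text",
                "example-tool", "user@example.org"));
    }
}
```

Checking the first bytes of the response with something like looksLikeXml
would also turn the StringIndexOutOfBoundsException from the trace above
into a clearer error, should the defaults change again.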




