[Biojava-l] Parsing exception reading sequence from GenbankRichSequenceDB

Scott Frees sfrees at ramapo.edu
Fri Feb 17 18:06:11 UTC 2012


George -

That looks right to me.  I don't have a developer account, so it would
be great if you could check that in.

Thanks!
Scott

On Fri, Feb 17, 2012 at 12:56 PM, George Waldon
<gwaldon at geneinfinity.org> wrote:
> Hi Scott,
>
> Yes, well done. You need to fix rettype too. So, if I have it correct, we
> should uncomment and have:
>
> rettype = "gb"
> retmode = "text"
>
> and existing code should not be broken. What do you think? I can commit if
> you do not have a developer account.
>
> Thanks,
> - George
>
> Quoting Scott Frees <sfrees at ramapo.edu>:
>
>> George - Thanks for your response.
>>
>> I think I tracked down the problem.  When building the FetchURL,
>> GenbankRichSequenceDB uses "genbank" as the db.  In the
>> org.biojava.bio.seq.db.FetchURL constructor, rettype and retmode are
>> specifically not set when given "genbank"; see lines 54-55, which are
>> commented out:
>>
>>                //      rettype = format;
>>                //      retmode = format;
>>
>> Entrez updated their API on Wednesday
>> (http://www.ncbi.nlm.nih.gov/books/NBK25501/), and the release notes
>> say they have set a default retmode for each database. I'm new to
>> BioJava and Entrez, but I can only assume that the "genbank" db always
>> used to return sequences as text, which is why FetchURL doesn't include
>> the parameter in the URL it builds. It looks like the default is now
>> XML, which breaks the GenbankRichSequenceDB parser.
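Why an XML body breaks the flat-file parser can be illustrated with a small check. A GenBank flat file starts with a `LOCUS` line, while the exception reported further down shows a parse block that begins with an XML declaration. The class below is a hypothetical sketch, not part of BioJava: `ResponseSniffer` and its `Kind` values are invented names, and it only inspects the first non-blank line of a response body before any parser sees it.

```java
public class ResponseSniffer {

    /** Rough classification of a response body (hypothetical, for illustration). */
    public enum Kind { GENBANK_FLATFILE, XML, UNKNOWN }

    /**
     * Inspects only the first non-blank line: a GenBank flat file is assumed to
     * begin with "LOCUS", an XML document with "<" (e.g. "<?xml version=...").
     */
    public static Kind sniff(String body) {
        for (String line : body.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip leading blank lines
            }
            if (trimmed.startsWith("LOCUS")) {
                return Kind.GENBANK_FLATFILE;
            }
            if (trimmed.startsWith("<")) {
                return Kind.XML;
            }
            return Kind.UNKNOWN; // first real line is neither
        }
        return Kind.UNKNOWN; // empty or all-blank body
    }

    public static void main(String[] args) {
        System.out.println(sniff("LOCUS       NM_001110"));       // GENBANK_FLATFILE
        System.out.println(sniff("<?xml version=\"1.0\"?>\n<x/>")); // XML
    }
}
```

A result of `XML` would indicate that the request went out without `retmode=text`, so the body should not be handed to `GenbankFormat` at all.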
>>
>> I proved it out by subclassing GenbankRichSequenceDB to set the
>> retmode parameter as text, and the problem is resolved.
>>
>> @Override
>> protected URL getAddress(String id) throws MalformedURLException {
>>        FetchURL seqURL = new FetchURL("Genbank", "text");
>>        String baseurl = seqURL.getbaseURL();
>>        String db = seqURL.getDB();
>>        // added retmode=text
>>        String url = baseurl + db + "&id=" + id
>>                + "&rettype=gb&retmode=text"
>>                + "&tool=" + getTool() + "&email=" + getEmail();
>>        return new URL(url);
>> }
>>
>> I think a more elegant solution would be to simply fix FetchURL to use
>> the retmode parameter.
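The more robust fix suggested here, always sending `retmode` explicitly, can be sketched independently of BioJava. `EfetchUrlBuilder.build` is a hypothetical helper, not the `FetchURL` API. The EFetch endpoint and the `db`, `id`, `rettype`, `retmode`, `tool` and `email` parameters are those documented for the NCBI E-utilities; `db=nucleotide` in the example is an assumption about what the "genbank" database maps to. The sketch needs Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EfetchUrlBuilder {

    // EFetch endpoint of the NCBI E-utilities.
    static final String EFETCH_BASE =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    /**
     * Hypothetical helper: builds an EFetch URL that always states both
     * rettype and retmode instead of relying on the server-side default.
     * tool and email are optional and skipped when null.
     */
    public static String build(String db, String id, String rettype, String retmode,
                               String tool, String email) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("db", db);
        params.put("id", id);
        params.put("rettype", rettype);
        params.put("retmode", retmode); // never omitted, so a new default cannot change the format
        if (tool != null) {
            params.put("tool", tool);
        }
        if (email != null) {
            params.put("email", email);
        }

        StringJoiner query = new StringJoiner("&", EFETCH_BASE + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Percent-encode each value (e.g. '@' in an email address becomes %40).
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("nucleotide", "NM_001110.2", "gb", "text", "mytool", null));
    }
}
```

Because `retmode` is always present in the query string, a later change to the per-database default on the Entrez side cannot silently switch the response to XML.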
>>
>> Regards -
>> Scott
>>
>> On Thu, Feb 16, 2012 at 8:53 PM, George Waldon <gwaldon at geneinfinity.org>
>> wrote:
>>>
>>> Hello Scott,
>>>
>>> This appears to be an exception thrown by the parser. Is there a way you
>>> can fetch the sequence(s) as a text file before the exception occurs? It
>>> would be interesting to see whether you can reproduce the exception; you
>>> can send me the file if you want.
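One way to capture the raw text asked for here is to download the request URL yourself and save the body without parsing it. `RawResponseDump` is a hypothetical helper, not a BioJava class; it assumes you already know the URL being requested (for example, one built the same way as in a `getAddress` override), and `InputStream.transferTo` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RawResponseDump {

    /** Copies a response body byte for byte; IOExceptions are rethrown unchecked. */
    public static long copyRaw(InputStream in, OutputStream out) {
        try {
            return in.transferTo(out); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: downloads the given URL and saves the unparsed body to a file. */
    public static void dumpToFile(String url, Path target) {
        try (InputStream in = URI.create(url).toURL().openStream();
             OutputStream out = Files.newOutputStream(target)) {
            copyRaw(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java RawResponseDump <url> <output-file>");
            return;
        }
        dumpToFile(args[0], Paths.get(args[1]));
    }
}
```

Running it as `java RawResponseDump "<efetch URL>" response.txt` and opening `response.txt` would show directly whether the body is a GenBank record or an XML document.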
>>>
>>> Regards,
>>> George
>>>
>>> Quoting Scott Frees <sfrees at ramapo.edu>:
>>>
>>>> Hello -
>>>>
>>>> I have developed an application that searches for and compares
>>>> G-quadruplexes within mRNA. The web application has been running
>>>> without any problems on several different web servers for over a year.
>>>> Suddenly, just this week, it is unable to download sequence data
>>>> using GenbankRichSequenceDB. Has anyone else had this problem?
>>>>
>>>> We are using BioJava 1.8.1
>>>>
>>>> Below is the exception trace, and the code that follows is a small
>>>> test app that generates the exception. This code worked without any
>>>> problems prior to Tuesday this week, and we haven't made any
>>>> modifications to our application.
>>>> ------------------------------------------------------
>>>> org.biojava.bio.BioException: Failed to read Genbank sequence
>>>>         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:163)
>>>>         at Tester.main(Tester.java:11)
>>>> Caused by: org.biojava.bio.BioException: Could not read sequence
>>>>         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>>>>         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:159)
>>>>         ... 1 more
>>>> Caused by: org.biojava.bio.seq.io.ParseException:
>>>>
>>>> A Exception Has Occurred During Parsing.
>>>> Please submit the details that follow to biojava-l at biojava.org or post
>>>> a bug report to http://bugzilla.open-bio.org/
>>>>
>>>> Format_object=org.biojavax.bio.seq.io.GenbankFormat
>>>> Accession=null
>>>> Id=null
>>>> Comments=Bad section
>>>> Parse_block=<?xml version="1.0"?>
>>>> Stack trace follows ....
>>>>
>>>>         at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:620)
>>>>         at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:279)
>>>>         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>>>         ... 2 more
>>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -4
>>>>         at java.lang.String.substring(Unknown Source)
>>>>         at java.lang.String.substring(Unknown Source)
>>>>         at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:610)
>>>>         ... 4 more
>>>>
>>>> -----------------------------
>>>>
>>>>
>>>> import org.biojava.bio.BioException;
>>>> import org.biojava.bio.seq.db.IllegalIDException;
>>>> import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;
>>>> import org.biojavax.bio.seq.RichSequence;
>>>>
>>>> public class Tester {
>>>>     public static void main(String[] args) {
>>>>         String id = "NM_001110.2";  // Issue occurs with any ID
>>>>         GenbankRichSequenceDB ncbi = new GenbankRichSequenceDB();
>>>>         try {
>>>>             RichSequence rs = ncbi.getRichSequence(id);
>>>>             System.out.println(rs.seqString());
>>>>         } catch (IllegalIDException e) {
>>>>             e.printStackTrace();
>>>>         } catch (BioException e) {
>>>>             e.printStackTrace();
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>
>>>
>>>
>>> --------------------------------
>>> George Waldon
>>>
>>>
>>
>
>
>
> --------------------------------
> George Waldon
>
>



