[Biojava-l] Parsing exception reading sequence from GenbankRichSequenceDB

George Waldon gwaldon at geneinfinity.org
Fri Feb 17 17:56:06 UTC 2012


Hi Scott,

Yes, well done. You need to fix rettype too. So, if I have it correct,  
we should uncomment and have:

rettype = "gb"
retmode = "txt"

and existing code should not be broken. What do you think? I can  
commit if you do not have a developer account.
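
To make the intent concrete, here is a minimal sketch of the kind of URL
that results once both parameters are set. It is standalone Java and does
not call FetchURL itself: the class name EfetchUrlSketch, the base address
and the "nucleotide" db value are assumptions for illustration only, and
retmode uses "text", the value from your working override.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical standalone sketch; this is not the BioJava FetchURL class.
class EfetchUrlSketch {

    // Assumed base address; FetchURL.getbaseURL() may return a different string.
    static final String BASE_URL =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";

    // Builds an efetch URL that always states rettype and retmode explicitly,
    // so the result does not depend on the server-side default for the database.
    static String buildUrl(String db, String id, String rettype, String retmode) {
        return BASE_URL
                + "db=" + encode(db)
                + "&id=" + encode(id)
                + "&rettype=" + encode(rettype)
                + "&retmode=" + encode(retmode);
    }

    // URL-encodes a single query parameter value.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "nucleotide" is an assumed db value; "gb" and "text" follow the thread.
        System.out.println(buildUrl("nucleotide", "NM_001110.2", "gb", "text"));
    }
}
```

The tool and email parameters from your override are left out to keep the
sketch short; they would be appended in the same way.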

Thanks,
- George

Quoting Scott Frees <sfrees at ramapo.edu>:

> George - Thanks for your response.
>
> I think I tracked down the problem.  When building the FetchURL,
> GenbankRichSequenceDB uses "genbank" as the db.  In the
> org.biojava.bio.seq.db.FetchURL constructor, rettype and retmode are
> specifically not set when given "genbank" - see lines 54-55 commented
> out.
>
> 		//	rettype = format;
> 		//	retmode = format;
>
> Entrez updated their API on Wednesday
> (http://www.ncbi.nlm.nih.gov/books/NBK25501/), and the release notes
> say they have set a default retmode for each database.  I'm new to
> biojava and entrez, but I can only assume that the "genbank" db used
> to always return sequences as text, which is why FetchURL doesn't
> include the parameter in the URL it builds.  It looks like the default
> is now XML, which breaks the GenbankRichSequenceDB parser.
>
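
The failure described above could in principle be detected before the
response reaches the GenBank parser. A minimal sketch, assuming the XML
response starts with the same declaration that appears in the Parse_block
of the trace: the helper ResponseFormatCheck and its look-ahead limit are
hypothetical and not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of BioJava.
class ResponseFormatCheck {

    // Arbitrary look-ahead budget for mark/reset; an assumption of this sketch.
    private static final int READ_AHEAD = 4096;

    // Peeks at the first non-blank line and reports whether it begins with an
    // XML declaration. The reader is reset afterwards so that a parser can
    // still consume the response from the start. If more than READ_AHEAD
    // characters are read before a non-blank line ends, reset() may fail.
    static boolean looksLikeXml(BufferedReader reader) {
        try {
            reader.mark(READ_AHEAD);
            boolean xml = false;
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    xml = trimmed.startsWith("<?xml");
                    break;
                }
            }
            reader.reset();
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // First input mirrors the Parse_block from the trace; the second is a
        // placeholder line in the style of a GenBank flat file, not real data.
        BufferedReader xml = new BufferedReader(
                new StringReader("<?xml version=\"1.0\"?>\n"));
        BufferedReader flat = new BufferedReader(
                new StringReader("LOCUS       EXAMPLE\n"));
        System.out.println(looksLikeXml(xml));
        System.out.println(looksLikeXml(flat));
    }
}
```

A check like this only turns the StringIndexOutOfBoundsException into a
clearer error; it does not replace requesting text output explicitly.
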
> I proved it out by subclassing GenbankRichSequenceDB to set the
> retmode parameter as text, and the problem is resolved.
>
> @Override
> protected URL getAddress(String id) throws MalformedURLException {
>         FetchURL seqURL = new FetchURL("Genbank", "text");
>         String baseurl = seqURL.getbaseURL();
>         String db = seqURL.getDB();
>         // added retmode=text
>         String url = baseurl + db + "&id=" + id
>                 + "&rettype=gb&retmode=text&tool=" + getTool()
>                 + "&email=" + getEmail();
>         return new URL(url);
> }
>
> I think a more elegant solution would be to simply fix FetchURL to use
> the retmode parameter.
>
> Regards -
> Scott
>
> On Thu, Feb 16, 2012 at 8:53 PM, George Waldon  
> <gwaldon at geneinfinity.org> wrote:
>> Hello Scott,
>>
>> This appears to be an exception thrown by the parser. Is there a way you can
>> fetch the sequence(s) as a text file before the exception occurs? It would
>> be interesting to see if you can reproduce the exception; you can send me
>> the file if you want.
>>
>> Regards,
>> George
>>
>> Quoting Scott Frees <sfrees at ramapo.edu>:
>>
>>> Hello -
>>>
>>> I have developed an application that searches and compares
>>> g-quadruplexes within mRNA.  The web application has been running
>>> without any problems on several different web servers for over a year.
>>> Suddenly, just this week, it is unable to download sequence data
>>> using GenbankRichSequenceDB - has anyone else had this problem?
>>>
>>> We are using BioJava 1.8.1
>>>
>>> Below is the exception trace, and the code that follows is a small
>>> test app that generates the exception.  This code worked without any
>>> problems prior to Tuesday this week, and we haven't made any
>>> modification to our application.
>>> ------------------------------------------------------
>>> org.biojava.bio.BioException: Failed to read Genbank sequence
>>>         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:163)
>>>         at Tester.main(Tester.java:11)
>>> Caused by: org.biojava.bio.BioException: Could not read sequence
>>>         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>>>         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:159)
>>>         ... 1 more
>>> Caused by: org.biojava.bio.seq.io.ParseException:
>>>
>>> A Exception Has Occurred During Parsing.
>>> Please submit the details that follow to biojava-l at biojava.org or post
>>> a bug report to http://bugzilla.open-bio.org/
>>>
>>> Format_object=org.biojavax.bio.seq.io.GenbankFormat
>>> Accession=null
>>> Id=null
>>> Comments=Bad section
>>> Parse_block=<?xml version="1.0"?>
>>> Stack trace follows ....
>>>
>>>         at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:620)
>>>         at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:279)
>>>         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>>         ... 2 more
>>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -4
>>>         at java.lang.String.substring(Unknown Source)
>>>         at java.lang.String.substring(Unknown Source)
>>>         at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:610)
>>>         ... 4 more
>>> -----------------------------
>>>
>>>
>>> import org.biojava.bio.BioException;
>>> import org.biojava.bio.seq.db.IllegalIDException;
>>> import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;
>>> import org.biojavax.bio.seq.RichSequence;
>>>
>>> public class Tester {
>>>     public static void main(String[] args) {
>>>         String id = "NM_001110.2";  // Issue occurs with any ID
>>>         GenbankRichSequenceDB ncbi = new GenbankRichSequenceDB();
>>>         try {
>>>             RichSequence rs = ncbi.getRichSequence(id);
>>>             System.out.println(rs.seqString());
>>>         } catch (IllegalIDException e) {
>>>             e.printStackTrace();
>>>         } catch (BioException e) {
>>>             e.printStackTrace();
>>>         }
>>>     }
>>> }
>>>
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>>
>>
>> --------------------------------
>> George Waldon
>>
>>
>



--------------------------------
George Waldon




