[Biojava-l] Parsing exception reading sequence from GenbankRichSequenceDB

Scott Frees sfrees at ramapo.edu
Fri Feb 17 03:25:03 UTC 2012


George - Thanks for your response.

I think I have tracked down the problem.  When building the FetchURL,
GenbankRichSequenceDB uses "genbank" as the db.  In the
org.biojava.bio.seq.db.FetchURL constructor, rettype and retmode are
deliberately left unset when the database is "genbank"; lines 54-55
are commented out:

		//	rettype = format;
		//	retmode = format;

Entrez updated its E-utilities API
(http://www.ncbi.nlm.nih.gov/books/NBK25501/) on Wednesday, and the
release notes say that each database now has a default retmode.  I'm
new to BioJava and Entrez, but I assume the "genbank" db used to
always return sequences as text, which is why FetchURL doesn't include
the parameter in the URL it builds.  The default now appears to be
XML.  That matches the Parse_block=<?xml version="1.0"?> line in the
exception, and it breaks the GenbankRichSequenceDB parser.
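George's suggestion of saving the raw response before the parser sees it can be sketched as follows.  This is a minimal, hypothetical diagnostic helper, not part of BioJava: the class name EntrezResponseCheck, its methods, and the idea of telling formats apart by their first characters are all assumptions.  It relies only on the facts that XML starts with "<?xml" (as in the Parse_block above) and that a GenBank flat file starts with a LOCUS line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical diagnostic helper (not part of BioJava).
final class EntrezResponseCheck {
    enum Kind { XML, GENBANK_FLATFILE, UNKNOWN }

    // Save the raw, unparsed response so it can be inspected or sent to the list
    static void dump(URL url, Path out) throws IOException {
        try (InputStream in = url.openStream()) {
            Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Guess the format from the first non-blank characters of the saved file
    static Kind classify(Path saved) throws IOException {
        String head = Files.readString(saved, StandardCharsets.UTF_8).stripLeading();
        if (head.startsWith("<?xml")) {
            return Kind.XML;              // what GenbankFormat rejected as "Bad section"
        }
        if (head.startsWith("LOCUS")) {
            return Kind.GENBANK_FLATFILE; // what GenbankFormat expects to read
        }
        return Kind.UNKNOWN;
    }
}
```

Pointing dump() at the same efetch URL with and without "&retmode=text" and classifying both files should show directly whether the server default changed to XML.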

I confirmed this by subclassing GenbankRichSequenceDB so that it sets
retmode=text, and that resolves the problem:

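// Override in a subclass of GenbankRichSequenceDB; needs java.net.URL,
// java.net.MalformedURLException and org.biojava.bio.seq.db.FetchURL
@Override
protected URL getAddress(String id) throws MalformedURLException {
        FetchURL seqURL = new FetchURL("Genbank", "text");
        String baseurl = seqURL.getbaseURL();
        String db = seqURL.getDB();
        // added retmode=text so Entrez returns a flat file instead of XML
        String url = baseurl + db + "&id=" + id + "&rettype=gb&retmode=text"
                + "&tool=" + getTool() + "&email=" + getEmail();
        return new URL(url);
}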

A more elegant solution would be to fix FetchURL itself so that it
always sets the retmode parameter.
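One way such a fix could look is to stop special-casing "genbank" and always emit both rettype and retmode.  The sketch below is a standalone, hypothetical builder: the class name EfetchUrlBuilder and its build() method are invented and are not BioJava's FetchURL API.  The query parameters (db, id, rettype, retmode, tool, email) are the standard Entrez efetch parameters, and the base URL is the efetch endpoint as addressed in 2012; NCBI now serves E-utilities over https.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not BioJava's FetchURL: every efetch URL states rettype
// and retmode explicitly, so the result no longer depends on whatever
// per-database default Entrez picks.
final class EfetchUrlBuilder {
    // Assumed efetch endpoint (2012-era http form)
    static final String BASE =
            "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=";

    static String build(String db, String id, String rettype, String retmode,
                        String tool, String email) {
        return BASE + enc(db)
                + "&id=" + enc(id)
                + "&rettype=" + enc(rettype)
                + "&retmode=" + enc(retmode)   // never left to the server default
                + "&tool=" + enc(tool)
                + "&email=" + enc(email);
    }

    // Percent-encode each value so IDs or e-mail addresses cannot break the query
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Called as build("nucleotide", "NM_001110.2", "gb", "text", tool, email), this produces the same kind of query string as the getAddress() override above, but with the values encoded and retmode always present.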

Regards -
Scott

On Thu, Feb 16, 2012 at 8:53 PM, George Waldon <gwaldon at geneinfinity.org> wrote:
> Hello Scott,
>
> This appears to be an exception thrown by the parser. Is there a way you can
> fetch the sequence(s) as a text file before the exception occurs? It would
> be interesting to see if you can reproduce the exception; you can send me
> the file if you want.
>
> Regards,
> George
>
> Quoting Scott Frees <sfrees at ramapo.edu>:
>
>> Hello -
>>
>> I have developed an application that searches and compares
>> g-quadruplexes within mRNA.  The web application has been running
>> without any problems on several different web servers for over a year.
>> Suddenly, just this week, it is unable to download sequence data
>> using GenbankRichSequenceDB.  Has anyone else had this problem?
>>
>> We are using BioJava 1.8.1
>>
>> Below is the exception trace, and the code that follows is a small
>> test app that generates the exception.  This code worked without any
>> problems prior to Tuesday this week, and we haven't made any
>> modifications to our application.
>> ------------------------------------------------------
>> org.biojava.bio.BioException: Failed to read Genbank sequence
>>        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:163)
>>        at Tester.main(Tester.java:11)
>> Caused by: org.biojava.bio.BioException: Could not read sequence
>>        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>>        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:159)
>>        ... 1 more
>> Caused by: org.biojava.bio.seq.io.ParseException:
>>
>> A Exception Has Occurred During Parsing.
>> Please submit the details that follow to biojava-l at biojava.org or post
>> a bug report to http://bugzilla.open-bio.org/
>>
>> Format_object=org.biojavax.bio.seq.io.GenbankFormat
>> Accession=null
>> Id=null
>> Comments=Bad section
>> Parse_block=<?xml   version="1.0"?>
>> Stack trace follows ....
>>
>>        at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:620)
>>        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:279)
>>        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>        ... 2 more
>> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -4
>>        at java.lang.String.substring(Unknown Source)
>>        at java.lang.String.substring(Unknown Source)
>>        at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:610)
>>        ... 4 more
>> -----------------------------
>>
>>
>> import org.biojava.bio.BioException;
>> import org.biojava.bio.seq.db.IllegalIDException;
>> import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;
>> import org.biojavax.bio.seq.RichSequence;
>>
>> public class Tester {
>>        public static void main(String args[]) {
>>                String id = "NM_001110.2";  // Issue occurs with any ID
>>                GenbankRichSequenceDB  ncbi = new GenbankRichSequenceDB();
>>                try {
>>                        RichSequence rs = ncbi.getRichSequence(id);
>>                        System.out.println(rs.seqString());
>>                } catch (IllegalIDException e) {
>>                        e.printStackTrace();
>>                } catch (BioException e) {
>>                        e.printStackTrace();
>>                }
>>        }
>> }
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>
>
> --------------------------------
> George Waldon
>
>



