[Biojava-l] Parsing Genbank-sequences from NCBI

Seth Johnson johnson.biotech at gmail.com
Mon Aug 14 14:47:26 UTC 2006


Hi Richard,

Apparently there are more problems.  I get an exception while trying to
retrieve BM353894.1:
--------------------------------------------------------------
Trying to get: BM353894.1
org.biojava.bio.BioException: Failed to read Genbank sequence
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
        at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250)
        at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197)
        at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105)
        at exonhit.parsers.EventParser.main(EventParser.java:312)
Caused by: org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
        ... 4 more
Caused by: org.biojava.bio.seq.io.ParseException
        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:274)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
        ... 5 more
Java Result: -1
-------------------------------------------------------------

On 8/14/06, Richard Holland <holland at ebi.ac.uk> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I've made a small change to the regex which matches these so that it
> will now accept spaces before the colon (previously, it didn't).
>
> Can you check out the latest from CVS and try again?
>
> cheers,
> Richard
>
> Seth Johnson wrote:
> > More problems with parsing nucleotide sequences from NCBI.  Apparently,
> > there's an odd dbxref tag on some of the sequences submitted by ATCC that
> > causes an exception.  I've run into 2 so far, but I'm sure there are more:
> >
> > AA343569.1
> > AA325485.1
> >
> > Exceptions produced are as follows:
> > --------------------------------------------------------------
> > Trying to get: AA343569.1
> > org.biojava.bio.BioException: Failed to read Genbank sequence
> >         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
> >         at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250)
> >         at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197)
> >         at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105)
> >         at exonhit.parsers.EventParser.main(EventParser.java:310)
> > Caused by: org.biojava.bio.BioException: Could not read sequence
> >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
> >         ... 4 more
> > Caused by: org.biojava.bio.seq.io.ParseException: Bad dbxref found: ATCC (inhost):145151, accession:AA343569
> >         at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:438)
> >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
> >         ... 5 more
> > Java Result: -1
> > =========================================================
> > Trying to get: AA325485.1
> > org.biojava.bio.BioException: Failed to read Genbank sequence
> >         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
> >         at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250)
> >         at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197)
> >         at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105)
> >         at exonhit.parsers.EventParser.main(EventParser.java:312)
> > Caused by: org.biojava.bio.BioException: Could not read sequence
> >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >         at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
> >         ... 4 more
> > Caused by: org.biojava.bio.seq.io.ParseException: Bad dbxref found: ATCC (inhost):125990, accession:AA325485
> >         at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:438)
> >         at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
> >         ... 5 more
> > Java Result: -1
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFE4DV+4C5LeMEKA/QRAtrTAKCjNFnkmhAF52LhvrpyurnRToe0LACgiEUs
> GUmVcpkdByVWADCXvfKCsYE=
> =ZBlJ
> -----END PGP SIGNATURE-----
>
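
For anyone following the thread, here is a rough sketch of the kind of
regex change described in the quoted message: a dbxref pattern that
tolerates whitespace before the colon. This is NOT the pattern used in
GenbankFormat; the class name DbxrefSketch, the method split(), and the
pattern itself are made up for illustration, and the third input in
main() is a hypothetical variant rather than a value seen at NCBI.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DbxrefSketch {
    // Hypothetical pattern (an assumption, not BioJava's own):
    // database name, optional whitespace, a colon, optional whitespace,
    // then the identifier. The lazy first group keeps any whitespace
    // before the colon out of the database name.
    private static final Pattern DBXREF =
            Pattern.compile("^([^:]+?)\\s*:\\s*(\\S.*)$");

    /** Returns {database, identifier}, or null if the value does not match. */
    public static String[] split(String value) {
        Matcher m = DBXREF.matcher(value.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "ATCC (inhost):145151",   // value from the exception message
            "ATCC (inhost) :145151",  // hypothetical: space before the colon
            "no colon here"           // should be rejected
        };
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println(s + " -> "
                    + (parts == null ? "no match" : parts[0] + " | " + parts[1]));
        }
    }
}
```

A stricter pattern that forbids the whitespace would return null for the
second sample; whether that is the case that tripped the parser here is
not something this sketch can confirm.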



-- 
Best Regards,


Seth Johnson
Senior Bioinformatics Associate

Ph: (202) 470-0900
Fx: (775) 251-0358


