[Biojava-l] Parsing Genbank-sequences from NCBI
Felix Dreher
dreher at mpiib-berlin.mpg.de
Fri Jul 14 16:45:54 UTC 2006
Hello,
as I found out in the meantime, the download from NCBI was not the
original problem. After switching IDEs (from Studio Creator to NetBeans),
I got a more informative error message... :-)
The problem seems to be related to characteristics of specific Genbank
sequences combined with my attempt to change their feature set. So it is
not really critical, but maybe someone knows what's going wrong?!
The code I use is shown below. What I do is create a GI-number list with
efetch (outside of Java), download the respective sequences from NCBI,
filter them by CDS, and store them in a local BioSQL-database. So all
features and annotations that have nothing to do with the CDS get discarded.
After one sequence had been filtered successfully, the following
exception was thrown (in addition to the stack trace in my last post).
This happened for quite a number of GI numbers. Unfortunately I can't
tell exactly which ones right now, because our in-house server is down
for backups and I can't verify the GI numbers, but I could do that next
week.
With the filtering commented out, the sequences were downloaded without
errors.
Regards,
Felix
java.sql.BatchUpdateException: *Batch entry 0 update reference set
title=Generation and initial analysis of more than 15,000 full-length
human and mouse cDNA sequences, *authors=Strausberg,R.L., Feingold,E.A.,
Grouse,L.H., Derge,J.G., Klausner,R.D., Collins,F.S., Wagner,L.,
Shenmen,C.M., Schuler,G.D., Altschul,S.F., Zeeberg,B., Buetow,K.H.,
Schaefer,C.F., Bhat,N.K., Hopkins,R.F., Jordan,H., Moore,T., Max,S.I.,
Wang,J., Hsieh,F., Diatchenko,L., Marusina,K., Farmer,A.A., Rubin,G.M.,
Hong,L., Stapleton,M., Soares,M.B., Bonaldo,M.F., Casavant,T.L.,
Scheetz,T.E., Brownstein,M.J., Usdin,T.B., Toshiyuki,S., Carninci,P.,
Prange,C., Raha,S.S., Loquellano,N.A., Peters,G.J., Abramson,R.D.,
Mullahy,S.J., Bosak,S.A., McEwan,P.J., McKernan,K.J., Malek,J.A.,
Gunaratne,P.H., Richards,S., Worley,K.C., Hale,S., Garcia,A.M.,
Gay,L.J., Hulyk,S.W., Villalon,D.K., Muzny,D.M., Sodergren,E.J., Lu,X.,
Gibbs,R.A., Fahey,J., Helton,E., Ketteman,M., Madan,A., Rodrigues,S.,
Sanchez,A., Whiting,M., Madan,A., Young,A.C., Shevchenko,Y.,
Bouffard,G.G., Blakesley,R.W., Touchman,J.W., Green,E.D., Dickson,M.C.,
Rodriguez,A.C., Grimwood,J., Schmutz,J., Myers,R.M., Butterfield,Y.S.,
Krzywinski,M.I., Skalska,U., Smailus,D.E., Schnerch,A., Schein,J.E.,
Jones,S.J., Marra,M.A. and Mammalian Gene Collection Program Team
(consortium), location=Proc. Natl. Acad. Sci. U.S.A. 99 (26),
16899-16903 (2002), crc=ffffffffc6201355fae78655, dbxref_id=11 where
reference_id=61 was aborted. Call getNextException to see the cause.
at
org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2497)
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1298)
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:347)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2559)
at
org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:58)
at
org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:195)
... 18 more
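The message above says to call getNextException to see the cause. A
minimal sketch of doing that, using only java.sql classes, could look
like this. The class name BatchErrorChain, the method chainMessages and
the messages in main are made up for illustration; they are not part of
BioJava, Hibernate or the code in this post.

```java
import java.sql.BatchUpdateException;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchErrorChain {

    /**
     * Collects the message of the given exception and of every
     * exception linked behind it via getNextException().
     */
    public static List<String> chainMessages(SQLException first) {
        List<String> messages = new ArrayList<String>();
        for (SQLException e = first; e != null; e = e.getNextException()) {
            messages.add(e.getMessage());
        }
        return messages;
    }

    public static void main(String[] args) {
        // Stand-in for the exception thrown by the driver; both
        // messages are hypothetical and only serve the demonstration.
        BatchUpdateException batch =
                new BatchUpdateException("Batch entry 0 ... was aborted.", new int[0]);
        batch.setNextException(new SQLException("hypothetical underlying database error"));

        for (String message : chainMessages(batch)) {
            System.out.println(message);
        }
    }
}
```

In the real program the BatchUpdateException would have to be caught
where the session is flushed or committed, and it may first need to be
unwrapped from the Hibernate exception that carries it as its cause.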
Source code:
static private int[] numbers = {109732075, 109732055, 109731929,
        109731809, 109731807, 109731805}; // and so on
static private GenbankRichSequenceDB gbDb = new GenbankRichSequenceDB();
static private FeatureFilter ffCDS = new FeatureFilter.ByType("CDS");

public static void main(String[] args)
        throws ChangeVetoException, InvalidTermException, BioException {
    for (int i = 0; i < numbers.length; i++) {
        // download the sequence from NCBI by GI number
        RichSequence seq1 = gbDb.getRichSequence("" + numbers[i]);
        RichSequence seq2 = getFilteredSequence(seq1);
        System.out.println("Old sequence: " + seq1.getName()
                + ", # of features: " + seq1.getFeatureSet().size());
        System.out.println("New sequence: " + seq2.getName()
                + ", # of features: " + seq2.getFeatureSet().size());
        // store seq2 in local BioSQL-DB
    }
}

public static RichSequence getFilteredSequence(RichSequence seq)
        throws ChangeVetoException, InvalidTermException, BioException {
    // new sequence with the same name and symbols, but no features yet
    RichSequence newSeq = RichSequence.Tools.createRichSequence(
            RNAiDBFactory.getTargetDBNamespace(),
            seq.getName(),
            seq.getInternalSymbolList());
    // FeatureHolder fh1 only holds one feature: CDS
    FeatureHolder fh1 = seq.filter(ffCDS);
    for (Iterator i = fh1.features(); i.hasNext();) {
        RichFeature f = (RichFeature) i.next();
        // make a new feature on the new sequence
        newSeq.createFeature(f.makeTemplate());
    }
    return newSeq;
}
Richard Holland wrote:
> This exception happens whenever the Genbank record has a reference that
> either does not have any author or consortium tags, or has no location.
>
> Are you sure you're using the latest version from CVS? The code I've got
> here works just fine and it's the same as what's in CVS.
>
> cheers,
> Richard
>
> On Thu, 2006-07-13 at 20:46 +0200, Felix Dreher wrote:
>
>> Hello,
>>
>> I have a problem with the parsing of Genbank-Sequences from NCBI.
>>
>> The probably most important line of the log (see below) is the following:
>> Error while trying to call new class org.biojavax.SimpleDocRef(class
>> java.util.ArrayList,class java.lang.String,class java.lang.String)
>>
>> This exception is thrown when I run the following code (with the latest
>> CVS version):
>>
>> GenbankRichSequenceDB ncbi = new GenbankRichSequenceDB();
>> ncbi.setNamespace(RNAiDBFactory.getTargetDBNamespace());
>> RichSequence rs = ncbi.getRichSequence("110002612");
>>
>>
>> If I use the CVS version of March 2006, a different exception is thrown.
>> This is said to be fixed
>> (Re: [Biojava-l] Parsing Genbank/EMBL/XML Sequences from binary NCBI
>> ASN.1 daily update files
>> Richard Holland
>> Fri, 02 Jun 2006 02:16:07 -0700)
>>
>>
>>
>> Any help would be highly appreciated!
>> Best regards,
>> Felix
>>
>>
>>
>> current exception:
>>
>> 2006-07-13 20:28:04,446 ERROR [main]
>> Error:
>> org.biojava.bio.BioException: Failed to read Genbank sequence
>> at
>> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
>> at rnaiserver.calculation.TestRun.downloadSequences(TestRun.java:237)
>> at rnaiserver.calculation.TestRun.main(TestRun.java:40)
>> Caused by: org.biojava.bio.BioException: Could not read sequence
>> at
>> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
>> at
>> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
>> ... 2 more
>> Caused by: java.lang.RuntimeException: Error while trying to call new
>> class org.biojavax.SimpleDocRef(class java.util.ArrayList,class
>> java.lang.String,class java.lang.String)
>> at
>> org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder.buildObject(BioSQLRichObjectBuilder.java:156)
>> at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104)
>> at
>> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:385)
>> at
>> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
>> ... 3 more
>> Caused by: java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:585)
>> at
>> org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder.buildObject(BioSQLRichObjectBuilder.java:123)
>> ... 6 more
>> Caused by: org.hibernate.exception.GenericJDBCException: could not
>> execute query
>> at
>> org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:91)
>> at
>> org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:79)
>> at
>> org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
>> at org.hibernate.loader.Loader.doList(Loader.java:2148)
>> at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2029)
>> at org.hibernate.loader.Loader.list(Loader.java:2024)
>> at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:375)
>> at
>> org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:308)
>> at
>> org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:153)
>> at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1129)
>> at org.hibernate.impl.QueryImpl.list(QueryImpl.java:79)
>> at
>> org.hibernate.impl.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:749)
>> ... 11 more
>> Caused by: org.postgresql.util.PSQLException: ERROR: current transaction
>> is aborted, commands ignored until end of transaction block
>> at
>> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
>> at
>> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
>> at
>> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
>> at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:437)
>> at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:353)
>> at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:257)
>> at
>> org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:139)
>> at org.hibernate.loader.Loader.getResultSet(Loader.java:1669)
>> at org.hibernate.loader.Loader.doQuery(Loader.java:662)
>> at
>> org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:224)
>> at org.hibernate.loader.Loader.doList(Loader.java:2145)
>> ... 19 more
>>
>>
>>
>>
>>
>>
>> 'old' exception with Biojava-live from March 2006:
>>
>> 2006-07-13 20:22:08,425 ERROR [main]
>> Error:
>> org.biojava.bio.BioException: Failed to read Genbank sequence
>> at
>> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:156)
>> at rnaiserver.calculation.TestRun.downloadSequences(TestRun.java:237)
>> at rnaiserver.calculation.TestRun.main(TestRun.java:40)
>> Caused by: org.biojava.bio.BioException: Could not read sequence
>> at
>> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>> at
>> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:152)
>> ... 2 more
>> Caused by: java.lang.IllegalArgumentException: Authors string cannot be null
>> at
>> org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:75)
>> at
>> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:323)
>> at
>> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>> ... 3 more
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>