[Biojava-l] Parsing GenBank files in Threads

Hoebeke Mark Mark.Hoebeke at jouy.inra.fr
Fri Apr 2 01:41:33 EST 2004


Hi Matthew,

I just finished some further investigation, strengthening my feeling
that using SeqIOTools.readGenbank() might not be thread-safe.

The strongest evidence is that the errors appear less frequently on
uniprocessor machines than on multiprocessor ones.

As you requested, below is a snippet of the exception stack trace, with
the bio.*-related part delimited by ===========. This pattern repeats
itself for different GenBank files, except for the actual value of the
corrupt(?) index.

Note that the problem goes away when the method that calls
Sequence.seqString() is declared static synchronized, but that takes
all the fun out of the pipeline ;)
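
A minimal sketch of that workaround, assuming a hypothetical SeqSource
interface standing in for org.biojava.bio.seq.Sequence (only seqString()
is mirrored) and an invented helper class SafeExtract. It is an
illustration only, not the actual pipeline code:

```java
// Hypothetical stand-in for org.biojava.bio.seq.Sequence;
// only the seqString() method is mirrored here.
interface SeqSource {
    String seqString();
}

// Invented helper class, named for illustration only.
final class SafeExtract {
    private SafeExtract() {}

    // "static synchronized" locks on SafeExtract.class, so at most one
    // thread at a time can run seqString() through this method.
    static synchronized String extract(SeqSource seq) {
        return seq.seqString();
    }
}
```

Because the lock is held on the class, every worker thread queues on
this one call, which is why it removes the benefit of running the jobs
in parallel.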

If needed, I can hand you the complete source file but I thought I'd
better not spam biojava-l with it.

Thanks for your support.

Mark


     [java] org.quartz.JobExecutionException: java.lang.Exception: Unable to extract sequence from entry BA000019 [See nested exception: java.lang.Exception: Unable to extract sequence from entry BA000019]
     [java] 	at pipeline.jobs.EntryFeeder.execute(EntryFeeder.java:241)
     [java] 	at org.quartz.core.JobRunShell.run(JobRunShell.java:178)
     [java] 	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:487)
     [java] * Nested Exception (Underlying Cause) ---------------
     [java] java.lang.Exception: Unable to extract sequence from entry BA000019
     [java] 	at pipeline.jobs.EntryFeeder.feedEntry(EntryFeeder.java:199)
     [java] 	at pipeline.jobs.EntryFeeder.execute(EntryFeeder.java:234)
     [java] 	at org.quartz.core.JobRunShell.run(JobRunShell.java:178)
     [java] 	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:487)
===========================================================================================================
     [java] java.lang.ArrayIndexOutOfBoundsException: -17406
     [java] 	at org.biojava.bio.symbol.PackedSymbolList.symbolAt(PackedSymbolList.java:275)
     [java] 	at org.biojava.bio.seq.io.ChunkedSymbolListFactory$ChunkedSymbolList.symbolAt(ChunkedSymbolListFactory.java:178)
     [java] 	at org.biojava.bio.symbol.AbstractSymbolList$SymbolIterator.next(AbstractSymbolList.java:191)
     [java] 	at org.biojava.bio.seq.io.CharacterTokenization.tokenizeSymbolList(CharacterTokenization.java:202)
     [java] 	at org.biojava.bio.symbol.AlphabetManager$WellKnownTokenizationWrapper.tokenizeSymbolList(AlphabetManager.java:1378)
     [java] 	at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:93)
     [java] 	at org.biojava.bio.seq.impl.SimpleSequence.seqString(SimpleSequence.java:89)
=============================================================================================================
     [java] 	at pipeline.jobs.EntryFeeder.feedEntry(EntryFeeder.java:194)
     [java] 	at pipeline.jobs.EntryFeeder.execute(EntryFeeder.java:234)
     [java] 	at org.quartz.core.JobRunShell.run(JobRunShell.java:178)
     [java] 	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:487)
     [java] java.lang.Exception: Unable to extract sequence from entry AE004092
     [java] 	at pipeline.jobs.EntryFeeder.feedEntry(EntryFeeder.java:199)
     [java] 	at pipeline.jobs.EntryFeeder.execute(EntryFeeder.java:234)
     [java] 	at org.quartz.core.JobRunShell.run(JobRunShell.java:178)
     [java] 	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:487)
     [java] 2 avr. 2004 08:12:54 org.quartz.core.JobRunShell run


On Thu 01/04/2004 at 21:31, Matthew Pocock wrote:
> Hi,
> 
> The biojava policy on synchronization is that we try to make things safe 
> if possible, but expect the user to synchronize sanely. Unfortunately, 
> this is usually not documented anywhere. I could not guarantee that 
> GenbankFormat is threadsafe - it would be sensible for it to be, but the 
> particular implementation may not be. To help us track this, could you 
> include some example stack traces of erratic behavior?
> 
> Matthew

-- 
--------------------------Mark.Hoebeke at jouy.inra.fr----------------------
Unité Statistique & Génome                                      Unité MIG
+33 (0)1 60 87 38 03                   Tel.          +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                   Fax.          +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses              INRA - Domaine de Vilvert
F - 91000 Evry                              F - 78352 Jouy-en-Josas CEDEX



