[Biojava-dev] [BioJava - Bug #3305] (New) SequenceFileProxyLoader repeatedly opens file and never closes

Scooter Willis HWillis at scripps.edu
Wed Oct 12 17:02:57 UTC 2011


John

Thanks for catching that and for providing the fix. I have checked it in.

It looks like you are loading a large collection of UniProt proteins. How
are performance and memory usage? If you are only picking out a selection
of protein sequences, then memory won't be an issue. I had on my list to
add some memory management features to SequenceFileProxyLoader that would
keep track of the number of sequences open and free the memory held by
the least recently accessed ones. The user could set the maximum number
of sequences to have open. If you requested a sequence whose memory had
been deallocated, it would simply be reloaded. Let me know if you are
running into memory problems for your particular application and I can
find some time to add the memory management code.
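
A minimal sketch of the memory management idea described above, assuming a least-recently-accessed eviction policy built on java.util.LinkedHashMap. Nothing here exists in BioJava: LruSequenceCache, Loader and maxOpen are hypothetical names, and the Loader callback stands in for whatever re-reads a sequence from the FASTA file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not BioJava code: keeps at most maxOpen sequences in memory. */
class LruSequenceCache {

    /** Hypothetical callback that (re)loads a sequence from disk by its key. */
    interface Loader {
        String load(String key);
    }

    private final Loader loader;
    private final Map<String, String> cache;
    private int loadCount = 0;

    LruSequenceCache(final int maxOpen, Loader loader) {
        this.loader = loader;
        // accessOrder=true orders entries from least to most recently accessed
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Drop the least recently accessed sequence once the limit is exceeded
                return size() > maxOpen;
            }
        };
    }

    /** Returns the sequence, reloading it if its memory was freed earlier. */
    String get(String key) {
        String sequence = cache.get(key);
        if (sequence == null) {
            sequence = loader.load(key);
            loadCount++;
            cache.put(key, sequence);
        }
        return sequence;
    }

    int size() {
        return cache.size();
    }

    int getLoadCount() {
        return loadCount;
    }
}
```

With maxOpen set to 2, requesting A, B, A, C evicts B (the least recently accessed), and a later request for B triggers a reload rather than an error.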

Thanks

Scooter

On 10/12/11 12:29 PM, "redmine at redmine.open-bio.org"
<redmine at redmine.open-bio.org> wrote:

>
>Issue #3305 has been reported by John May.
>
>----------------------------------------
>Bug #3305: SequenceFileProxyLoader repeatedly opens file and never closes
>https://redmine.open-bio.org/issues/3305
>
>Author: John May
>Status: New
>Priority: Normal
>Assignee: biojava-dev list
>Category: seq.io
>Target version: live (SVN source)
>URL: 
>
>
>Brief: SequenceFileProxyLoader repeatedly reopens the file and never
>closes it, eventually throwing a FileNotFoundException because too many
>files are open. This only occurs when requesting tens of thousands of
>sequences; with the provided fix, the proxy reader works as expected.
>Operating System: OS X 10.6
>JDK: 1.6
>
>Exception:
><pre>
>Exception in thread "main" org.biojava3.core.exceptions.FileAccessError: Error accessing /databases/uniprot/uniprot_sprot.fasta offset=42133810 sequenceLength=290 java.io.FileNotFoundException: /databases/uniprot/uniprot_sprot.fasta (Too many open files)
>	at org.biojava3.core.sequence.loader.SequenceFileProxyLoader.init(SequenceFileProxyLoader.java:105)
>	at org.biojava3.core.sequence.loader.SequenceFileProxyLoader.iterator(SequenceFileProxyLoader.java:246)
>	at org.biojava3.core.sequence.template.AbstractSequence.iterator(AbstractSequence.java:583)
>	at org.biojava3.core.sequence.template.SequenceMixin.toStringBuilder(SequenceMixin.java:158)
>	at org.biojava3.core.sequence.template.SequenceMixin.toString(SequenceMixin.java:169)
>	at org.biojava3.core.sequence.template.AbstractSequence.getSequenceAsString(AbstractSequence.java:521)
>	at org.biojava3.core.sequence.io.FastaWriter.process(FastaWriter.java:103)
>	at org.biojava3.core.sequence.io.FastaWriterHelper.writeProteinSequence(FastaWriterHelper.java:77)
>	at org.biojava3.core.sequence.io.FastaWriterHelper.writeProteinSequence(FastaWriterHelper.java:59)
></pre>
>
>
>
>
>How to repeat (need uniprot_sprot.fa):
><pre>
>File sprotFasta = new File("path/to/uniprot_sprot.fa");
>FastaReader<ProteinSequence, AminoAcidCompound> fastaReader =
>        new FastaReader<ProteinSequence, AminoAcidCompound>(
>                new FileInputStream(sprotFasta),
>                new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
>                new FileProxyProteinSequenceCreator(sprotFasta, new AminoAcidCompoundSet()));
>
>Map<String, ProteinSequence> sprotMap = fastaReader.process();
>// Writing out every sequence is just to demonstrate the bug
>FastaWriterHelper.writeProteinSequence(File.createTempFile("output", ".fa"), sprotMap.values());
></pre>
>
>
>Fix (org.biojava3.core.sequence.loader.SequenceFileProxyLoader, lines
>98-108); also see the diff file:
><pre>
>    private boolean init() {
>        try {
>            RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
>            randomAccessFile.seek(sequenceStartIndex);
>            String sequence = sequenceParser.getSequence(randomAccessFile, sequenceLength);
>            setContents(sequence);
>            randomAccessFile.close(); // close file to prevent too many being open
>        } catch (Exception e) {
>            throw new FileAccessError("Error accessing " + file + " offset=" + sequenceStartIndex
>                    + " sequenceLength=" + sequenceLength + " " + e.toString());
>        }
>        return true;
>    }
></pre>
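
One case the fix above leaves open: if seek() or getSequence() throws, close() is skipped and the file handle still leaks. A minimal sketch of the same read pattern with the close moved into a finally block, written for JDK 1.6 as in the report. SliceReader and readSlice are hypothetical names, and the plain byte read stands in for sequenceParser.getSequence(...), whose internals are not shown in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch, not BioJava code. */
class SliceReader {

    /** Reads length bytes starting at offset and always closes the file. */
    static String readSlice(File file, long offset, int length) throws IOException {
        RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
        try {
            randomAccessFile.seek(offset);
            byte[] buffer = new byte[length];
            // readFully throws EOFException if fewer than length bytes remain
            randomAccessFile.readFully(buffer);
            return new String(buffer, "US-ASCII");
        } finally {
            // Runs on normal return and when an exception is thrown
            randomAccessFile.close();
        }
    }
}
```

On Java 7 or later, a try-with-resources statement gives the same guarantee with less code.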
>
>
>
>_______________________________________________
>biojava-dev mailing list
>biojava-dev at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biojava-dev




