<div dir="ltr">Hi Toorn,<div><br></div><div>I can confirm this resulted in an endless loop. I committed a patch for this, plus some JUnit tests to validate it. Please see:</div><div><br></div><div><a href="https://github.com/biojava/biojava/issues/282">https://github.com/biojava/biojava/issues/282</a><br></div><div><br></div><div><br></div><div>I also added documentation to the tutorial:</div><div><br></div><div><a href="https://github.com/biojava/biojava-tutorial/blob/master/core/readwrite.md">https://github.com/biojava/biojava-tutorial/blob/master/core/readwrite.md</a><br></div><div><br></div><div><br></div><div>For verification, I just parsed the 10 GB (gzip-compressed) TREMBL FASTA file with a maximum of 100 MB of memory.<br></div><div><br></div><div>If you update your code, this should start working for you now. </div><div><br></div><div>Andreas</div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 17, 2015 at 2:26 AM, Toorn, H.W.P. van den (Henk) <span dir="ltr"><<a href="mailto:h.w.p.vandentoorn@uu.nl" target="_blank">h.w.p.vandentoorn@uu.nl</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Andreas, thanks very much. I've compiled some (working) code to
illustrate how I think this should work. The artificial sample fasta
file contains only one sequence:<br>
<br>
<br>
<br>
---------------<br>
>test test<br>
PEPTIDEK<br>
<br>
---------------<br>
If you use a larger FASTA file, the file is first parsed correctly,
but when it finishes, the loop just continues. I'm aware I'm
probably doing something wrong in my code, but to me it's just not
clear how to do it correctly, and that's basically my question.<br>
<br>
The code below loops forever; the output keeps repeating the following:<br>
<br>
--------------<br>
11:18:56 [main] WARN org.biojava.nbio.core.sequence.io.FastaReader
- Can't parse sequence 12. Got sequence of length 0!<br>
11:18:56 [main] WARN org.biojava.nbio.core.sequence.io.FastaReader
- header: test test<br>
test test<br>
---------------<br>
<br>
package nl.hecklab.bioinformatics.fastafilereaderexample;<br>
<br>
import java.io.IOException;<br>
import java.io.InputStream;<br>
import java.util.LinkedHashMap;<br>
import java.util.logging.Level;<br>
import java.util.logging.Logger;<br>
import org.biojava.nbio.core.sequence.ProteinSequence;<br>
import org.biojava.nbio.core.sequence.compound.AminoAcidCompound;<br>
import org.biojava.nbio.core.sequence.compound.AminoAcidCompoundSet;<br>
import org.biojava.nbio.core.sequence.io.FastaReader;<br>
import org.biojava.nbio.core.sequence.io.GenericFastaHeaderParser;<br>
import org.biojava.nbio.core.sequence.io.ProteinSequenceCreator;<br>
<br>
/**<br>
*<br>
* @author toorn101<br>
*/<br>
public class App {<br>
<br>
public App() {<br>
try {<br>
InputStream inStream =
this.getClass().getResourceAsStream("/test.fasta");<br>
FastaReader<ProteinSequence, AminoAcidCompound>
fastaReader = new FastaReader<>(<br>
inStream,<br>
new GenericFastaHeaderParser<ProteinSequence,
AminoAcidCompound>(),<br>
new
ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));<br>
LinkedHashMap<String, ProteinSequence> b;<br>
// Read up to 10 sequences per call; the loop ends only when process() returns null.<br>
while ((b = fastaReader.process(10)) != null) {<br>
for (String seq : b.keySet()) {<br>
System.out.println(seq);<br>
}<br>
}<br>
} catch (IOException ex) {<br>
Logger.getLogger(App.class.getName()).log(Level.SEVERE,
null, ex);<br>
}<br>
}<br>
<br>
public static void main(String[] args) {<br>
new App();<div><div class="h5"><br>
}<br>
<br>
}<br>
<br>
<br>
<div>On 6/17/2015 7:04 AM, Andreas Prlic
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Henk,
<div><br>
</div>
<div>Do you want to share some code snippets so we can help you
debug?</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>Andreas</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Jun 15, 2015 at 1:58 AM, Toorn,
H.W.P. van den (Henk) <span dir="ltr"><<a href="mailto:h.w.p.vandentoorn@uu.nl" target="_blank">h.w.p.vandentoorn@uu.nl</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Dear List,<br>
<br>
I've just started using BioJava 4.0.0 in my projects, and
wanted to ask a question about parsing large FASTA files.
There is an option to read the FASTA file in parts:<br>
<br>
FastaReader.process(number)<br>
<br>
The problem I have is that it's not documented what happens
once the file has been read in its entirety. I was expecting a null
or an empty map, or even some exception, but none of these happened
and the parser kept on producing (empty) sequences.<br>
<br>
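For illustration, the end-of-input behaviour asked about here can be sketched with a small stand-in reader. This is a minimal sketch using only the Java standard library; the class BatchFastaSketch and its methods are hypothetical and are not BioJava's FastaReader. It assumes the contract hoped for above: process(max) returns up to max records per call, and null once the input is exhausted.<br>
<br>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;

/** Hypothetical stand-in for a batch FASTA reader; NOT BioJava's FastaReader. */
final class BatchFastaSketch {
    private final BufferedReader in;
    // Header line that was already consumed while reading the previous record.
    private String pendingHeader;

    BatchFastaSketch(BufferedReader in) {
        this.in = in;
    }

    /** Returns up to max records (header -> sequence), or null once the input is exhausted. */
    LinkedHashMap<String, String> process(int max) throws IOException {
        if (max < 1) {
            throw new IllegalArgumentException("max must be at least 1");
        }
        LinkedHashMap<String, String> batch = new LinkedHashMap<>();
        while (batch.size() < max) {
            String header = pendingHeader;
            pendingHeader = null;
            String line;
            // Skip ahead to the next header line, if one was not carried over.
            while (header == null && (line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    header = line.substring(1).trim();
                }
            }
            if (header == null) {
                break; // no further records: end of input
            }
            StringBuilder sequence = new StringBuilder();
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line.substring(1).trim(); // belongs to the next record
                    break;
                }
                sequence.append(line.trim());
            }
            batch.put(header, sequence.toString());
        }
        // An empty batch means nothing was left to read; signal that with null.
        return batch.isEmpty() ? null : batch;
    }

    /** Counts all records by reading batch after batch until process() returns null. */
    static int countRecords(String fasta, int batchSize) {
        BatchFastaSketch reader = new BatchFastaSketch(new BufferedReader(new StringReader(fasta)));
        try {
            int total = 0;
            LinkedHashMap<String, String> batch;
            while ((batch = reader.process(batchSize)) != null) {
                total += batch.size();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRecords(">test test\nPEPTIDEK\n\n", 10));
    }
}
```

Under this assumed contract, a loop of the form while ((batch = reader.process(10)) != null) terminates on its own; countRecords in the sketch uses exactly that loop.<br>
<br>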
Could anyone enlighten me? I'm probably missing the point
here. Maybe there is a better way to do this (there used to
be the SequenceIterator if I remember correctly, but I can't
find that in version 4.0).<br>
<br>
<br>
<br>
Regards, Henk<br>
<br>
My setup: Windows 7 64-bit, Java 1.8.0_45 64-bit, BioJava
4.0.0 via Maven.<span><font color="#888888"><br>
-- <br>
<br>
</font></span><br>
_______________________________________________<br>
Biojava-l mailing list - <a href="mailto:Biojava-l@mailman.open-bio.org" target="_blank">Biojava-l@mailman.open-bio.org</a><br>
<a href="http://mailman.open-bio.org/mailman/listinfo/biojava-l" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biojava-l</a><br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br></div>
</div>
</blockquote>
<br>
</div></div>
</div>
</blockquote></div><br><br clear="all"><div><br></div>
</div></div>
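<div>The gzip-compressed test mentioned at the top of this message could not have held the whole file in memory at once, so the data has to be consumed as a stream. As a rough sketch of that idea using only the Java standard library (the class GzipFastaStreamSketch and its methods are hypothetical, and this is not a description of how BioJava is implemented), a GZIPInputStream can be wrapped in a BufferedReader so that the input is decompressed and read line by line:</div>

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: counts FASTA records in a gzip-compressed stream. */
final class GzipFastaStreamSketch {

    /** Decompresses on the fly and counts header lines (those starting with '>'). */
    static long countRecords(InputStream gzipped) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            long records = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    records++;
                }
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses text in memory; only used to build small demo input. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Convenience wrapper for the demo: gzip the text, then count its records. */
    static long countRecordsInGzippedText(String fasta) {
        return countRecords(new ByteArrayInputStream(gzip(fasta)));
    }

    public static void main(String[] args) {
        System.out.println(countRecordsInGzippedText(">test test\nPEPTIDEK\n"));
    }
}
```

<div>Because the FastaReader constructor in the quoted code accepts an InputStream, a GZIPInputStream could presumably be passed to it in the same way, though that is not verified here.</div>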