[Biojava-l] Fasta parsing question

Toorn, H.W.P. van den (Henk) h.w.p.vandentoorn at uu.nl
Wed Jun 17 09:26:05 UTC 2015


Hi Andreas, thanks very much. I've compiled some (working) code to 
illustrate how I think this should work. The artificial sample fasta 
file contains only one sequence:



---------------
 >test test
PEPTIDEK

---------------
With a larger FASTA file, all sequences are first parsed correctly, but 
once the end of the file is reached the loop simply continues. I'm 
probably doing something wrong in my code, but it's not clear to me how 
to do it correctly, and that is basically my question.

The code below loops forever; the output keeps repeating this:

--------------
11:18:56 [main] WARN  org.biojava.nbio.core.sequence.io.FastaReader - Can't parse sequence 12. Got sequence of length 0!
11:18:56 [main] WARN  org.biojava.nbio.core.sequence.io.FastaReader - header: test test
test test
---------------

package nl.hecklab.bioinformatics.fastafilereaderexample;

import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompound;
import org.biojava.nbio.core.sequence.compound.AminoAcidCompoundSet;
import org.biojava.nbio.core.sequence.io.FastaReader;
import org.biojava.nbio.core.sequence.io.GenericFastaHeaderParser;
import org.biojava.nbio.core.sequence.io.ProteinSequenceCreator;

/**
  *
  * @author toorn101
  */
public class App {

     public App() {
         try {
             InputStream inStream = this.getClass().getResourceAsStream("/test.fasta");
             FastaReader<ProteinSequence, AminoAcidCompound> fastaReader = new FastaReader<>(
                     inStream,
                     new GenericFastaHeaderParser<ProteinSequence, AminoAcidCompound>(),
                     new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
             LinkedHashMap<String, ProteinSequence> b;
             while ((b = fastaReader.process(10)) != null) {
                 for (String seq : b.keySet()) {
                     System.out.println(seq);
                 }
             }
         } catch (IOException ex) {
             Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
         }
     }

     public static void main(String[] args) {
         new App();
     }

}
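
To make the question concrete, here is a minimal stand-alone sketch of 
the end-of-input behaviour I expected from process(n). It is not BioJava 
code: the class BatchFastaSketch and everything in it are hypothetical 
names for illustration only, it uses nothing but the JDK, and it maps 
headers to plain strings instead of ProteinSequence objects. The assumed 
contract is that process(max) returns null once the input is exhausted, 
so a while loop like the one above would terminate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;

/** Hypothetical stand-alone batch reader for illustration; NOT BioJava's FastaReader. */
public class BatchFastaSketch {

    private final BufferedReader reader;
    private String pendingHeader; // header line already consumed while finishing the previous record
    private boolean eof;          // set once readLine() has returned null

    public BatchFastaSketch(BufferedReader reader) {
        this.reader = reader;
    }

    /** Returns up to max records (header -> sequence), or null once the input is exhausted. */
    public LinkedHashMap<String, String> process(int max) throws IOException {
        if (max < 1) {
            throw new IllegalArgumentException("max must be >= 1");
        }
        LinkedHashMap<String, String> batch = new LinkedHashMap<>();
        while (batch.size() < max && !eof) {
            String header = pendingHeader;
            pendingHeader = null;
            StringBuilder seq = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                if (line.startsWith(">")) {
                    if (header == null) {
                        header = line.substring(1); // first header of this record
                        continue;
                    }
                    pendingHeader = line.substring(1); // start of the next record
                    break;
                }
                if (header != null) {
                    seq.append(line); // sequence data; lines before any header are ignored
                }
            }
            if (line == null) {
                eof = true;
            }
            if (header != null) {
                batch.put(header, seq.toString());
            }
        }
        // An empty batch can only mean there was nothing left to read.
        return batch.isEmpty() ? null : batch;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">test test\nPEPTIDEK\n";
        BatchFastaSketch r = new BatchFastaSketch(new BufferedReader(new StringReader(fasta)));
        LinkedHashMap<String, String> b;
        while ((b = r.process(10)) != null) {
            for (String header : b.keySet()) {
                System.out.println(header + " " + b.get(header).length());
            }
        }
    }
}
```

With the one-sequence sample file from above, the loop in main runs once 
and then stops, because the second call to process(10) returns null. 
Whether FastaReader is meant to behave like this is exactly what I would 
like to know.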


On 6/17/2015 7:04 AM, Andreas Prlic wrote:
> Hi Henk,
>
> Do you want to share some code-snippets so we can help you debug?
>
> Thanks,
>
> Andreas
>
>
>
> On Mon, Jun 15, 2015 at 1:58 AM, Toorn, H.W.P. van den (Henk) 
> <h.w.p.vandentoorn at uu.nl> wrote:
>
>     Dear List,
>
>     I've just started using BioJava 4.0.0 in my projects, and wanted
>     to ask a question about parsing large Fasta files. There is the
>     option to read parts of the fasta file.
>
>     FastaReader.process(number)
>
>     The problem I have is that it's not documented what happens if the
>     file is read in its entirety. I was expecting a null or an empty
>     map, or even some exception, but none happened and the parser kept
>     on producing (empty) sequences.
>
>     Could anyone enlighten me? I'm probably missing the point here.
>     Maybe there is a better way to do this (there used to be the
>     SequenceIterator if I remember correctly, but I can't find that in
>     version 4.0).
>
>
>
>     Regards, Henk
>
>     My setup: windows 7 64-bit, java 1.8.0_45 64 bit, BioJava 4.0.0
>     via Maven.
>
>
>
>
> -- 
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> RCSB PDB Protein Data Bank
> Technical & Scientific Team Lead
> University of California, San Diego
>
> Editor Software Section
> PLOS Computational Biology
>
> BioJava Project Lead
> -----------------------------------------------------------------------
