[Biojava-l] readFasta problem

Wed Apr 21 11:18:24 UTC 2010

On Thu, 8 Apr 2010 12:41:25 +0100
Richard Holland <holland at eaglegenomics.com> wrote:

> You have passed null into the tokenizer parameter of
> RichSequence.IOTools.readFasta() - this is not allowed. The parser
> cannot guess the type of sequence, it must be told what to expect by
> specifying the tokenizer to use. (Importantly this also means that
> you cannot mix different types of sequence within the same file to be
> parsed.)
> 

Thank you. 

Q1:
Does RichSequenceIterator read the complete file into memory, so that I
then retrieve each read from memory? Or does it read the file line by
line and hand me each read as it is parsed?

Q2:
Why am I not able to retrieve the header from the following fasta file:
>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttcccccccccccccccccccccca

with the following code:

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import org.biojava.bio.BioException;
import org.biojava.bio.seq.io.SymbolTokenization;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class SortFasta {

  public static void main(String[] args)
      throws FileNotFoundException, BioException {

    BufferedReader br = new BufferedReader(
        new FileReader("sortFasta.fasta"));
    String type = "DNA";
    SymbolTokenization toke = AlphabetManager.alphabetForName(type)
        .getTokenization("token");

    RichSequenceIterator rsi =
        RichSequence.IOTools.readFasta(br, toke, null);

    while (rsi.hasNext()) {
      RichSequence rs = rsi.nextRichSequence();
      System.out.println(rs.getDescription());
      System.out.println(rs.seqString());
    }
  }
}

What did I do wrong that prevents me from retrieving the header?
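
To make both questions concrete, here is a rough sketch in plain Java
(no BioJava) of the behaviour I am hoping for: the input is consumed
line by line, and the text after '>' is available as the header of each
record. The names FastaSketch, FastaEntry and readAll are made up for
this illustration, and the sketch makes no claim about how
RichSequenceIterator works internally.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of BioJava.
final class FastaSketch {

  // One FASTA record: the header text without '>' and the joined sequence lines.
  static final class FastaEntry {
    final String header;
    final String sequence;

    FastaEntry(String header, String sequence) {
      this.header = header;
      this.sequence = sequence;
    }
  }

  // Consumes the input one line at a time. For brevity the finished
  // records are collected in a list; a real iterator would hand out each
  // record as soon as the next '>' line (or end of input) is reached.
  static List<FastaEntry> readAll(Reader in) {
    List<FastaEntry> entries = new ArrayList<>();
    String header = null;
    StringBuilder seq = new StringBuilder();
    try (BufferedReader br = new BufferedReader(in)) {
      String line;
      while ((line = br.readLine()) != null) {
        line = line.trim();
        if (line.isEmpty()) {
          continue; // skip blank lines
        }
        if (line.startsWith(">")) {
          // A new header closes the previous record, if there was one.
          if (header != null) {
            entries.add(new FastaEntry(header, seq.toString()));
          }
          header = line.substring(1);
          seq.setLength(0);
        } else if (header != null) {
          seq.append(line); // sequence may span several lines
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    if (header != null) {
      entries.add(new FastaEntry(header, seq.toString()));
    }
    return entries;
  }

  public static void main(String[] args) {
    String fasta = ">1\natccccc\n>2\natccccctttttt\n";
    for (FastaEntry e : readAll(new StringReader(fasta))) {
      System.out.println(e.header);
      System.out.println(e.sequence);
    }
  }
}
```

With the first two records of my file, I would expect this to print
each header ("1", "2") followed by its sequence, which is what I was
hoping to get from the BioJava code above.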