[Biojava-l] readFasta problem

Richard Holland holland at eaglegenomics.com
Wed Apr 21 11:29:57 UTC 2010


On 21 Apr 2010, at 12:18, xyz wrote:

> On Thu, 8 Apr 2010 12:41:25 +0100
> Richard Holland <holland at eaglegenomics.com> wrote:
> 
>> You have passed null into the tokenizer parameter of
>> RichSequence.IOTools.readFasta() - this is not allowed. The parser
>> cannot guess the type of sequence, it must be told what to expect by
>> specifying the tokenizer to use. (Importantly this also means that
>> you cannot mix different types of sequence within the same file to be
>> parsed.)
>> 
> 
> Thank you. 
> 
> Q1:
> Does RichSequenceIterator read the complete file into memory, so that
> I then retrieve each read from memory? Or does it read the file line
> by line and hand me each read as it goes?


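Line by line. Each call to nextRichSequence() parses the next record from the reader, so the whole file is not held in memory at once.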
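
To illustrate what record-at-a-time reading looks like, here is a minimal sketch in plain Java. It is not BioJava's implementation, and the class name LazyFastaSketch is hypothetical; it only assumes a FASTA layout where each record starts with a '>' header line. The real parser may buffer differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only; this is not BioJava code.
// Each call to next() consumes just enough lines for one record.
final class LazyFastaSketch implements Iterator<String[]> {
    private final BufferedReader in;
    private String pendingHeader; // header of the next record, without '>'

    LazyFastaSketch(BufferedReader in) {
        this.in = in;
        this.pendingHeader = readUntilHeader(null);
    }

    // Reads lines until the next header or end of input. Sequence lines
    // seen on the way are appended to 'seq' when it is non-null.
    private String readUntilHeader(StringBuilder seq) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line.substring(1).trim();
                }
                if (seq != null) {
                    seq.append(line.trim());
                }
            }
            return null; // end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    // Returns {header, sequence} for the next record.
    @Override
    public String[] next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String header = pendingHeader;
        StringBuilder seq = new StringBuilder();
        pendingHeader = readUntilHeader(seq);
        return new String[] { header, seq.toString() };
    }

    public static void main(String[] args) {
        String fasta = ">1\natccccc\n>2\natccccctttttt\n";
        LazyFastaSketch it =
            new LazyFastaSketch(new BufferedReader(new StringReader(fasta)));
        while (it.hasNext()) {
            String[] rec = it.next();
            System.out.println(rec[0] + "\t" + rec[1]);
        }
    }
}
```

At any moment the sketch holds only the current record plus the header of the next one, which is the general idea behind iterating over a large file.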

> Q2:
> Why am I not able to retrieve the header from the following fasta file:
> >1
> atccccc
> >2
> atccccctttttt
> >3
> atccccccccccccccccctttt
> >4
> tttttttccccccccccccccccccccccc
> >5
> tttttttcccccccccccccccccccccca
> 
> with the following code:
> 
> import java.io.BufferedReader;
> import java.io.FileNotFoundException;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
> import org.biojava.bio.seq.io.SymbolTokenization;
> import org.biojava.bio.symbol.AlphabetManager;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
> 
> public class SortFasta {
> 
>  public static void main(String[] args)
>      throws FileNotFoundException, BioException {
> 
>    BufferedReader br =
>        new BufferedReader(new FileReader("sortFasta.fasta"));
>    String type = "DNA";
>    SymbolTokenization toke = AlphabetManager.alphabetForName(type)
>        .getTokenization("token");
> 
>    RichSequenceIterator rsi =
>        RichSequence.IOTools.readFasta(br, toke, null);
> 
>    while (rsi.hasNext()) {
>      RichSequence rs = rsi.nextRichSequence();
>      System.out.println(rs.getDescription());
>      System.out.println(rs.seqString());
>    }
>  }
> }
> 
> What did I do wrong when trying to retrieve the header?


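Try the other methods on RichSequence - getName() for instance. Your headers consist of a single token each (">1", ">2", ...), so there is no description text for getDescription() to return; the identifier itself should be available through getName().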
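
As a rough illustration of why a header like ">1" yields a name but no description, here is a small plain-Java sketch. It assumes the common FASTA convention that the first whitespace-delimited token of the header is the identifier and anything after it is the description. FastaHeaderSketch and split() are hypothetical names, not BioJava API, and BioJava's own header parsing may apply additional rules.

```java
// Hypothetical helper for illustration; not part of BioJava.
final class FastaHeaderSketch {

    // Returns {name, description}. The description is null when the
    // header holds only a single token, as in ">1".
    static String[] split(String headerLine) {
        String h = headerLine.startsWith(">")
            ? headerLine.substring(1)
            : headerLine;
        h = h.trim();
        // Split once, on the first run of whitespace.
        String[] parts = h.split("\\s+", 2);
        String name = parts[0];
        String description = parts.length > 1 ? parts[1] : null;
        return new String[] { name, description };
    }

    public static void main(String[] args) {
        String[] a = split(">1");
        System.out.println("name=" + a[0] + " description=" + a[1]);
        String[] b = split(">seq1 Homo sapiens fragment");
        System.out.println("name=" + b[0] + " description=" + b[1]);
    }
}
```

Under that assumption, every header in your file maps to a name with an empty description, which would match what you are seeing.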

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/



