[Biojava-l] readFasta problem
Richard Holland
holland at eaglegenomics.com
Wed Apr 21 11:29:57 UTC 2010
On 21 Apr 2010, at 12:18, xyz wrote:
> On Thu, 8 Apr 2010 12:41:25 +0100
> Richard Holland <holland at eaglegenomics.com> wrote:
>
>> You have passed null into the tokenizer parameter of
>> RichSequence.IOTools.readFasta() - this is not allowed. The parser
>> cannot guess the type of sequence, it must be told what to expect by
>> specifying the tokenizer to use. (Importantly this also means that
>> you cannot mix different types of sequence within the same file to be
>> parsed.)
>>
>
> Thank you.
>
> Q1:
> Does RichSequenceIterator read the complete file into memory, so that I
> then retrieve each read from memory? Or does it read the file line by
> line and hand me each read as it is parsed?
Line by line.
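To illustrate what "line by line" means in practice, here is a minimal plain-Java sketch of a streaming FASTA iterator. It is not BioJava's implementation; the class name StreamingFastaSketch and its layout are made up for this example. It only shows the general idea that each call to next() consumes just enough lines from the reader to build one record, so the whole file is never held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration only; this is NOT how BioJava is implemented.
final class StreamingFastaSketch implements Iterator<String[]> {
    private final BufferedReader reader;
    // Header line of the next unread record, or null once input is exhausted.
    private String pendingHeader;

    StreamingFastaSketch(BufferedReader reader) {
        this.reader = reader;
        this.pendingHeader = findFirstHeader();
    }

    // Skip any leading lines until the first '>' header is found.
    private String findFirstHeader() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    // Returns {header text without '>', concatenated sequence lines}.
    @Override
    public String[] next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String header = pendingHeader.substring(1).trim();
        StringBuilder sequence = new StringBuilder();
        pendingHeader = null;
        try {
            String line;
            // Read only until the next header; later records stay unread.
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line;
                    break;
                }
                sequence.append(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] { header, sequence.toString() };
    }
}
```

Wrapping a FileReader in a BufferedReader and looping with hasNext()/next() then processes one record at a time, in the same way the loop over RichSequenceIterator does in the code quoted below.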
> Q2:
> Why am I not able to retrieve the header from the following fasta file:
> >1
> atccccc
> >2
> atccccctttttt
> >3
> atccccccccccccccccctttt
> >4
> tttttttccccccccccccccccccccccc
> >5
> tttttttcccccccccccccccccccccca
>
> with the following code:
>
> import java.io.BufferedReader;
> import java.io.FileNotFoundException;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
> import org.biojava.bio.seq.io.SymbolTokenization;
> import org.biojava.bio.symbol.AlphabetManager;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
>
> public class SortFasta {
>
>     public static void main(String[] args)
>             throws FileNotFoundException, BioException {
>
>         BufferedReader br = new BufferedReader(
>                 new FileReader("sortFasta.fasta"));
>         String type = "DNA";
>         SymbolTokenization toke = AlphabetManager.alphabetForName(type)
>                 .getTokenization("token");
>
>         RichSequenceIterator rsi =
>                 RichSequence.IOTools.readFasta(br, toke, null);
>
>         while (rsi.hasNext()) {
>             RichSequence rs = rsi.nextRichSequence();
>             System.out.println(rs.getDescription());
>             System.out.println(rs.seqString());
>         }
>     }
> }
>
> What did I do wrong when trying to retrieve the header?
Try the other methods on RichSequence - getName() for instance.
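As a rough illustration of why getName() is the one to look at here, the plain-Java sketch below splits a FASTA header under the common convention that the first whitespace-delimited token is the name and anything after it is the description. That convention, the class name FastaHeaderSketch and the regular expression are assumptions made for this example, not BioJava code. Under that assumption a header such as ">1" has a name but no description at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT taken from BioJava.
final class FastaHeaderSketch {
    // '>' then optional spaces, a name token, then an optional free-text description.
    private static final Pattern HEADER = Pattern.compile(">\\s*(\\S+)(\\s+(.*))?");

    // Returns {name, description}; description is null when the header has none.
    static String[] parse(String headerLine) {
        Matcher m = HEADER.matcher(headerLine);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a FASTA header: " + headerLine);
        }
        return new String[] { m.group(1), m.group(3) };
    }
}
```

With this sketch, parse(">1") yields the name "1" and a null description, while parse(">seq1 Homo sapiens") yields "seq1" and "Homo sapiens".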
cheers,
Richard
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/