[Biojava-l] reading peptides from a fasta file

Gerster Sarah sgerster at student.ethz.ch
Wed Nov 8 08:14:39 UTC 2006


Hi!

I'm trying to read peptides from a fasta file:
>id|0|0.9992|1
ASITENGGAEEESVAK
>id|1|0.9953|1
ASITENGGAEEESVAK
>id|2|0.9998|1
ASNASSAGDEVDNVATSSK
>id|3|0.9998|1
EAAAAEEPQPSDEGDVVAK
>id|4|0.9998|1
EAAAAEEPQPSDEGDVVAK
....
I would like to hold all peptides in memory. For each one I need its id, its sequence and the two numbers at the end of the header (e.g. id = 0, probability = 0.9992, rank = 1 for the first entry in the file).
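
To make the goal concrete, here is a minimal plain-Java sketch of the record I have in mind. It does not use BioJava at all; the class name PeptideHeader and its parse method are hypothetical names made up for illustration, and it assumes every header has exactly the form ">id|<id>|<probability>|<rank>" shown above.

```java
/** Hypothetical holder for one header of the form ">id|<id>|<probability>|<rank>". */
final class PeptideHeader {
    final int id;
    final double probability;
    final int rank;

    PeptideHeader(int id, double probability, int rank) {
        this.id = id;
        this.probability = probability;
        this.rank = rank;
    }

    /** Splits a header line on '|' and converts the three numeric fields. */
    static PeptideHeader parse(String headerLine) {
        String line = headerLine.trim();
        if (line.startsWith(">")) {
            line = line.substring(1); // drop the FASTA marker
        }
        String[] parts = line.split("\\|");
        if (parts.length != 4 || !parts[0].equals("id")) {
            throw new IllegalArgumentException("Unexpected header: " + headerLine);
        }
        return new PeptideHeader(
                Integer.parseInt(parts[1]),
                Double.parseDouble(parts[2]),
                Integer.parseInt(parts[3]));
    }

    public static void main(String[] args) {
        PeptideHeader h = parse(">id|0|0.9992|1");
        System.out.println(h.id + " " + h.probability + " " + h.rank); // prints "0 0.9992 1"
    }
}
```

Ideally the FASTA reader would hand me these three values together with the sequence, so that I don't have to split the header myself.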

I tried readFastaProtein, but I guess I'm not using it correctly: I get the sequences, but none of the other information I want.

Here is my code:
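import java.io.BufferedReader;
import java.io.FileReader;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

try
{
  BufferedReader br = new BufferedReader(new FileReader(file_name));
  // second argument is the Namespace; null means none is supplied explicitly
  RichSequenceIterator rich_stream = RichSequence.IOTools.readFastaProtein(br, null);
  while (rich_stream.hasNext())
  {
    RichSequence rich_seq = rich_stream.nextRichSequence();
    // print every field that might hold the header values
    System.out.println(rich_seq.toString());
    System.out.println(rich_seq.getAccession());
    System.out.println(rich_seq.getAlphabet());
    System.out.println(rich_seq.getAnnotation());
    System.out.println(rich_seq.getName());
    System.out.println(rich_seq.getDescription());
    System.out.println(rich_seq.getIdentifier());
    System.out.println(rich_seq.seqString());
  }
}
catch (Exception e)
{
  System.err.println("Bug while reading the sequences from the FASTA file");
  e.printStackTrace(); // show the actual cause instead of hiding it
}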
 

Here's the output (for the first entry in the fasta file):
id|0:1/0.9992
0
org.biojava.bio.symbol.AlphabetManager$ImmutableWellKnownAlphabetWrapper@1df073d

1
null
null
ASITENGGAEEESVAK


Can anyone tell me what's going wrong? 
Is there already a function that loads all the sequences directly into memory (into something like a HashSet) while reading them?
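
In case it helps to show what I mean by "directly into memory", here is a rough plain-Java sketch of the behaviour I'm after. It is not a BioJava call; the class FastaToMap and its readAll method are hypothetical names of my own, and it assumes header lines start with ">" and are unique within the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: loads a whole FASTA stream into a map. */
final class FastaToMap {

    /** Returns header (without '>') -> sequence, in file order. */
    static Map<String, String> readAll(Reader in) {
        Map<String, String> records = new LinkedHashMap<>();
        BufferedReader br = new BufferedReader(in);
        String header = null;
        StringBuilder seq = new StringBuilder();
        try {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                if (line.startsWith(">")) {
                    // a new record starts: store the previous one first
                    if (header != null) {
                        records.put(header, seq.toString());
                    }
                    header = line.substring(1);
                    seq.setLength(0);
                } else if (header != null) {
                    seq.append(line); // sequences may span several lines
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (header != null) {
            records.put(header, seq.toString()); // store the last record
        }
        return records;
    }

    public static void main(String[] args) {
        String fasta = ">id|0|0.9992|1\nASITENGGAEEESVAK\n>id|2|0.9998|1\nASNASSAGDEVDNVATSSK\n";
        Map<String, String> records = readAll(new StringReader(fasta));
        System.out.println(records.get("id|0|0.9992|1")); // prints "ASITENGGAEEESVAK"
    }
}
```

A map keyed by the full header keeps the probability and rank available for later lookup, which a plain HashSet of sequences would lose.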

Cheers

Sarah



