[Biojava-l] Reading and writing Fastq files

xyz mitlox at op.pl
Tue Mar 30 11:50:47 UTC 2010


Thank you, it works. However, after I extended the code with
RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
fastq.getDescription());
to also get a trimmed FASTA file, I got the following error:

Fastq2Fasta.java:51: cannot find symbol
symbol  : method writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
location: class org.biojavax.bio.seq.RichSequence.IOTools
      RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
1 error

Complete Code:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqVariant;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;


public class Fastq2Fasta {

  public static void main(String[] args) throws FileNotFoundException, IOException {

    FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
    FastqReader qReader = new IlluminaFastqReader();

    FileOutputStream outputFastq = new FileOutputStream("fastq2fastaTrim.fastq");
    FastqWriter qWriter = new IlluminaFastqWriter();

    SimpleNamespace ns = new SimpleNamespace("biojava");

    FileOutputStream outputFasta = new FileOutputStream("fastq2fastaTrim.fasta");


    for (Fastq fastq : qReader.read(inputFastq)) {
      System.out.println(fastq.getDescription());
      System.out.println(fastq.getSequence());
      String trimSeq = fastq.getSequence().substring(0, fastq.getSequence().length() - 6);
      System.out.println(trimSeq);
      System.out.println(fastq.getQuality());
      String trimQual = fastq.getQuality().substring(0, fastq.getQuality().length() - 6);
      System.out.println(trimQual);

      FastqBuilder trimFastq = new FastqBuilder();
      trimFastq.withVariant(FastqVariant.FASTQ_ILLUMINA);
      trimFastq.withDescription(fastq.getDescription());
      trimFastq.appendSequence(trimSeq);
      trimFastq.appendQuality(trimQual);

      qWriter.write(outputFastq, trimFastq.build());
      
      RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
      fastq.getDescription());


    }
  }
}

What did I do wrong?
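
For anyone hitting the same error: the compiler message says that IOTools
has no writeFasta method taking (FileOutputStream, String, SimpleNamespace,
String), so passing the sequence and the description as plain Strings does
not match any available overload. One workaround that does not depend on
knowing which overloads exist is to format the FASTA record by hand. Below
is a minimal sketch using only the JDK; FastaSketch, formatFasta and the
line width of 60 are illustrative choices and not part of BioJava.

```java
// Hypothetical helper, not part of BioJava: formats FASTA records by hand.
class FastaSketch {

  // Width at which sequence lines are wrapped (an assumed, common convention).
  static final int LINE_WIDTH = 60;

  // Returns one FASTA record: a '>' header line followed by the sequence,
  // wrapped at LINE_WIDTH characters per line.
  static String formatFasta(String description, String sequence) {
    StringBuilder sb = new StringBuilder();
    sb.append('>').append(description).append('\n');
    for (int start = 0; start < sequence.length(); start += LINE_WIDTH) {
      int end = Math.min(start + LINE_WIDTH, sequence.length());
      sb.append(sequence, start, end).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Sample record taken from the trimmed FASTQ output in this message.
    String description = "HWI-EAS406:5:1:0:1390#0/1";
    String trimSeq = "GGGTGATGGCCGCTGCCGATGGCGTCAAAA";
    System.out.print(formatFasta(description, trimSeq));
  }
}
```

Inside the loop of Fastq2Fasta, the record could then be written with
outputFasta.write(FastaSketch.formatFasta(fastq.getDescription(), trimSeq).getBytes());
in place of the failing writeFasta call.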

Suggestions:
1) 
After trimming, the description is no longer repeated on the "+" line
that precedes the quality string:

@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
+
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

This reduces the size of the files, but is it compatible with
SOAP and TopHat?
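
As far as I understand the FASTQ format, the "+" line may either be bare
or repeat the description from the "@" line. I have not verified how SOAP
or TopHat treat the short form. A small check that accepts both forms could
look like the sketch below; PlusLineSketch and plusLineMatches are
hypothetical names, not part of BioJava.

```java
// Hypothetical helper, not part of BioJava.
class PlusLineSketch {

  // Returns true if the third line of a FASTQ record is consistent with the
  // '@' header line: either a bare "+" or "+" followed by the same description.
  static boolean plusLineMatches(String headerLine, String plusLine) {
    if (!headerLine.startsWith("@") || !plusLine.startsWith("+")) {
      return false;
    }
    String repeated = plusLine.substring(1);
    return repeated.isEmpty() || repeated.equals(headerLine.substring(1));
  }

  public static void main(String[] args) {
    String header = "@HWI-EAS406:5:1:0:1390#0/1";
    // Short form, as written by the trimmed output above.
    System.out.println(plusLineMatches(header, "+"));
    // Long form, with the description repeated.
    System.out.println(plusLineMatches(header, "+HWI-EAS406:5:1:0:1390#0/1"));
    // Mismatching description on the "+" line.
    System.out.println(plusLineMatches(header, "+something-else"));
  }
}
```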

2)
I was working with FASTQ files of up to 6 GB, and I have not run any
benchmarks with different buffer/stream combinations on large text files,
so I am not sure that plain FileInputStream or FileOutputStream is
enough. BioJavaX uses BufferedReader br = new
BufferedReader(new FileReader()); is there any speed difference?
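
To get a rough feeling for the difference, the sketch below times a
byte-by-byte copy of a small generated file, once with plain file streams
and once with BufferedInputStream/BufferedOutputStream wrapped around them.
It uses only the JDK; BufferedCopySketch, the 64 KB buffer size and the
sample record are assumptions made for illustration, and the timings are
not a proper benchmark. Whether buffering matters for the Fastq reader
depends on how it consumes the stream internally, which I have not checked.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical timing sketch, not part of BioJava.
class BufferedCopySketch {

  // Assumed buffer size; worth tuning on real data.
  static final int BUFFER_SIZE = 64 * 1024;

  // Copies one byte at a time and returns the number of bytes copied.
  // With unbuffered file streams every call goes to the operating system.
  static long copy(InputStream in, OutputStream out) throws IOException {
    long count = 0;
    int b;
    while ((b = in.read()) != -1) {
      out.write(b);
      count++;
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    File source = File.createTempFile("sketch", ".fastq");
    File target = File.createTempFile("sketch", ".out");
    source.deleteOnExit();
    target.deleteOnExit();

    // Generate a small sample file of identical four-line records.
    byte[] record = "@r\nACGT\n+\nOOOO\n".getBytes(StandardCharsets.US_ASCII);
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream(source))) {
      for (int i = 0; i < 50000; i++) {
        out.write(record);
      }
    }

    long start = System.nanoTime();
    try (InputStream in = new FileInputStream(source);
         OutputStream out = new FileOutputStream(target)) {
      copy(in, out);
    }
    long unbufferedMs = (System.nanoTime() - start) / 1000000;

    start = System.nanoTime();
    try (InputStream in = new BufferedInputStream(new FileInputStream(source), BUFFER_SIZE);
         OutputStream out = new BufferedOutputStream(new FileOutputStream(target), BUFFER_SIZE)) {
      copy(in, out);
    }
    long bufferedMs = (System.nanoTime() - start) / 1000000;

    System.out.println("unbuffered: " + unbufferedMs + " ms");
    System.out.println("buffered:   " + bufferedMs + " ms");
  }
}
```

Since the code above already passes a FileInputStream to qReader.read and a
FileOutputStream to qWriter.write, wrapping those two streams in their
buffered counterparts should be a small change, assuming both methods accept
any InputStream/OutputStream.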

Overall I think the API looks good, and for the docs you could use this
code and put it on BioJava.


On Mon, 29 Mar 2010 22:01:23 -0400 (EDT)
Michael Heuer wrote:

> 
> FastqBuilder defaults to the Sanger variant, see
> 
> http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT
> 
> 
> In your code, you just need to specify the Illumina variant
> 
> FastqBuilder trimFastq = new FastqBuilder()
>   .withVariant(FastqVariant.FASTQ_ILLUMINA)
>   .withDescription(fastq.getDescription())
>   .appendSequence(trimSeq)
>   .appendQuality(trimQual);
> 
> 
> Please let me know if you have any API or doc suggestions, as this
> stuff has not been used much by anyone other than myself.
> 
>    michael
> 
> 
> 



