[Biojava-l] Reading and writing Fastq files
xyz
mitlox at op.pl
Tue Mar 30 11:50:47 UTC 2010
Thank you, it works. However, after I extended the code with
RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
in order to also write a trimmed FASTA file, I got the following compiler error:
Fastq2Fasta.java:51: cannot find symbol
symbol  : method writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
location: class org.biojavax.bio.seq.RichSequence.IOTools
        RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
1 error
Complete Code:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqVariant;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;

public class Fastq2Fasta {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
        FastqReader qReader = new IlluminaFastqReader();

        FileOutputStream outputFastq = new FileOutputStream("fastq2fastaTrim.fastq");
        FastqWriter qWriter = new IlluminaFastqWriter();

        SimpleNamespace ns = new SimpleNamespace("biojava");
        FileOutputStream outputFasta = new FileOutputStream("fastq2fastaTrim.fasta");

        for (Fastq fastq : qReader.read(inputFastq)) {
            System.out.println(fastq.getDescription());

            System.out.println(fastq.getSequence());
            String trimSeq = fastq.getSequence().substring(0, fastq.getSequence().length() - 6);
            System.out.println(trimSeq);

            System.out.println(fastq.getQuality());
            String trimQual = fastq.getQuality().substring(0, fastq.getQuality().length() - 6);
            System.out.println(trimQual);

            FastqBuilder trimFastq = new FastqBuilder();
            trimFastq.withVariant(FastqVariant.FASTQ_ILLUMINA);
            trimFastq.withDescription(fastq.getDescription());
            trimFastq.appendSequence(trimSeq);
            trimFastq.appendQuality(trimQual);
            qWriter.write(outputFastq, trimFastq.build());

            RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
        }
    }
}
What did I do wrong?
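Going by the compiler message, there seems to be no writeFasta overload that takes a plain String sequence plus a String description, so the call presumably needs a sequence object instead. As a stopgap that does not depend on BioJava at all, here is a minimal plain-Java sketch of writing the trimmed reads as FASTA records by hand. The names FastaSketch, trimEnd and writeFastaRecord are hypothetical helpers of mine, not BioJava API, and the read in main is made up.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper class, not part of BioJava: writes FASTA records by hand.
class FastaSketch {

    // Drop the last `trim` characters; reads shorter than `trim` become empty
    // instead of throwing StringIndexOutOfBoundsException.
    static String trimEnd(String s, int trim) {
        return s.substring(0, Math.max(0, s.length() - trim));
    }

    // One FASTA record: a '>' header line followed by a single sequence line.
    static void writeFastaRecord(Writer out, String description, String sequence)
            throws IOException {
        out.write('>');
        out.write(description);
        out.write('\n');
        out.write(sequence);
        out.write('\n');
    }

    public static void main(String[] args) throws IOException {
        StringWriter buffer = new StringWriter();
        // Closing the BufferedWriter flushes the record into the StringWriter.
        try (BufferedWriter out = new BufferedWriter(buffer)) {
            // Made-up read, trimmed by 6 bases as in Fastq2Fasta above.
            writeFastaRecord(out, "read1", trimEnd("ACGTACGTACGT", 6));
        }
        System.out.print(buffer);
    }
}
```

In Fastq2Fasta the Writer would wrap outputFasta (for example via an OutputStreamWriter), and the two arguments would be fastq.getDescription() and trimSeq.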
Suggestions:
1)
After I trimmed the FASTQ files, the description is no longer repeated
on the '+' line that precedes the quality string:
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
+
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
This reduces the size of the files, but is it compatible with
SOAP and TopHat?
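As far as I understand the FASTQ convention, repeating the description after the '+' is optional, so both forms should describe the same record. Whether SOAP and TopHat accept the short form I have not checked. A minimal sketch of a record check that treats the two forms alike is below; FastqRecordSketch and isValidRecord are hypothetical names, and the records are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BioJava: checks one four-line FASTQ record.
class FastqRecordSketch {

    // True if the lines form a record whose '+' line is either bare or
    // repeats the description, and whose sequence and quality lengths match.
    static boolean isValidRecord(List<String> lines) {
        if (lines.size() != 4) {
            return false;
        }
        String header = lines.get(0);
        String sequence = lines.get(1);
        String plus = lines.get(2);
        String quality = lines.get(3);
        if (!header.startsWith("@") || !plus.startsWith("+")) {
            return false;
        }
        String description = header.substring(1);
        String repeated = plus.substring(1);
        boolean plusOk = repeated.isEmpty() || repeated.equals(description);
        return plusOk && sequence.length() == quality.length();
    }

    public static void main(String[] args) {
        // Made-up records: short form (bare '+') and long form (repeated description).
        List<String> shortForm = Arrays.asList("@read1", "ACGT", "+", "IIII");
        List<String> longForm = Arrays.asList("@read1", "ACGT", "+read1", "IIII");
        System.out.println(isValidRecord(shortForm));
        System.out.println(isValidRecord(longForm));
    }
}
```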
2)
I have been using FASTQ files of up to 6 GB. I have not run any
benchmarks with different buffer/stream combinations on large text
files, so I am not sure whether a plain FileInputStream or
FileOutputStream is enough. BioJavaX uses BufferedReader br = new
BufferedReader(new FileReader()); is there any speed difference?
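To make the comparison concrete, this is the kind of wrapping I have in mind: a minimal sketch that puts BufferedInputStream and BufferedOutputStream around the file streams. BufferedCopySketch, copy and the 64 KB buffer size are my own illustrative choices, not measured values. I am also assuming that qReader.read(...) and qWriter.write(...) accept any InputStream/OutputStream and do not already buffer internally, neither of which I have verified.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a file byte by byte through buffered streams.
class BufferedCopySketch {

    // Illustrative buffer size (64 KB); not a benchmarked value.
    static final int BUFFER_SIZE = 1 << 16;

    // Returns the number of bytes copied from source to target.
    static long copy(File source, File target) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(source), BUFFER_SIZE);
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target), BUFFER_SIZE)) {
            int b;
            // Each read()/write() call is served from the in-memory buffer;
            // the underlying file is only touched when a buffer runs empty or full.
            while ((b = in.read()) != -1) {
                out.write(b);
                total++;
            }
        } // closing the BufferedOutputStream flushes any remaining bytes
        return total;
    }

    public static void main(String[] args) throws IOException {
        File source = File.createTempFile("sketch", ".fastq");
        File target = File.createTempFile("sketch", ".copy");
        try (FileOutputStream seed = new FileOutputStream(source)) {
            // Made-up record, only there to give the copy something to read.
            seed.write("@read1\nACGT\n+\nIIII\n".getBytes("US-ASCII"));
        }
        System.out.println(copy(source, target) + " bytes copied");
        source.delete();
        target.delete();
    }
}
```

Timing the same loop with and without the two Buffered wrappers on one of the large files would give a first answer to the speed question.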
Overall I think the API looks good, and for the documentation you could
use this code and put it on BioJava.
On Mon, 29 Mar 2010 22:01:23 -0400 (EDT)
Michael Heuer wrote:
>
> FastqBuilder defaults to the Sanger variant, see
>
> http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT
>
>
> In your code, you just need to specify the Illumina variant
>
> FastqBuilder trimFastq = new FastqBuilder()
> .withVariant(FastqVariant.FASTQ_ILLUMINA)
> .withDescription(fastq.getDescription())
> .appendSequence(trimSeq)
> .appendQuality(trimQual);
>
>
> Please let me know if you have any API or doc suggestions, as this
> stuff has not been used much by anyone other than myself.
>
> michael
>
>
>