[Biojava-dev] Extension of the org.biojava3.sequencing.io.fastq classes
Michael Heuer
heuermh at gmail.com
Mon Feb 27 04:11:12 UTC 2012
Hello Hannes,
Nice work!
All those test cases were generated to ensure that FASTQ support was identical among all the various bio* projects. That was the whole point of the paper and of the current code.
We need to maintain this support, even if Illumina did change formats. I can review your other changes this week, although with your reformatting it might be hard to see the diff.
michael
On Feb 26, 2012, at 4:59 PM, Hannes Brandstätter-Müller <biojava at hannes.oib.com> wrote:
> Hi (now) fellow devs :)
>
> So, I spent some time this weekend on the
> org.biojava3.sequencing.io.fastq classes.
>
> This is what I did:
>
> *) the code formatting did not match the other files; the automatic
> formatter in my IDE changed it to the "normal java standard". I
> hope no one feels offended by that.
> *) I implemented a new Fastq Reader/Writer for the new Illumina Fastq
> Formatting (according to Wikipedia version 1.8, new this month
> http://en.wikipedia.org/wiki/FASTQ_format#Encoding - it's a bit
> contradictory, some say it's directly Sanger, some say it has Phred
> values up to 41?)
> *) I extended the Fastq class to be able to generate DNASequence
> representations with the Quality (as Phred Numbers) added as Feature
> (QualityFeature, also new)
> *) I extended the Fastq class to have a constructor that accepts a
> DNASequence (if the quality feature is present; might need a bit more
> refinement there)
> *) as a consequence, Fastq can now translate between the Fastq
> variants and the Phred Fasta/Qual file format (I'll add a dedicated
> parser/Fastq constructor or reader/writer for that format later, but
> that's rather trivial)
>
> Fastq sangerfastq = new Fastq("description", "ACGTA",
> "I?5+\"", FastqVariant.FASTQ_SANGER);
> DNASequence dnaSequence = sangerfastq.getDNASequence();
> // dnaSequence has the Phred qualities [40 30 20 10 1]
> Fastq illuminafastq = new Fastq(dnaSequence,
> FastqVariant.FASTQ_ILLUMINA);
> // assertEquals("h^TJA", illuminafastq.getQuality());
> dnaSequence = illuminafastq.getDNASequence();
> Fastq solexafastq = new Fastq(dnaSequence, FastqVariant.FASTQ_SOLEXA);
> // assertEquals("h^TJ;", solexafastq.getQuality());
>
> *) I have added some test cases for my code, but I might have lowered
> the awesome test coverage in that module. Was that generated by hand
> or by some tool?
>
> I hope someone else will find that useful (at least we can boast Fastq
> support now; someone add that to the Fastq wiki page once we release
> 3.0.3!)
>
> Hannes
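
The quality strings in the quoted example can be checked by hand with the offset arithmetic behind the FASTQ variants. The following is a minimal, standalone sketch and not the BioJava implementation: the class name FastqQualitySketch and all of its methods are hypothetical. It assumes the usual ASCII offsets (33 for Sanger, 64 for Illumina 1.3 to 1.7 and for Solexa) and assumes that Solexa scores are rounded to the nearest integer and clamped at a minimum of -5.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; it is not part of BioJava.
public class FastqQualitySketch {
    static final int SANGER_OFFSET = 33;   // Sanger: Phred + 33
    static final int ILLUMINA_OFFSET = 64; // Illumina 1.3 to 1.7: Phred + 64
    static final int SOLEXA_OFFSET = 64;   // Solexa: Solexa score + 64
    static final int SOLEXA_MIN = -5;      // assumed lower bound of Solexa scores

    // Decode an offset-encoded quality string into integer scores.
    static int[] decodePhred(String quality, int offset) {
        int[] scores = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            scores[i] = quality.charAt(i) - offset;
        }
        return scores;
    }

    // Encode integer scores as an offset-encoded quality string.
    static String encodePhred(int[] scores, int offset) {
        StringBuilder sb = new StringBuilder(scores.length);
        for (int score : scores) {
            sb.append((char) (score + offset));
        }
        return sb.toString();
    }

    // Convert a Phred score to a Solexa score:
    // Q_solexa = 10 * log10(10^(Q_phred / 10) - 1), rounded and clamped.
    static int phredToSolexa(int phred) {
        if (phred <= 0) {
            return SOLEXA_MIN; // the logarithm is undefined here
        }
        double solexa = 10.0 * Math.log10(Math.pow(10.0, phred / 10.0) - 1.0);
        return Math.max(SOLEXA_MIN, (int) Math.round(solexa));
    }

    // Encode Phred scores as a Solexa quality string.
    static String encodeSolexa(int[] phredScores) {
        StringBuilder sb = new StringBuilder(phredScores.length);
        for (int phred : phredScores) {
            sb.append((char) (phredToSolexa(phred) + SOLEXA_OFFSET));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] phred = decodePhred("I?5+\"", SANGER_OFFSET);
        System.out.println(Arrays.toString(phred));
        System.out.println(encodePhred(phred, ILLUMINA_OFFSET));
        System.out.println(encodeSolexa(phred));
    }
}
```

Under these assumptions the sketch reproduces the values in the quoted example: Phred qualities [40, 30, 20, 10, 1], "h^TJA" for the Illumina variant and "h^TJ;" for Solexa. The last Solexa character depends on the clamp, since a Phred score of 1 maps to roughly -5.9 before clamping.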