[Biojava-dev] Extension of the org.biojava3.sequencing.io.fastq classes

Michael Heuer heuermh at gmail.com
Mon Feb 27 04:11:12 UTC 2012


Hello Hannes,

Nice work!

All those test cases were generated to ensure that FASTQ support was identical among all the various bio* projects. That was the whole point of the paper and of the current code.

We need to maintain this support, even if Illumina did change formats. I can review your other changes this week, although with your reformatting it might be hard to see the diff.

   michael


On Feb 26, 2012, at 4:59 PM, Hannes Brandstätter-Müller <biojava at hannes.oib.com> wrote:

> Hi (now) fellow devs :)
> 
> So, I spent some time this weekend on the
> org.biojava3.sequencing.io.fastq classes.
> 
> This is what I did:
> 
> *) the code formatting was not like in the other files; my automatic
> formatting in the IDE changed that to the "normal java standard". I
> hope no one feels offended by that.
> *) I implemented a new Fastq Reader/Writer for the new Illumina Fastq
> formatting (version 1.8 according to Wikipedia, new this month:
> http://en.wikipedia.org/wiki/FASTQ_format#Encoding - it's a bit
> contradictory, some say it's directly Sanger, some say it has Phred
> values up to 41?)
> *) I extended the Fastq class to be able to generate DNASequence
> representations with the Quality (as Phred Numbers) added as Feature
> (QualityFeature, also new)
> *) I extended the Fastq class to have a constructor that accepts a
> DNASequence (if the quality feature is present; might need a bit more
> refinement there)
> *) as a consequence, Fastq can now translate between the Fastq
> variants and the Phred Fasta/Qual file format (I'll add a dedicated
> parser/Fastq constructor or reader/writer for that format later, but
> that's rather trivial)
> 
>        Fastq sangerfastq = new Fastq("description", "ACGTA", "I?5+\"", FastqVariant.FASTQ_SANGER);
>        DNASequence dnaSequence = sangerfastq.getDNASequence();
>        // dnaSequence has the Phred qualities [40 30 20 10 1]
>        Fastq illuminafastq = new Fastq(dnaSequence, FastqVariant.FASTQ_ILLUMINA);
>        // assertEquals("h^TJA", illuminafastq.getQuality());
>        dnaSequence = illuminafastq.getDNASequence();
>        Fastq solexafastq = new Fastq(dnaSequence, FastqVariant.FASTQ_SOLEXA);
>        // assertEquals("h^TJ;", solexafastq.getQuality());
> 
> *) I have added some test cases for my code, but I might have lowered
> the awesome test coverage in that module. Was that generated by hand
> or by some tool?
> 
> I hope someone else will find that useful (at least we can boast Fastq
> support now; someone add that to the Fastq wiki page once we release
> 3.0.3!)
> 
> Hannes
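
The quality strings in the quoted example can be checked by hand without BioJava. Below is a minimal standalone sketch, not the BioJava implementation: the class name FastqQualitySketch and its method names are made up for illustration. It assumes Sanger quality characters are Phred + 33, Illumina 1.3+ characters are Phred + 64, and Solexa characters are a log-odds score + 64, converted from Phred with 10*log10(10^(Q/10) - 1), rounded, and clamped at -5 (the clamp is an assumption; it is consistent with the ';' expected in the quoted example).

```java
import java.util.Arrays;

// Standalone sketch; none of these names come from BioJava.
class FastqQualitySketch {

    // Decode a Sanger-encoded quality string (ASCII offset 33) to Phred scores.
    static int[] sangerToPhred(String quality) {
        int[] phred = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            phred[i] = quality.charAt(i) - 33;
        }
        return phred;
    }

    // Encode Phred scores as Illumina 1.3+ quality characters (ASCII offset 64).
    static String phredToIllumina(int[] phred) {
        StringBuilder sb = new StringBuilder();
        for (int q : phred) {
            sb.append((char) (q + 64));
        }
        return sb.toString();
    }

    // Convert Phred scores to Solexa log-odds scores, then encode with offset 64.
    // Assumption: scores below -5 are clamped to -5.
    static String phredToSolexa(int[] phred) {
        StringBuilder sb = new StringBuilder();
        for (int q : phred) {
            double logOdds = 10.0 * Math.log10(Math.pow(10.0, q / 10.0) - 1.0);
            int solexa = (int) Math.max(-5, Math.round(logOdds));
            sb.append((char) (solexa + 64));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] phred = sangerToPhred("I?5+\"");
        System.out.println(Arrays.toString(phred));
        System.out.println(phredToIllumina(phred));
        System.out.println(phredToSolexa(phred));
    }
}
```

Under these assumptions the sketch reproduces the Phred values and the two strings in the commented-out assertions above. Only the lowest score differs between the Illumina and Solexa encodings, because the two scales agree closely at high quality and diverge near zero.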



