[Bioperl-l] Update of SeqIO::fastq Module for PacBio

Dan Nasko dan.nasko at gmail.com
Thu Sep 20 12:47:55 UTC 2012


Hi,

I've recently begun working through some PacBio sequencing data, and it has been choking the current BioPerl FASTQ I/O modules. Here are the problems I'm running into:

	[1] PacBio will report quality scores up to 100. I believe the parser has an upper limit of 93, and it will throw an error if that limit is exceeded.

	[2] PacBio output very often contains one-base sequences, e.g.:


	@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589
	T
	+
	0
	@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321
	G
	+
	(

	If such a one-base sequence has the quality character "0" (quality score 15), as in the first record above, the parser throws the following error:

	------------- EXCEPTION: Bio::Root::Exception -------------
	MSG: Quality string [0@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321] of length [78]
	doesn't match length of sequence T
	[1], line: 86394
	STACK: Error::throw
	STACK: Bio::Root::Root::throw /Library/Perl/5.12/Bio/Root/Root.pm:472
	STACK: Bio::SeqIO::fastq::next_dataset /Library/Perl/5.12/Bio/SeqIO/fastq.pm:102
	STACK: Bio::SeqIO::fastq::next_seq /Library/Perl/5.12/Bio/SeqIO/fastq.pm:29
	STACK: quality_length_filter.pl:146
	-----------------------------------------------------------

	For some reason, when the parser encounters a quality line consisting solely of "0" (i.e. ^0$), it does not recognize the line break [\n] and consumes the next sequence's header as quality characters (@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321 is the header of the next sequence).
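
To make both points concrete, here is a minimal sketch in plain Perl. It is not the Bio::SeqIO::fastq implementation; the helper names phred33_score and read_fastq_record are made up for illustration. It assumes Sanger-style (Phred+33) encoding and unwrapped four-line records. One possible contributing factor to problem [2] is that the string "0" evaluates as false in Perl. This is only an assumption and has not been traced through fastq.pm. The sketch therefore compares lengths and uses defined() rather than testing the quality string for truth.

```perl
use strict;
use warnings;

# Decode one Sanger-style (Phred+33) quality character into its score.
sub phred33_score {
    my ($char) = @_;
    return ord($char) - 33;
}

# Read one record as exactly four lines: header, sequence, separator, quality.
# Returns a hash reference, or nothing at end of input.
# Assumes unwrapped (single-line) sequence and quality strings.
sub read_fastq_record {
    my ($fh) = @_;

    my $header = <$fh>;
    return unless defined $header;

    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n"
        unless defined $seq && defined $plus && defined $qual;

    # Strip Unix or Windows line endings.
    s/\r?\n\z// for ($header, $seq, $plus, $qual);

    die "Expected '\@' header, got: $header\n" unless $header =~ /^\@/;
    die "Expected '+' separator, got: $plus\n" unless $plus =~ /^\+/;

    # Compare lengths; never test the quality string for truth, because
    # the one-character string "0" is false in Perl.
    die "Sequence/quality length mismatch in $header\n"
        unless length($seq) == length($qual);

    return {
        id     => substr($header, 1),
        seq    => $seq,
        qual   => $qual,
        scores => [ map { phred33_score($_) } split //, $qual ],
    };
}

# The two one-base records quoted above, as an in-memory file.
my $fastq = join "\n",
    '@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589',
    'T', '+', '0',
    '@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321',
    'G', '+', '(',
    '';

open my $fh, '<', \$fastq or die "Cannot open in-memory FASTQ: $!";
my @records;
while (my $rec = read_fastq_record($fh)) {
    push @records, $rec;
    print "$rec->{id}\t$rec->{seq}\t@{ $rec->{scores} }\n";
}
close $fh;
```

With this encoding "0" decodes to 15 and "(" to 7. The highest printable ASCII character, "~", decodes to 93, which would explain the upper limit mentioned in [1].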

Thanks,
Dan


