[Bioperl-l] fastq index

Chris Fields cjfields at illinois.edu
Fri Dec 31 15:28:01 UTC 2010


Caleb,

Yes, that would be a bug. I posted it to Bugzilla for tracking:

http://bugzilla.open-bio.org/show_bug.cgi?id=3165

chris

On Dec 31, 2010, at 12:47 AM, Davis, Caleb F wrote:

> Thank you for the rec!
> 
> Here's what I get with 1.6.1: 
> 
> %perl make_fq_inx_test.pl test.inx test.fastq
> %perl fetch_fastq_test.pl test.inx FVBWUVC01D7SUB
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: No description line parsed
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
> STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
> STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
> STACK: fetch_fastq_test.pl:11
> -----------------------------------------------------------
> 
> Is it a bug?
> --Caleb
> 
> These perl scripts are from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
> 
> ##########  make_fq_inx_test.pl  ###########
> # Complete code for making an index for several
> # fastq files
> use Bio::Index::Fastq;
> use strict;
> 
> my $Index_File_Name = shift;
> my $inx = Bio::Index::Fastq->new(
> 	'-filename' => $Index_File_Name,
> 	'-write_flag' => 1);
> $inx->make_index(@ARGV);
> 
> 
> ##########  fetch_fastq_test.pl  ###########
> # Print out several sequences present in the index
> # in Fastq format
> use Bio::Index::Fastq;
> use strict;
> 
> my $Index_File_Name = shift;
> my $inx = Bio::Index::Fastq->new('-filename' => $Index_File_Name);
> my $out = Bio::SeqIO->new('-format' => 'Fastq','-fh' => \*STDOUT);
> 
> foreach my $id (@ARGV) {
> 	my $seq = $inx->fetch($id); # Returns Bio::Seq::Quality object   <-------------------  THROW
> 	$out->write_seq($seq);
> }
> 
> Example data--
> 
> ##########  test.fastq  ###########
> @FVBWUVC01BR7MP
> GCGACCCTAGGTAGCAACCGCCGGCTTCGGCGGTAAGGTATCACTCAG
> +
> 24<9000988:;<=<;=<44444<<=<<<>???@@@@?>=86662232
> @FVBWUVC01D7NSE
> GAAGCAGACACAGAAAGACACGGTCTAGCAGATCG
> +
> IIIIIIIIIIIIIIIIIIIIIIIIIIIIIEEEE@<
> @FVBWUVC01D7SUB
> TTTATCGGCTAGGTCAAATAGAGTGCTTTGATATCAGCATGTCTAGCT
> +
> FFD===FFFFFHFFFFFFFFFFC888FFFFDDBAAA@@@840...757
> @FVBWUVC01BFN75
> TTAGAATTCAGTTTAGTGCGCTGATCTGAGTCGAGATAAAATCACCAGTACCCAAAACCAGGCGGGCTCGCCACGTTGGCTAATCCTGGTACATTTTGTAATCAATGTTCAGAAGA
> +
> IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFDDBB:544448<<=>;899<=8889988894<<9955,,/4,,,,,811775512426766777;97668<<44944
> @FVBWUVC01AYO0N
> AAATTTGTGTTAGAAGGACGAGTCACCACGTACCAATAGCAACAACGATCGGTCGGACTATTCATTGTGGTGGTGACGCTC
> +
> IIIIIIIIIIIIIHHFF@??DA???==<=766<<11,/,,,1,,,,733977--/4444722466<;;<<<82/,,--.12
> @FVBWUVC01EYPM9
> GGATTACACGGGAAAGGTGCTTGTGTCCCGACAGGCTAGGATA
> +
> FFFFDD<<:ABAA<988:9::BA===BBBBAA??<8623425/
> @FVBWUVC01BWHY4
> AGGTACTACTTCTTAGTGAGACAAGTCCTGGACAGGAGCAGGTAATATT
> +
> HGGGDDD:555:4449==>=<<555=BBAAAA@8888894224266;..
> @FVBWUVC01ELH7H
> CATGAGAAGTCTTAATATTACCTCTCAGGTACCTCCTCTTAAGACACAATTACAGAAGGTGCT
> +
> IIIII@@??GIIIIG<<666:IFEIEIEED<==<;CE?3344IFIIIIIIIIIGC>==<HGD;
> @FVBWUVC01CTTAY
> CTCGAGATTCTGGATCCTCATGGACAAGATGTTCTCCGGCTTAGAGAT
> +
> FFFFFFFFFFFFDA:88@>>>44444898==<;<62444221775557
> 
> 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu] 
> Sent: Wednesday, December 29, 2010 9:35 PM
> To: Cook, Malcolm
> Cc: Davis, Caleb F; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] fastq index
> 
> May just wrap this for the indexer.  Thanks for the pointer Malcolm!
> 
> chris
> 
> On Dec 29, 2010, at 6:20 PM, Cook, Malcolm wrote:
> 
>> If you're looking for alternatives, I recommend: http://sourceforge.net/projects/cdbfasta/
>> 
>> No BioPerl wrapper, but, hey, that's what `system` is for.
>> 
>> Cheers,
>> 
>> Malcolm
>> 
>> 
>> On 12/29/10 2:28 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
>> 
>> On Dec 29, 2010, at 1:46 PM, Davis, Caleb F wrote:
>> 
>>> Hi all,
>>> 
>>> Retrieving FASTQ records from an index with Bio::Index::Fastq is not working for me. I tried the index creation and retrieval code given here:
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
>>> using the fastq sequence given here:
>>> http://www.bioperl.org/wiki/FASTQ_sequence_format
>>> but I get this error:
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>> STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113
>>> STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134
>>> STACK: fetch_fastq_test.pl:11
>>> 
>>> The only other report of this behavior I could find is here:
>>> http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836
>>> 
>>> I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here.
>>> 
>>> --Caleb
>> 
>> Caleb,
>> 
>> Make sure you are using the latest BioPerl release, via CPAN or GitHub; the line number and error message don't correspond to the latest version.  If the problem persists, you may need to file a bug report with some example data and a script, or at least show some example data that triggers the problem.
>> 
>> I believe the current indexing scheme used for FASTQ isn't up to date with the current parser (which underwent a complete refactoring a while back), so a bug report would help tremendously. It should be fairly easy to add proper indexing.  Jason and I briefly talked about FASTQ parsing a few months back in relation to parsing speed. It could be much faster (my main concern initially was that it was correct).
>> 
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
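
A minimal Perl sketch of the byte-offset indexing idea discussed in this thread. It is not BioPerl's implementation: the helper names index_fastq and fetch_fastq are hypothetical, the index is a plain in-memory hash, and the code assumes four-line records with no wrapped sequence or quality lines. It records where each '@' header starts and seeks straight to that offset on retrieval. Counting lines instead of scanning for '@' matters because a quality string may itself contain, or begin with, '@'.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Map each record ID to the byte offset of its '@' header line.
# Assumption: every record is exactly four lines (no line wrapping).
sub index_fastq {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %offset;
    while (1) {
        my $pos    = tell $fh;    # offset of the line about to be read
        my $header = <$fh>;
        last unless defined $header;
        my ($id) = $header =~ /^\@(\S+)/
            or die "Expected '\@' header at byte $pos";
        $offset{$id} = $pos;
        # Skip the remaining three lines of the record.
        for my $part (qw(sequence separator quality)) {
            defined(my $line = <$fh>)
                or die "Truncated record $id: missing $part line";
        }
    }
    close $fh;
    return \%offset;
}

# Seek to the stored offset and read one four-line record.
# Returns undef when the ID is not in the index.
sub fetch_fastq {
    my ($path, $offset, $id) = @_;
    my $pos = $offset->{$id};
    return unless defined $pos;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek $fh, $pos, 0 or die "Cannot seek in $path: $!";
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        die "Truncated record $id" unless defined $line;
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return { id => $id, seq => $lines[1], qual => $lines[3] };
}

# Demo using two records from the example data in this thread.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} <<'END_FASTQ';
@FVBWUVC01D7NSE
GAAGCAGACACAGAAAGACACGGTCTAGCAGATCG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIEEEE@<
@FVBWUVC01D7SUB
TTTATCGGCTAGGTCAAATAGAGTGCTTTGATATCAGCATGTCTAGCT
+
FFD===FFFFFHFFFFFFFFFFC888FFFFDDBAAA@@@840...757
END_FASTQ
close $tmp_fh;

my $index  = index_fastq($tmp_path);
my $record = fetch_fastq($tmp_path, $index, 'FVBWUVC01D7SUB');
print "\@$record->{id}\n$record->{seq}\n+\n$record->{qual}\n";
```

If a stored offset does not land on a header line, a parser that expects '@' at the start of the record has nothing to read a description from, which is one way an index and a parser can fall out of step.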




