[Bioperl-l] fastq index

Mon Jan 31 19:16:54 UTC 2011

Just a quick note that I made a small change to the Bio::Index::Fastq parser in bioperl-live which appears to fix this.  We probably need a more robust way of finding the start of the FASTQ header (I think '@' can be a qual line symbol, so just searching for this may bite us down the road).

chris

On Dec 31, 2010, at 9:28 AM, Chris Fields wrote:

> Caleb,
> 
> Yes that would be a bug.  I posted this to bugzilla for tracking:
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=3165
> 
> chris
> 
> On Dec 31, 2010, at 12:47 AM, Davis, Caleb F wrote:
> 
>> Thank you for the rec!
>> 
>> Here's what I get with 1.6.1: 
>> 
>> %perl make_fq_inx_test.pl test.inx test.fastq
>> %perl fetch_fastq_test.pl test.inx FVBWUVC01D7SUB
>> 
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: No description line parsed
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
>> STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
>> STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
>> STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
>> STACK: fetch_fastq_test.pl:11
>> -----------------------------------------------------------
>> 
>> Is it a bug?
>> --Caleb
>> 
>> These perl scripts are from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
>> 
>> ##########  make_fq_inx_test.pl  ###########
>> # Complete code for making an index for several
>> # fastq files
>> use Bio::Index::Fastq;
>> use strict;
>> 
>> my $Index_File_Name = shift;
>> my $inx = Bio::Index::Fastq->new(
>> 	'-filename' => $Index_File_Name,
>> 	'-write_flag' => 1);
>> $inx->make_index(@ARGV);
>> 
>> 
>> ##########  fetch_fastq_test.pl  ###########
>> # Print out several sequences present in the index
>> # in Fastq format
>> use Bio::Index::Fastq;
>> use strict;
>> 
>> my $Index_File_Name = shift;
>> my $inx = Bio::Index::Fastq->new('-filename' => $Index_File_Name);
>> my $out = Bio::SeqIO->new('-format' => 'Fastq','-fh' => \*STDOUT);
>> 
>> foreach my $id (@ARGV) {
>> 	my $seq = $inx->fetch($id); # Returns Bio::Seq::Quality object   <-------------------  THROW
>> 	$out->write_seq($seq);
>> }
>> 
>> Example data--
>> 
>> ##########  test.fastq  ###########
>> @FVBWUVC01BR7MP
>> GCGACCCTAGGTAGCAACCGCCGGCTTCGGCGGTAAGGTATCACTCAG
>> +
>> 24<9000988:;<=<;=<44444<<=<<<>???@@@@?>=86662232
>> @FVBWUVC01D7NSE
>> GAAGCAGACACAGAAAGACACGGTCTAGCAGATCG
>> +
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIIEEEE@<
>> @FVBWUVC01D7SUB
>> TTTATCGGCTAGGTCAAATAGAGTGCTTTGATATCAGCATGTCTAGCT
>> +
>> FFD===FFFFFHFFFFFFFFFFC888FFFFDDBAAA@@@840...757
>> @FVBWUVC01BFN75
>> TTAGAATTCAGTTTAGTGCGCTGATCTGAGTCGAGATAAAATCACCAGTACCCAAAACCAGGCGGGCTCGCCACGTTGGCTAATCCTGGTACATTTTGTAATCAATGTTCAGAAGA
>> +
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFDDBB:544448<<=>;899<=8889988894<<9955,,/4,,,,,811775512426766777;97668<<44944
>> @FVBWUVC01AYO0N
>> AAATTTGTGTTAGAAGGACGAGTCACCACGTACCAATAGCAACAACGATCGGTCGGACTATTCATTGTGGTGGTGACGCTC
>> +
>> IIIIIIIIIIIIIHHFF@??DA???==<=766<<11,/,,,1,,,,733977--/4444722466<;;<<<82/,,--.12
>> @FVBWUVC01EYPM9
>> GGATTACACGGGAAAGGTGCTTGTGTCCCGACAGGCTAGGATA
>> +
>> FFFFDD<<:ABAA<988:9::BA===BBBBAA??<8623425/
>> @FVBWUVC01BWHY4
>> AGGTACTACTTCTTAGTGAGACAAGTCCTGGACAGGAGCAGGTAATATT
>> +
>> HGGGDDD:555:4449==>=<<555=BBAAAA at 8888894224266;..
>> @FVBWUVC01ELH7H
>> CATGAGAAGTCTTAATATTACCTCTCAGGTACCTCCTCTTAAGACACAATTACAGAAGGTGCT
>> +
>> IIIII@@??GIIIIG<<666:IFEIEIEED<==<;CE?3344IFIIIIIIIIIGC>==<HGD;
>> @FVBWUVC01CTTAY
>> CTCGAGATTCTGGATCCTCATGGACAAGATGTTCTCCGGCTTAGAGAT
>> +
>> FFFFFFFFFFFFDA:88@>>>44444898==<;<62444221775557
>> 
>> 
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields at illinois.edu] 
>> Sent: Wednesday, December 29, 2010 9:35 PM
>> To: Cook, Malcolm
>> Cc: Davis, Caleb F; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] fastq index
>> 
>> May just wrap this for the indexer.  Thanks for the pointer Malcolm!
>> 
>> chris
>> 
>> On Dec 29, 2010, at 6:20 PM, Cook, Malcolm wrote:
>> 
>>> If you're looking for alternatives, I recommend: http://sourceforge.net/projects/cdbfasta/
>>> 
>>> No bioperl wrapper, but, hey, that's what `system` is for
>>> 
>>> Cheers,
>>> 
>>> Malcolm
>>> 
>>> 
>>> On 12/29/10 2:28 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
>>> 
>>> On Dec 29, 2010, at 1:46 PM, Davis, Caleb F wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> Retrieving fastq from an index with bio::index::fastq is not working for me. I try using the index creation and retrieval code as given here:
>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
>>>> using the fastq sequence given here:
>>>> http://www.bioperl.org/wiki/FASTQ_sequence_format
>>>> but I get this error:
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>>> STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113
>>>> STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134
>>>> STACK: fetch_fastq_test.pl:11
>>>> 
>>>> The only other report of this behavior I could find is here:
>>>> http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836
>>>> 
>>>> I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here.
>>>> 
>>>> --Caleb
>>> 
>>> Caleb,
>>> 
>>> Make sure you are using the latest BioPerl release via CPAN, or via github; the line number and error message don't correspond to the latest version.  If the problem persists, you may need to file a bug report for this with some example data and a script, or at least show some example data that is triggering the problem.
>>> 
>>> I believe the current indexing scheme used for FASTQ isn't up-to-date with the current parser (which underwent a complete refactoring a while back), so this would help tremendously, but it should be fairly easy to add proper indexing to this.  Jason and I briefly talked about FASTQ parsing a few months back in relation to speed of parsing, it could be much faster (my main concern initially was that it was correct).
>>> 
>>> chris
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l