[Bioperl-l] fastq index
Chris Fields
cjfields at illinois.edu
Mon Jan 31 19:16:54 UTC 2011
Just a quick note that I made a small change to the Bio::Index::Fastq parser in bioperl-live which appears to fix this. We probably need a more robust way of finding the start of the FASTQ header (I think '@' can be a qual line symbol, so just searching for this may bite us down the road).
chris
On Dec 31, 2010, at 9:28 AM, Chris Fields wrote:
> Caleb,
>
> Yes that would be a bug. I posted this to bugzilla for tracking:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=3165
>
> chris
>
> On Dec 31, 2010, at 12:47 AM, Davis, Caleb F wrote:
>
>> Thank you for the rec!
>>
>> Here's what I get with 1.6.1:
>>
>> %perl make_fq_inx_test.pl test.inx test.fastq
>> %perl fetch_fastq_test.pl test.inx FVBWUVC01D7SUB
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: No description line parsed
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
>> STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
>> STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
>> STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
>> STACK: fetch_fastq_test.pl:11
>> -----------------------------------------------------------
>>
>> Is it a bug?
>> --Caleb
>>
>> These perl scripts are from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
>>
>> ########## make_fq_inx_test.pl ###########
>> # Complete code for making an index for several
>> # fastq files
>> use Bio::Index::Fastq;
>> use strict;
>>
>> my $Index_File_Name = shift;
>> my $inx = Bio::Index::Fastq->new(
>> '-filename' => $Index_File_Name,
>> '-write_flag' => 1);
>> $inx->make_index(@ARGV);
>>
>>
>> ########## fetch_fastq_test.pl ###########
>> # Print out several sequences present in the index
>> # in Fastq format
>> use Bio::Index::Fastq;
>> use strict;
>>
>> my $Index_File_Name = shift;
>> my $inx = Bio::Index::Fastq->new('-filename' => $Index_File_Name);
>> my $out = Bio::SeqIO->new('-format' => 'Fastq','-fh' => \*STDOUT);
>>
>> foreach my $id (@ARGV) {
>> my $seq = $inx->fetch($id); # Returns Bio::Seq::Quality object <------------------- THROW
>> $out->write_seq($seq);
>> }
>>
>> Example data--
>>
>> ########## test.fastq ###########
>> @FVBWUVC01BR7MP
>> GCGACCCTAGGTAGCAACCGCCGGCTTCGGCGGTAAGGTATCACTCAG
>> +
>> 24<9000988:;<=<;=<44444<<=<<<>???@@@@?>=86662232
>> @FVBWUVC01D7NSE
>> GAAGCAGACACAGAAAGACACGGTCTAGCAGATCG
>> +
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIIEEEE@<
>> @FVBWUVC01D7SUB
>> TTTATCGGCTAGGTCAAATAGAGTGCTTTGATATCAGCATGTCTAGCT
>> +
>> FFD===FFFFFHFFFFFFFFFFC888FFFFDDBAAA@@@840...757
>> @FVBWUVC01BFN75
>> TTAGAATTCAGTTTAGTGCGCTGATCTGAGTCGAGATAAAATCACCAGTACCCAAAACCAGGCGGGCTCGCCACGTTGGCTAATCCTGGTACATTTTGTAATCAATGTTCAGAAGA
>> +
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFDDBB:544448<<=>;899<=8889988894<<9955,,/4,,,,,811775512426766777;97668<<44944
>> @FVBWUVC01AYO0N
>> AAATTTGTGTTAGAAGGACGAGTCACCACGTACCAATAGCAACAACGATCGGTCGGACTATTCATTGTGGTGGTGACGCTC
>> +
>> IIIIIIIIIIIIIHHFF@??DA???==<=766<<11,/,,,1,,,,733977--/4444722466<;;<<<82/,,--.12
>> @FVBWUVC01EYPM9
>> GGATTACACGGGAAAGGTGCTTGTGTCCCGACAGGCTAGGATA
>> +
>> FFFFDD<<:ABAA<988:9::BA===BBBBAA??<8623425/
>> @FVBWUVC01BWHY4
>> AGGTACTACTTCTTAGTGAGACAAGTCCTGGACAGGAGCAGGTAATATT
>> +
>> HGGGDDD:555:4449==>=<<555=BBAAAA at 8888894224266;..
>> @FVBWUVC01ELH7H
>> CATGAGAAGTCTTAATATTACCTCTCAGGTACCTCCTCTTAAGACACAATTACAGAAGGTGCT
>> +
>> IIIII@@??GIIIIG<<666:IFEIEIEED<==<;CE?3344IFIIIIIIIIIGC>==<HGD;
>> @FVBWUVC01CTTAY
>> CTCGAGATTCTGGATCCTCATGGACAAGATGTTCTCCGGCTTAGAGAT
>> +
>> FFFFFFFFFFFFDA:88@>>>44444898==<;<62444221775557
>>
>>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields at illinois.edu]
>> Sent: Wednesday, December 29, 2010 9:35 PM
>> To: Cook, Malcolm
>> Cc: Davis, Caleb F; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] fastq index
>>
>> May just wrap this for the indexer. Thanks for the pointer Malcolm!
>>
>> chris
>>
>> On Dec 29, 2010, at 6:20 PM, Cook, Malcolm wrote:
>>
>>> If you're looking for alternatives, I recommend: http://sourceforge.net/projects/cdbfasta/
>>>
>>> No bioperl wrapper, but, hey, that's what `system` is for
>>>
>>> Cheers,
>>>
>>> Malcolm
>>>
>>>
>>> On 12/29/10 2:28 PM, "Chris Fields" <cjfields at illinois.edu> wrote:
>>>
>>> On Dec 29, 2010, at 1:46 PM, Davis, Caleb F wrote:
>>>
>>>> Hi all,
>>>>
>>>> Retrieving fastq from an index with bio::index::fastq is not working for me. I try using the index creation and retrieval code as given here:
>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Index/Fastq.html
>>>> using the fastq sequence given here:
>>>> http://www.bioperl.org/wiki/FASTQ_sequence_format
>>>> but I get this error:
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: NCYC361-11a03.q1k bases 1 to 1576 doesn't match fastq descriptor line type
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>>> STACK: Bio::SeqIO::fastq::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/fastq.pm:113
>>>> STACK: Bio::Index::AbstractSeq::fetch /usr/lib/perl5/site_perl/5.8.8/Bio/Index/AbstractSeq.pm:134
>>>> STACK: fetch_fastq_test.pl:11
>>>>
>>>> The only other report of this behavior I could find is here:
>>>> http://permalink.gmane.org/gmane.comp.lang.perl.bio.general/17836
>>>>
>>>> I get the same behavior when I use my own code and sequence. I hope I provided enough information. Sadly, I'm not sure what I'm doing wrong here.
>>>>
>>>> --Caleb
>>>
>>> Caleb,
>>>
>>> Make sure you are using the latest BioPerl release via CPAN, or via github; the line number and error message don't correspond to the latest version. If the problem persists, you may need to file a bug report for this with some example data and a script, or at least show some example data that is triggering the problem.
>>>
>>> I believe the current indexing scheme used for FASTQ isn't up-to-date with the current parser (which underwent a complete refactoring a while back), so this would help tremendously, but it should be fairly easy to add proper indexing to this. Jason and I briefly talked about FASTQ parsing a few months back in relation to speed of parsing, it could be much faster (my main concern initially was that it was correct).
>>>
>>> chris
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list