[Bioperl-l] About FASTQ parser

Abhishek Pratap abhishek.vit at gmail.com
Thu Sep 17 18:16:33 UTC 2009


Hi Chris

I am wondering whether the following is intentionally excluded from a
FASTQ record or is a bug.

After reading in each record from a FASTQ file, the output of the
same record ( $out->write_seq($seq) ) is missing the text after
the + sign.



Eg:

@HWI-EAS397:1:1:11:252#NNNTNN/1
NACAATATCAATTAGAGGATTGCTTNGTTNAAGGNNTNGNTNNNANTNT
+
DNXPMXNYXMPVXZVTXYZ[[BBBBBBBBBBBBBBBBBBBBBBBBBBBB


PS: In our case we need each record printed out exactly as it was read,
because we split the FASTQ file into multiple FASTQ files based on the
read index in the @ line. Exact output is needed to avoid conflicts with
downstream processing pipelines.
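
For reference, here is a minimal sketch of that splitting step in plain
Perl. It bypasses Bio::SeqIO entirely, so every line of a record
(including anything after the +) is copied unchanged. It assumes
four-line records with no wrapped sequence or quality lines, and that
the read index is the text between '#' and '/' in the @ line. The
subroutine name split_fastq_by_index and the output file naming are
made up for illustration and are not part of BioPerl.

```perl
use strict;
use warnings;

# Split a FASTQ file into one output file per read index, copying each
# record verbatim. Assumptions: four lines per record, and a header of
# the form "@...#INDEX/1" (index = text between '#' and '/').
sub split_fastq_by_index {
    my ($in_file, $out_prefix) = @_;
    my %out_fh;    # index => output filehandle
    my %count;     # index => number of records written

    open(my $in, '<', $in_file) or die "Cannot open $in_file: $!";
    while (defined(my $header = <$in>)) {
        next if $header =~ /^\s*$/;    # tolerate blank lines between records

        # Read the remaining three lines of the record unchanged.
        my @rest = map { scalar <$in> } 1 .. 3;
        die "Truncated record at: $header" if grep { !defined $_ } @rest;
        die "Malformed record at: $header"
            unless $header =~ /^@/ && $rest[1] =~ /^\+/;

        # Records without a recognizable index go to an 'unknown' file.
        my $index = $header =~ /#([A-Za-z0-9]+)/ ? $1 : 'unknown';

        unless ($out_fh{$index}) {
            my $name = "$out_prefix.$index.fq";
            open($out_fh{$index}, '>', $name) or die "Cannot open $name: $!";
        }
        print { $out_fh{$index} } $header, @rest;
        $count{$index}++;
    }
    close $in;
    close $_ for values %out_fh;
    return \%count;    # hash reference: index => record count
}
```

Calling split_fastq_by_index('mydata.fq', 'split') would write files
such as split.NNNTNN.fq and return the per-index record counts. One
filehandle stays open per index, which is fine for a handful of
indexes but would need rethinking for thousands.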

Thanks,
-Abhi


On Thu, Sep 17, 2009 at 12:39 AM, Chris Fields <cjfields at illinois.edu> wrote:
> Abhi,
>
> The FASTQ parser hasn't been released to CPAN yet.  It is available via
> bioperl-live.  We haven't added any code yet to the HOWTO's, but the
> SYNOPSIS example in Bio::SeqIO::fastq should be sufficient to get you
> started.
>
> Bio::Seq::Quality is the object returned via next_seq(); it can be queried
> for PHRED qual scores and other bits.  If you want to split things up you
> should call next_seq(), then generate a FASTQ output stream in the variant
> you want:
>
> my $outfasta = Bio::SeqIO->new(-format => 'fastq-sanger',
>                                -file   => '>fasta.file');
> my $outqual  = Bio::SeqIO->new(-format => 'fastq-sanger',
>                                -file   => '>qual.file');
>
> while (my $seq = $in->next_seq) {
>   $outfasta->write_fasta($seq);
>   $outqual->write_qual($seq);
> }
>
> Note I haven't tested that yet, but it should work.  Let me know if it
> doesn't.
>
> chris
>
> On Sep 16, 2009, at 3:13 PM, Abhishek Pratap wrote:
>
>> Hi Chris
>>
>> I remember seeing a recent email about the new BioPerl FASTQ parser. Is it
>> part of the BioPerl 1.6 distribution? I installed it, and based on the doc
>>
>> here (http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/fastq.html)
>> I am a bit lost.
>>
>> I see two approaches there: using Bio::SeqIO::fastq and
>> Bio::Seq::Quality. Do both return the same data, with the latter
>> giving a speed-up?
>>
>> This is not meant to offend any developer, but small examples in the
>> HOWTOs help a lot.
>>
>> The current example (copied below) is not working. I guess it is based
>> on a previous version of code.
>>
>> # grabs the FASTQ parser, specifies the Illumina variant
>> my $in = Bio::SeqIO->new(-format    => 'fastq-illumina',
>>                         -file      => 'mydata.fq');
>>
>>
>> My basic requirement is to read each FASTQ record and split it
>> into header, read, and quality.
>>
>>
>> Thanks,
>> -Abhi
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>



