[Bioperl-l] fastq splitter

Wed Feb 29 02:40:26 UTC 2012

That should work.  Can you send the output of 'perldoc Bio::SeqIO::fastq'?  That should indicate what is being called.

chris

On Feb 28, 2012, at 5:50 PM, Sean O'Keeffe wrote:

> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.006001
> 
> Isn't that 1.6.1 - does it need upgrading ?
> 
> On 28 February 2012 18:36, Sean O'Keeffe <limericksean at gmail.com> wrote:
> Could be. I'll check. 
> 
> On 28 February 2012 17:17, Fields, Christopher J <cjfields at illinois.edu> wrote:
> That's a bit odd.  Are you using an old version of the FASTQ parser?  It was revised a while ago, prior to the v1.6.1 release (the error matches one in the older parser)
> 
> chris
> 
> On Feb 28, 2012, at 4:01 PM, Sean O'Keeffe wrote:
> 
> > Hi Chris,
> > Unfortunately the read pairs are not consecutive. It seems they are cat'd together.
> > I could use split -l on the line number that they're glued together I guess.
> > If this is an overnight job for a bunch of files, I can wait so don't mind using the module if it worked.
> >
> > Someone pointed out I need to switch $seqin->desc to $inseq->desc.
> > However, now it spits out fasta output instead of fastq and returns a bunch of warnings: Seq/Qual descriptions don't match; using sequence description
> >
> > Hmm.
> >
> > On 28 February 2012 16:50, Fields, Christopher J <cjfields at illinois.edu> wrote:
> > Sean,
> >
> > If you trust the data enough, in that:
> >
> > 1) each record is 4 lines,
> > 2) mate pairs are consecutive in the file, and
> > 3) that read 1 always preceeds read 2 in the pair,
> >
> > then I would simply iterate through 4 lines at a time and dump to the two separate files, maybe using a flip-flop or simple record count and modulus switch.  You can always run a check on the header with a regex if you don't trust it completely.
> >
> > Just from the sanity point-of-view, unless you're doing a lot of validation I wouldn't use Bio::SeqIO::fastq, unless you have some time on your hands and a relatively low number of seqs (it's notoriously slow at the moment).
> >
> > chris
> >
> > On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote:
> >
> > > Hi,
> > > I'm trying to write a quick script to separate one large PE fastq file into
> > > 2 separate files, one for each mate pair
> > >
> > > The file is of the format (mate1)
> > > @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
> > > CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
> > > +
> > > BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
> > >
> > > && (mate2)
> > >
> > > @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
> > > TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
> > > +
> > > ##################################################
> > >
> > >
> > > My idea is to separate using a regex such that / 1:/ would be the first
> > > mate pair and / 2:/ would go in the second mate file.
> > > I implemented the code below but each output file is empty. Can someone
> > > spot my error?
> > >
> > > Thanks,
> > > Sean.
> > >
> > > my $infile   = shift;
> > > my $outfile1 = $infile."_1";
> > > my $outfile2 = $infile."_2";
> > >
> > > my $seqin = Bio::SeqIO->new(
> > >                             -file   => "<$infile",
> > >                             -format => "fastq",
> > >                             );
> > > my $seqout1 = Bio::SeqIO->new(
> > >                              -file   => ">$outfile1",
> > >                              -format => "fastq",
> > >                              );
> > >
> > > my $seqout2 = Bio::SeqIO->new(
> > >                              -file   => ">$outfile2",
> > >                              -format => "fastq",
> > >                              );
> > > while (my $inseq = $seqin->next_seq) {
> > >    if ($seqin->desc =~ / 1:/){
> > >      $seqout1->write_seq($inseq);
> > >    } else {
> > >      $seqout2->write_seq($inseq);
> > >    }
> > > }
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> 
> 
>