[Bioperl-l] fastq splitter

Wed Feb 29 18:11:39 UTC 2012

This was an interesting thread to follow (I'm about to dive into Illumina
data). Glad you found the cause of the problem, Sean.

FYI, you may already know this trick, but when I work on a cluster, the
first command in my submission script always sources my bash profile
(.profile, .bashrc, etc., depending on your setup).

This way you control the contents and order of the PERL5LIB variable (among
others) on the compute nodes and ensure your local Perl modules are loaded
in preference to any system-wide copies.
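To check from inside a job which copy of a module a node actually picks up, a short diagnostic can print PERL5LIB, the @INC search order, and the file each module was loaded from. This is a minimal sketch: the script name `which_module.pl` and the helper names are hypothetical, and Bio::SeqIO is only a default that any module names given on the command line replace.

```perl
use strict;
use warnings;

# Turn a package name such as Bio::SeqIO into the Bio/SeqIO.pm key used by %INC.
sub module_to_file {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return $file;
}

# Try to load the module and report its version and the file it came from.
sub report_module {
    my ($module) = @_;
    my $file = module_to_file($module);
    if (eval { require $file; 1 }) {
        my $version = $module->VERSION // 'unknown';
        return "$module $version loaded from $INC{$file}";
    }
    my ($reason) = split /\n/, $@;    # first line of the load error
    return "$module could not be loaded: $reason";
}

# hypothetical usage: perl which_module.pl Bio::SeqIO
print "PERL5LIB = ", ($ENV{PERL5LIB} // '(unset)'), "\n";
print "\@INC search order:\n";
print "  $_\n" for @INC;
print report_module($_), "\n" for (@ARGV ? @ARGV : 'Bio::SeqIO');
```

Run from the submission script after the profile is sourced, the "loaded from" path shows whether the local install or a system-wide one wins on that node.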

Of course there are other solutions to this problem too.

Best,
Tom
On Feb 29, 2012 9:38 AM, "Sean O'Keeffe" <limericksean at gmail.com> wrote:

> Yes. I ran my script on a cluster that may have had BioPerl installed; I'm
> not sure.
> Running it locally = success.
>
> Thanks all!
>
> On 29 February 2012 12:13, Fields, Christopher J <cjfields at illinois.edu> wrote:
>
> > Sean,
> >
> > To follow up in case it was a bug: I tested with your sequence examples
> > and they work, so my guess is that something else is wrong locally.
> >
> > [cjfields at pyrimidine-laptop sean]$ perl test.pl < example2.fastq
> > @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
> > CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
> > +
> > BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
> > @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
> > TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
> > +
> > ##################################################
> >
> > chris
> >
> > On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote:
> >
> > > Hi,
> > > I'm trying to write a quick script to split one large paired-end (PE)
> > > FASTQ file into 2 separate files, one for each mate of the pair.
> > >
> > > The records look like this (mate 1):
> > > @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
> > > CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
> > > +
> > > BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
> > >
> > > and (mate 2):
> > >
> > > @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
> > > TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
> > > +
> > > ##################################################
> > >
> > >
> > > My idea is to split them with a regex, so that records matching / 1:/ go
> > > into the first mate file and records matching / 2:/ go into the second.
> > > I implemented the code below, but each output file is empty. Can someone
> > > spot my error?
> > >
> > > Thanks,
> > > Sean.
> > >
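> > > use strict;
> > > use warnings;
> > > use Bio::SeqIO;
> > >
> > > my $infile   = shift or die "Usage: $0 file.fastq\n";
> > > my $outfile1 = $infile."_1";
> > > my $outfile2 = $infile."_2";
> > >
> > > my $seqin = Bio::SeqIO->new(
> > >                             -file   => "<$infile",
> > >                             -format => "fastq",
> > >                             );
> > > my $seqout1 = Bio::SeqIO->new(
> > >                              -file   => ">$outfile1",
> > >                              -format => "fastq",
> > >                              );
> > >
> > > my $seqout2 = Bio::SeqIO->new(
> > >                              -file   => ">$outfile2",
> > >                              -format => "fastq",
> > >                              );
> > > while (my $inseq = $seqin->next_seq) {
> > >    # desc belongs to the sequence object ($inseq), not the SeqIO stream.
> > >    # It is assumed to hold the text after the read ID, e.g. "1:N:0:ATCACG",
> > >    # so the pattern allows the mate number at the start of the string.
> > >    my $desc = $inseq->desc // '';
> > >    if ($desc =~ /^\s*1:/){
> > >      $seqout1->write_seq($inseq);
> > >    } else {
> > >      $seqout2->write_seq($inseq);
> > >    }
> > > }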
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
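Sean's regex idea can also be sketched without BioPerl, by reading raw FASTQ records and routing each one by the mate number that follows the space in the header (the `1:` or `2:` in the examples above). Matching against the raw header line avoids depending on how a parser splits the read ID from the description. This is a minimal sketch under stated assumptions: every record is exactly four lines (no wrapped sequence or quality lines), headers follow the `@<id> <mate>:<filtered>:<control>:<index>` layout shown in the thread, and the names `mate_of`, `split_fastq`, and `split_mates.pl` are hypothetical.

```perl
use strict;
use warnings;

# Return the mate number (1 or 2) from a header such as
# "@HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG", or undef if absent.
sub mate_of {
    my ($header) = @_;
    return $header =~ /^\@\S+\s+([12]):/ ? $1 : undef;
}

# Split one FASTQ file into <infile>_1 and <infile>_2, assuming fixed
# four-line records. Returns a hashref of record counts per mate.
sub split_fastq {
    my ($infile) = @_;
    open my $in,   '<', $infile       or die "Cannot read $infile: $!";
    open my $out1, '>', "${infile}_1" or die "Cannot write ${infile}_1: $!";
    open my $out2, '>', "${infile}_2" or die "Cannot write ${infile}_2: $!";
    my %count = (1 => 0, 2 => 0);

    while (defined(my $header = <$in>)) {
        next unless $header =~ /\S/;    # tolerate blank lines between records

        # Read the sequence, '+' separator and quality lines of this record.
        my @rest = map { scalar <$in> } 1 .. 3;
        die "Truncated record after: $header" if grep { !defined } @rest;

        my $mate = mate_of($header)
            or die "No mate number in header: $header";
        my $out = $mate == 1 ? $out1 : $out2;
        print {$out} $header, @rest;
        $count{$mate}++;
    }
    close $_ for $in, $out1, $out2;
    return \%count;
}

# hypothetical usage: perl split_mates.pl reads.fastq
if (@ARGV) {
    my $count = split_fastq($ARGV[0]);
    print "mate 1: $count->{1}, mate 2: $count->{2}\n";
}
```

Unlike the else branch in Sean's script, this sketch stops on a header without a mate number instead of silently sending it to the second file, which makes a mismatched header format show up immediately. The output names (`reads.fastq_1`, `reads.fastq_2`) mirror the naming in his script.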