[Bioperl-l] fastq splitter

Sean O'Keeffe limericksean at gmail.com
Tue Feb 28 21:11:13 UTC 2012


Hi,
I'm trying to write a quick script to split one large paired-end (PE) fastq
file into two separate files, one for each mate.

The records in the file look like this (mate 1):
@HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
+
BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA

and (mate 2):

@HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
+
##################################################


My idea is to separate the reads with a regex: records whose header matches
/ 1:/ would go in the first mate file and records matching / 2:/ would go in
the second mate file.
I implemented the code below, but each output file is empty. Can someone
spot my error?

Thanks,
Sean.

use strict;
use warnings;
use Bio::SeqIO;

my $infile   = shift;
my $outfile1 = $infile."_1";
my $outfile2 = $infile."_2";

my $seqin = Bio::SeqIO->new(
                             -file   => "<$infile",
                             -format => "fastq",
                             );
my $seqout1 = Bio::SeqIO->new(
                              -file   => ">$outfile1",
                              -format => "fastq",
                              );

my $seqout2 = Bio::SeqIO->new(
                              -file   => ">$outfile2",
                              -format => "fastq",
                              );
while (my $inseq = $seqin->next_seq) {
    if ($seqin->desc =~ / 1:/){
      $seqout1->write_seq($inseq);
    } else {
      $seqout2->write_seq($inseq);
    }
}
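
One thing worth checking in the loop above: desc is called on $seqin (the
input stream) rather than on $inseq (the record returned by next_seq). For
comparison, here is a minimal sketch of the same split in plain Perl, without
Bio::SeqIO. It assumes every record is exactly four lines (no wrapped
sequence or quality lines) and that the mate number directly follows the
first space in the header, as in the examples above. The helper name
split_fastq_by_mate is hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): read 4-line FASTQ records
# from $in and copy each one to $out1 or $out2, depending on the mate
# number that follows the first space in the header line.
# Assumption: records are unwrapped, i.e. exactly four lines each.
sub split_fastq_by_mate {
    my ($in, $out1, $out2) = @_;
    my ($n1, $n2) = (0, 0);
    while (defined(my $header = <$in>)) {
        next if $header =~ /^\s*$/;    # tolerate blank lines between records
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "Truncated FASTQ record at: $header"
            unless defined $seq && defined $plus && defined $qual;
        if ($header =~ /^\@\S+ 1:/) {
            print {$out1} $header, $seq, $plus, $qual;
            $n1++;
        } elsif ($header =~ /^\@\S+ 2:/) {
            print {$out2} $header, $seq, $plus, $qual;
            $n2++;
        } else {
            die "Header has no recognisable mate number: $header";
        }
    }
    return ($n1, $n2);
}

# Command-line use: writes <infile>_1 and <infile>_2 next to the input.
if (@ARGV) {
    my $infile = shift @ARGV;
    open my $in,   '<', $infile       or die "Cannot read $infile: $!";
    open my $out1, '>', "${infile}_1" or die "Cannot write ${infile}_1: $!";
    open my $out2, '>', "${infile}_2" or die "Cannot write ${infile}_2: $!";
    my ($n1, $n2) = split_fastq_by_mate($in, $out1, $out2);
    close $_ for $in, $out1, $out2;
    print STDERR "mate 1: $n1 records, mate 2: $n2 records\n";
}
```

Reading fixed four-line records avoids parsing the quality string at all,
which matters here because a quality line may itself begin with "@". The
sketch dies on a header it cannot classify instead of silently sending it to
the second file, which is a design choice rather than a requirement.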
