[Bioperl-l] fastq splitter
Fields, Christopher J
cjfields at illinois.edu
Tue Feb 28 21:50:30 UTC 2012
Sean,
If you trust the data enough, in that:
1) each record is 4 lines,
2) mate pairs are consecutive in the file, and
3) that read 1 always preceeds read 2 in the pair,
then I would simply iterate through 4 lines at a time and dump to the two separate files, maybe using a flip-flop or simple record count and modulus switch. You can always run a check on the header with a regex if you don't trust it completely.
Just from the sanity point-of-view, unless you're doing a lot of validation I wouldn't use Bio::SeqIO::fastq, unless you have some time on your hands and a relatively low number of seqs (it's notoriously slow at the moment).
chris
On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote:
> Hi,
> I'm trying to write a quick script to separate one large PE fastq file into
> 2 separate files, one for each mate pair
>
> The file is of the format (mate1)
> @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
> CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
> +
> BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
>
> && (mate2)
>
> @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
> TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
> +
> ##################################################
>
>
> My idea is to separate using a regex such that / 1:/ would be the first
> mate pair and / 2:/ would go in the second mate file.
> I implemented the code below but each output file is empty. Can someone
> spot my error?
>
> Thanks,
> Sean.
>
> my $infile = shift;
> my $outfile1 = $infile."_1";
> my $outfile2 = $infile."_2";
>
> my $seqin = Bio::SeqIO->new(
> -file => "<$infile",
> -format => "fastq",
> );
> my $seqout1 = Bio::SeqIO->new(
> -file => ">$outfile1",
> -format => "fastq",
> );
>
> my $seqout2 = Bio::SeqIO->new(
> -file => ">$outfile2",
> -format => "fastq",
> );
> while (my $inseq = $seqin->next_seq) {
> if ($seqin->desc =~ / 1:/){
> $seqout1->write_seq($inseq);
> } else {
> $seqout2->write_seq($inseq);
> }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list