[Bioperl-l] randomizing fastq sequences
Chris Fields
cjfields at illinois.edu
Tue Feb 8 15:53:27 UTC 2011
Just to note, I have been thinking about wrapping cdbfasta/cdbyank in bioperl for fast indexing and retrieval of FASTQ (this came up in a prior thread, with the same suggestion from Malcolm, IIRC).
chris
On Feb 8, 2011, at 9:12 AM, Cook, Malcolm wrote:
> Gotta chime in....
>
> If you are working with fastq files on unix and have the `shuf` command available,
> I recommend you install cdbyank (part of cdbfasta, http://sourceforge.net/projects/cdbfasta/), which indexes fasta and fastq files and provides random access to them.
>
> Index the fastq, then extract the IDs with cdbyank, pipe them through `shuf`, and then through cdbyank again to pull out the sequences.
>
> Like this example, which uses a test fastq from my local install of bioperl:
>
> $ cd ~/local/src/bioperl-live/t/data/fastq/
> $ cdbfasta -Q example.fastq
> 3 entries from file example.fastq were indexed in file example.fastq.cidx
> $ cdbyank -l example.fastq.cidx | shuf | cdbyank example.fastq.cidx > shuf_example.fastq
>
> Note that this looks records up by ID, so there will be problems if your IDs are not unique.
>
> Malcolm Cook
> Stowers Institute for Medical Research - Bioinformatics
> Kansas City, Missouri USA
>
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> shalu sharma
>> Sent: Monday, February 07, 2011 4:08 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] randomizing fastq sequences
>>
>> Hi,
>> I am trying to test a program, for which I need to change the
>> order of the sequences in a fastq file.
>> My fastq file contains about 50,000 sequences.
>> Is there any script that can do this task?
>>
>> Thanks
>> Shalu
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
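
For a file of about 50,000 sequences, the shuffle can also be sketched with standard unix tools only, without building an index. The following is a minimal sketch, not something posted in the thread. It assumes strict 4-line FASTQ records (no wrapped sequence or quality lines), no tab characters anywhere in the file, and that the read ID is the header text up to the first whitespace. The function names `fastq_duplicate_ids` and `shuffle_fastq` are hypothetical. The first function lists repeated IDs, which is the case flagged above as a problem for ID-based lookup; the second shuffles whole records and does not depend on IDs at all.

```shell
#!/bin/sh
# Sketch only: assumes strict 4-line FASTQ records and no tab characters
# in the input. Function names are hypothetical.

# Print every read ID that occurs more than once.
# The ID is taken as the first whitespace-delimited field of each
# header line (every 4th line, starting at line 1), minus the leading "@".
fastq_duplicate_ids() {
    awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$1" | sort | uniq -d
}

# Write the records of a FASTQ file to stdout in random order.
shuffle_fastq() {
    # paste joins each group of 4 lines into one tab-separated line,
    # shuf permutes those lines, and tr restores the line breaks.
    paste - - - - < "$1" | shuf | tr '\t' '\n'
}
```

Possible usage, with the file names from the example above: `fastq_duplicate_ids example.fastq` should print nothing if all IDs are unique, and `shuffle_fastq example.fastq > shuf_example.fastq` writes the shuffled copy. Because `shuf` holds its input in memory, this suits files of moderate size; for very large files the indexed approach is likely the better fit.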