[Bioperl-l] Call to users/developers -- user cases that bring Bioperl to its knees
Charles Plessy
charles-listes+bioperl at plessy.org
Wed Feb 25 06:11:57 UTC 2009
On Mon, Feb 23, 2009 at 04:06:21PM +0000, Albert Vilella wrote:
>
> Can interested users/developers provide a URL with a dataset that
> brings bioperl to its knees in
> terms of CPU usage for say, about 1h?
Dear Albert,
I do not know if it fits your requirements, but I found that bp_seqconvert and
bp_sreformat are not fast enough to be used efficiently with millions of
sequences in fastq format.
You can download an example file here (which I chose at random):
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002479/ERR002479_2.fastq.gz
This file will give an error because the sequence name is not repeated in the
quality header, but I compared with a local file that does not have this
problem, and confirmed that the error is not what slows the conversion down.
(Unfortunately, I could not find public fastq files in which the sequence name
is given in both the sequence and quality headers, probably because it makes
the file larger.)
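To illustrate the two header styles mentioned above, here is a minimal sketch
in Python (not Perl, and not what bp_seqconvert does internally) of a reader
that accepts both a bare "+" quality header and one that repeats the sequence
name. It assumes strict four-line records; the function name read_fastq and the
two example reads are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream.

    Assumption: sequences and qualities are not wrapped over several lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record near %r" % header)
        name = header[1:]
        repeated = plus[1:]
        # The quality header may be a bare "+" or may repeat the name.
        if repeated and repeated != name:
            raise ValueError("quality header does not match %r" % name)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in %r" % name)
        yield name, seq, qual


# Hypothetical data: read1 uses a bare "+", read2 repeats its name.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+read2\nHHHH\n"
records = list(read_fastq(io.StringIO(example)))
```

For a compressed file, the handle could come from gzip.open(path, "rt") in the
standard library instead of io.StringIO.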
Have a nice day,
--
Charles Plessy
Tsurumi, Kanagawa, Japan