Bioperl: manipulating long strings (genomes) in PERL

Andrew Dalke dalke@bioreason.com
Mon, 29 Mar 1999 10:06:52 -0800


Dawn Field <dfield@molbiol.ox.ac.uk> said:
> I haven't had it explained why large strings slow down run time
> so much.   Can someone explain to me exactly why this is and
> more importantly, how best to deal with long strings in PERL

Could you describe where you see this slowdown?  I did a timing
test almost two years ago on perl4 and perl5 comparing the prosite
patterns to protein-like strings of increasing length.  As I recall,
perl5.004 was slower than perl4 and, at that time, had run-times
that scaled worse than linearly with the string size.
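
To pin down where a slowdown appears, a timing test along these lines can
help. This is only a minimal sketch, not the script from the original test:
the helper names random_protein and time_matches, the string lengths, and the
PROSITE-style pattern are all assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: build a random protein-like string of $len residues.
sub random_protein {
    my ($len) = @_;
    my @aa = split //, 'ACDEFGHIKLMNPQRSTVWY';
    return join '', map { $aa[ int rand @aa ] } 1 .. $len;
}

# Illustrative PROSITE-style pattern, C-x(2,4)-C-x(3)-[LIVMFYWC],
# written as a Perl regex. It is an assumption, not the original test set.
my $pattern = qr/C.{2,4}C.{3}[LIVMFYWC]/;

# Count non-overlapping matches; return (count, elapsed wall-clock seconds).
sub time_matches {
    my ($seq) = @_;
    my $start = time;
    my $count = () = $seq =~ /$pattern/g;
    return ($count, time - $start);
}

# Time the same pattern on strings of increasing length.
for my $len (10_000, 100_000, 1_000_000) {
    my $seq = random_protein($len);
    my ($count, $secs) = time_matches($seq);
    printf "%9d residues: %6d matches in %.4f s\n", $len, $count, $secs;
}
```

If the time per residue stays roughly constant as the length grows, the
regex engine is scaling linearly and the slowdown is probably elsewhere
(string copying, memory, I/O).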

The details of the discussion start at:
http://x8.dejanews.com/[ST_rn=ps]/getdoc.xp?AN=264788436&CONTEXT=922729983.479133779&hitnum=7

Ahh, I showed that perl5 regex searches were between 1.4 and 5.6 times
slower than the exact same script in perl4.  In the worst case (a very
long string), perl4 took about 6 minutes while perl5 took a bit over
half an hour.

The biggest problem was the use of character classes in perl5
http://x8.dejanews.com/[ST_rn=ps]/getdoc.xp?AN=265136360&CONTEXT=922729983.479133779&hitnum=1
and Ilya Zakharevich fixed that problem in more recent perls, but
I've never tested his changes.
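
One way to check whether character classes still cost anything on a given
perl is to benchmark a pattern against an equivalent one written with
alternation. The following is a sketch under assumptions: the pattern, the
string length, and the helper count_matches are made up for illustration,
and on a recent perl the two forms may well run at the same speed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Random protein-like test string (length chosen arbitrarily).
my @aa  = split //, 'ACDEFGHIKLMNPQRSTVWY';
my $seq = join '', map { $aa[ int rand @aa ] } 1 .. 200_000;

# The same illustrative pattern, once with a character class and once
# with the class spelled out as an alternation.
my $with_class = qr/C.{2,4}C.{3}[LIVMFYWC]/;
my $with_alt   = qr/C.{2,4}C.{3}(?:L|I|V|M|F|Y|W|C)/;

# Hypothetical helper: count non-overlapping matches of $re in $text.
sub count_matches {
    my ($re, $text) = @_;
    my $n = () = $text =~ /$re/g;
    return $n;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    char_class  => sub { count_matches($with_class, $seq) },
    alternation => sub { count_matches($with_alt,   $seq) },
});
```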

The conclusion of this response is that, without more information on
what you mean by "slow down", it's very hard to answer your question.

						Andrew Dalke
						dalke@bioreason.com