dmandel has asked for the wisdom of the Perl Monks concerning the following question:
Hello all.
I've encountered serious regex performance degradation after upgrading to Perl 5.8. I've searched extensively on the web and recompiled with various options, but haven't found many answers.
Here is sample output from both versions of Perl, with timings that illustrate the degradation:
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m3.420s
user 0m2.590s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.151s
user 0m0.050s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m2.584s
user 0m2.580s
sys 0m0.010s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.108s
user 0m0.050s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m2.718s
user 0m2.570s
sys 0m0.010s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.047s
user 0m0.050s
sys 0m0.000s
Clearly that particular regex isn't doing anything useful, but it turned out to be the part of a larger, useful regex that was slowing things down. Here is the output of running the two Perls with the -V option, in case it helps explain things:
5.8.2:
[dmandel@midgard dmandel]# perl -V
Summary of my perl5 (revision 5.0 version 8 subversion 2) configuration:
Platform:
osname=linux, osvers=2.4.20-28.9, archname=i686-linux-thread-multi
uname='linux midgard 2.4.20-28.9 #1 thu dec 18 13:45:22 est 2003 i686 i686 i386 gnulinux '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O3',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
ccversion='', gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.3.2'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
Built under linux
Compiled at Jan 8 2004 21:52:16
@INC:
/usr/local/lib/perl5/5.8.2/i686-linux-thread-multi
/usr/local/lib/perl5/5.8.2
/usr/local/lib/perl5/site_perl/5.8.2/i686-linux-thread-multi
/usr/local/lib/perl5/site_perl/5.8.2
/usr/local/lib/perl5/site_perl
and 5.6.1:
[dmandel@midgard dmandel]# perl5.6.1 -V
Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
Platform:
osname=linux, osvers=2.4.21-1.1931.2.393.entsmp, archname=i386-linux
uname='linux bugs.devel.redhat.com 2.4.21-1.1931.2.393.entsmp #1 smp thu aug 14 14:47:21 edt 2003 i686 unknown '
config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dcccdlflags=-fPIC -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Uusethreads -Uuseithreads -Uuselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Di_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Dinc_version_list=5.6.0/i386-linux 5.6.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
Compiler:
cc='gcc', ccflags ='-fno-strict-aliasing -I/usr/local/include',
optimize='-O2 -march=i386 -mcpu=i686',
cppflags='-fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='2.96 20000731 (Red Hat Linux 7.3 2.96-113)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
alignbytes=4, usemymalloc=n, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -ldl -lm -lc -lcrypt -lutil
perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
Compile-time options:
Built under linux
Compiled at Aug 18 2003 16:08:31
@INC:
/usr/lib/perl5/5.6.1/i386-linux
/usr/lib/perl5/5.6.1
/usr/lib/perl5/site_perl/5.6.1/i386-linux
/usr/lib/perl5/site_perl/5.6.1
/usr/lib/perl5/site_perl/5.6.0/i386-linux
/usr/lib/perl5/site_perl/5.6.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.6.1/i386-linux
/usr/lib/perl5/vendor_perl/5.6.1
/usr/lib/perl5/vendor_perl
Thank you for your time.
Sincerely,
Danny Mandel
Edited by Chady -- added readmore tag.
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by hardburn (Abbot) on Jan 20, 2004 at 16:41 UTC
According to your -V output on your 5.8 perl, you're running a threaded version. That slows everything down, even in programs that don't use threads. Your only options are to recompile perl without threads, get a precompiled version from your vendor without threads, or live with it.
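To confirm which kind of build a given perl is without reading the whole -V dump, a minimal sketch using the core Config module (the same data perl -V prints) might look like this. built_with_threads() is a made-up helper name, not part of Config.

```perl
#!/usr/bin/perl
# Sketch: report whether the running perl was built with thread support.
# Reads the same Config keys that appear in the "perl -V" output.
# built_with_threads() is a hypothetical helper, not a Config function.
use strict;
use warnings;
use Config;

# True (1) if this perl was built with ithreads or 5.005-style threads.
sub built_with_threads {
    return ($Config{useithreads} || $Config{usethreads}) ? 1 : 0;
}

# Undefined Config values are shown as 'undef', as perl -V does.
printf "perl %vd: useithreads=%s, usemultiplicity=%s\n",
    $^V,
    $Config{useithreads}     || 'undef',
    $Config{usemultiplicity} || 'undef';
print built_with_threads() ? "threaded build\n" : "non-threaded build\n";
```

Run it once with each interpreter (perl and perl5.6.1 here) to see the two builds side by side.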
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
: () { :|:& };:
Note: All code is untested, unless otherwise stated
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by Aragorn (Curate) on Jan 20, 2004 at 19:13 UTC
In addition to what hardburn said, I'd add that benchmarking is done more accurately with the Benchmark module than with the time command.
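A minimal sketch of that approach, assuming only the core Benchmark module (timethese). One thing worth separating: the one-liners modify $x in place, so only the first pass actually removes the i's, and every later pass scans a string the pattern can never match. The case labels and subst_copy() below are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: time the substitution with Benchmark instead of the shell's
# `time`, so interpreter startup is not part of the figure.
# subst_copy() and the case labels are hypothetical names.
use strict;
use warnings;
use Benchmark qw(timethese);

# The one-liners modify $x in place: pass 1 removes every "i",
# and every later pass scans a string the pattern can never match.
my $fresh = join('', 'a' .. 'z') x 100;   # what pass 1 sees
(my $stripped = $fresh) =~ tr/i//d;       # what passes 2..100 see

sub subst_copy {
    my ($x) = @_;                         # work on a copy; input survives
    $x =~ s/(.*?)I/$1/isge;
    return $x;
}

timethese(20, {
    first_pass => sub { subst_copy($fresh) },
    later_pass => sub { subst_copy($stripped) },
});
```

Timing the two cases separately shows which of them accounts for the difference between the two perls.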
Arjen
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by Anonymous Monk on Jan 20, 2004 at 19:40 UTC
You're not benchmarking regex performance.
You're benchmarking perl's startup time as well as regex performance.
100 iterations is too few for an accurate count.
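One way to take startup out of the picture, sketched with Time::HiRes (core as of 5.8, a CPAN install on 5.6.1): time only the loop inside the process, and raise the iteration count until the per-iteration figure settles. time_loop() is a hypothetical helper name.

```perl
#!/usr/bin/perl
# Sketch: time only the one-liners' loop, inside the process, so perl's
# startup cost is excluded. time_loop() is a hypothetical helper.
use strict;
use warnings;
use Time::HiRes qw(time);

sub time_loop {
    my ($iterations) = @_;
    my $x = join('', 'a' .. 'z') x 100;
    my $start = time();
    for (1 .. $iterations) {
        $x =~ s/(.*?)I/$1/isge;    # same loop body as the one-liners
    }
    return time() - $start;        # elapsed wall-clock seconds
}

# Raise the counts until the per-iteration figure stops moving.
for my $n (10, 100) {
    my $elapsed = time_loop($n);
    printf "%4d iterations: %.4f s (%.6f s each)\n", $n, $elapsed, $elapsed / $n;
}
```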
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by ysth (Canon) on Jan 20, 2004 at 19:19 UTC
IMO the time differences you are seeing can't be explained by a slowdown due to threading. Using a shorter test string, "aibicid", I do see some differences in the output generated by use re "debug"; between 5.6.x and 5.8.x, but I don't know enough to interpret the output.
I'd encourage you to report this as a bug. Even if (as I expect will be the case) you are then told that the slowdown is due to necessary bug fixes, someone may come up with a better way.
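A sketch of that experiment, assuming only the core re pragma: use re 'debug' writes the compiled regex program and a trace of each match attempt to STDERR, so running the same file under both perls gives two traces to diff. The file name re_trace.pl is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: trace the original substitution on a short string.
# Debug output goes to STDERR; capture it per interpreter, e.g.
# (re_trace.pl is a hypothetical file name):
#   perl5.6.1 re_trace.pl 2> trace-5.6.txt
#   perl      re_trace.pl 2> trace-5.8.txt
use strict;
use warnings;
use re 'debug';

my $s = 'aibicid';          # the short test string from this reply
$s =~ s/(.*?)I/$1/isge;     # the pattern from the original post
print "$s\n";               # every "i" has been removed
```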
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by mce (Curate) on Jan 21, 2004 at 13:05 UTC
Hi All,
I did a quick benchmark of his code using non-threaded 5.8.0 and 5.6.1, and there is indeed a serious performance degradation.
It is the combination of the i and g regex modifiers that causes the performance issue. If you remove either of them, the performance is the same as in 5.6.1.
I didn't dig deeper into this, but there must be an explanation for it.
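To isolate the modifiers as described, a sketch along these lines could time each combination with Benchmark's timethese. The labels and run_variant() are made up, and the variants do not do identical work: without /i the literal I never matches this lowercase string, and without /g each pass removes only the first remaining i.

```perl
#!/usr/bin/perl
# Sketch: time the one-liners' in-place loop with each modifier
# combination. Labels and run_variant() are hypothetical names.
# The variants do NOT do identical work (see the notes above).
use strict;
use warnings;
use Benchmark qw(timethese);

my %subst_for = (
    isge => sub { $_[0] =~ s/(.*?)I/$1/isge },   # original
    sge  => sub { $_[0] =~ s/(.*?)I/$1/sge },    # /i dropped
    ise  => sub { $_[0] =~ s/(.*?)I/$1/ise },    # /g dropped
);

sub run_variant {
    my ($name, $passes) = @_;
    my $x = join('', 'a' .. 'z') x 100;
    $subst_for{$name}->($x) for 1 .. $passes;    # $_[0] aliases $x
    return $x;
}

my %tests = map { my $name = $_; ($name => sub { run_variant($name, 10) }) } keys %subst_for;
timethese(5, \%tests);
```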
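ps: this is the code I used (run under both /usr/local/bin/perl5.8.0 and /usr/local/bin/perl5.6.1):
use strict;
use warnings;
use Benchmark qw(timethis);
# The same 2600-character string as in the one-liners.
our $x = join('', 'a' .. 'z') x 100;
# Modifies $x in place, just like the one-liners' loop body.
sub test {
    $x =~ s/(.*?)I/$1/isge;
}
timethis(100, \&test);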
Update: by popular demand, here are the benchmark results.
Perl 5.8.0
timethis 1000: 48 wallclock secs (46.76 usr + 0.02 sys = 46.78 CPU) @ 21.38/s (n=1000)
Perl 5.6.1
timethis 1000: 1 wallclock secs ( 0.70 usr + 0.00 sys = 0.70 CPU) @ 1428.57/s (n=1000)
When debugging with use re "debug", there are some differences (I've posted only the last few lines).
In 5.8.0 it gives:
Setting an EVAL scope, savestack=12
24 <opqrstuvwx> <yz> | 1: OPEN1
24 <opqrstuvwx> <yz> | 3: MINMOD
24 <opqrstuvwx> <yz> | 4: STAR
Setting an EVAL scope, savestack=12
failed...
Setting an EVAL scope, savestack=12
25 <opqrstuvwxy> <z> | 1: OPEN1
25 <opqrstuvwxy> <z> | 3: MINMOD
25 <opqrstuvwxy> <z> | 4: STAR
Setting an EVAL scope, savestack=12
failed...
Setting an EVAL scope, savestack=12
26 <opqrstuvwxyz> <> | 1: OPEN1
26 <opqrstuvwxyz> <> | 3: MINMOD
26 <opqrstuvwxyz> <> | 4: STAR
Setting an EVAL scope, savestack=12
failed...
Match failed
whilst in 5.6.1, it returns
SANY can match 1 times out of 1...
23 <opqrstuvw> <xyz> | 6: CLOSE1
23 <opqrstuvw> <xyz> | 8: EXACTF <I>
failed...
SANY can match 1 times out of 1...
24 <opqrstuvwx> <yz> | 6: CLOSE1
24 <opqrstuvwx> <yz> | 8: EXACTF <I>
failed...
SANY can match 1 times out of 1...
25 <opqrstuvwxy> <z> | 6: CLOSE1
25 <opqrstuvwxy> <z> | 8: EXACTF <I>
failed...
SANY can match 1 times out of 1...
26 <opqrstuvwxyz> <> | 6: CLOSE1
26 <opqrstuvwxyz> <> | 8: EXACTF <I>
failed...
SANY can match 0 times out of 1...
failed...
Match failed
But, since I am not a regex expert, I have no clue what this means :-)
---------------------------
Dr. Mark Ceulemans
Senior Consultant
BMC, Belgium
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by gmpassos (Priest) on Jan 20, 2004 at 19:06 UTC
I was looking at the different times and saw that Perl 5.8.x is about 4 times slower than Perl 5.6.x.
I could be wrong, but in Perl 5.8 UTF-8 makes strings allocate 4 bytes for each character, and the regex engine has to handle that too when it scans the string.
From the perlunicode POD:
UTF-8 is a variable-length (1 to 6 bytes, current character allocations require 4 bytes)...
And from the bytes pragma's POD:
As an example, when Perl sees $x = chr(400), it encodes the character in UTF-8 and stores it in $x. Then it is marked as character data, so, for instance, length $x returns 1. However, in the scope of the bytes pragma, $x is treated as a series of bytes - the bytes that make up the UTF8 encoding - and length $x returns 2:
So, this code:
$x = chr(400);
print 'Length: ', length $x, qq~\n~;
{
use bytes;
print 'Length (bytes): ', length $x, qq~\n~;
}
Has the output:
Length: 1
Length (bytes): 2
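The theory can also be checked directly on the benchmark string. A sketch, assuming perl 5.8.1 or later for utf8::is_utf8(), that reports the UTF-8 flag plus character and byte counts for the test string and for the chr(400) example; bytes::length() is the byte-count function documented in the bytes pragma, loaded here without switching byte semantics on.

```perl
#!/usr/bin/perl
# Sketch: check whether the benchmark string is stored as UTF-8.
# utf8::is_utf8() needs perl 5.8.1+; bytes::length() counts bytes.
use strict;
use warnings;
use bytes ();   # load bytes.pm only, so bytes::length is available

my $ascii = join('', 'a' .. 'z') x 100;   # the string from the benchmarks
my $wide  = chr(400);                     # the example from the bytes POD

for my $case (['ascii', $ascii], ['chr(400)', $wide]) {
    my ($label, $str) = @$case;
    printf "%-9s utf8 flag: %d  characters: %d  bytes: %d\n",
        $label, (utf8::is_utf8($str) ? 1 : 0), length($str), bytes::length($str);
}
```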
So, to see whether a string 4 times bigger is enough on its own to make the regex 4 times slower, run the same test with a bigger string and compare the results with the tests in this node.
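A sketch of that comparison using Benchmark's timeit and timestr. loop_for_size() and the 3 passes per run are arbitrary choices made here to keep the run short; they are not part of the original tests.

```perl
#!/usr/bin/perl
# Sketch: run the one-liners' loop on the original string (x100) and on
# one 4 times longer (x400), then compare the timings.
# loop_for_size() and the pass count are hypothetical choices.
use strict;
use warnings;
use Benchmark qw(timeit timestr);

sub loop_for_size {
    my ($reps) = @_;
    my $x = join('', 'a' .. 'z') x $reps;
    for (1 .. 3) {
        $x =~ s/(.*?)I/$1/isge;    # same loop body as the one-liners
    }
    return $x;
}

for my $reps (100, 400) {
    my $t = timeit(1, sub { loop_for_size($reps) });
    printf "x%-3d %s\n", $reps, timestr($t);
}
```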
But note that the regex engine in Perl 5.8.x is much more complex than in Perl 5.6.x, because it has to handle the different encodings that Perl supports. Maybe you should look for a pragma that disables UTF-8 handling in regexes (I haven't found one) rather than trying to recompile Perl.
Graciliano M. P.
"Creativity is the expression of the liberty".
Soo, to see if just a string 4 times bigger can make the REGEXP 4 times slow
What a load of crap - this has to be one of the worst answers I have seen posted on this site, dressed up as determinate rationale ...
The OP would do better to follow the avenues of investigation offered by other posters, namely: increasing the test sample size, employing a better test framework (Benchmark), considering the difference between threaded and unthreaded versions of Perl (the performance difference between them can be quite significant, even where threads are not employed), and following up with perl5-porters.