PerlMonks  

serious regex performance degradation after upgrade to perl 5.8 from 5.6

by dmandel (Novice)
on Jan 20, 2004 at 16:34 UTC ( [id://322619]=perlquestion )

dmandel has asked for the wisdom of the Perl Monks concerning the following question:

Hello all. I've encountered serious regex performance degradation after upgrading to Perl 5.8. I've searched the web extensively and recompiled with various options, but haven't found many answers. Here are timings of the same one-liner under both versions of Perl that illustrate the degradation:
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m3.420s
user 0m2.590s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.151s
user 0m0.050s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m2.584s
user 0m2.580s
sys 0m0.010s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.108s
user 0m0.050s
sys 0m0.000s
[dmandel@midgard dmandel]# time perl -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m2.718s
user 0m2.570s
sys 0m0.010s
[dmandel@midgard dmandel]# time perl5.6.1 -e '$x=join("",(a..z))x100; for (1..100){$x =~ s/(.*?)I/$1/isge;}'
real 0m0.047s
user 0m0.050s
sys 0m0.000s
Clearly that particular regex isn't doing anything useful, but it turned out to be the part of a larger, useful regex that was slowing things down. Here is the output of running the two Perls with the '-V' option, in case it helps explain things:
5.8.2:
[dmandel@midgard dmandel]# perl -V
Summary of my perl5 (revision 5.0 version 8 subversion 2) configuration:
  Platform:
    osname=linux, osvers=2.4.20-28.9, archname=i686-linux-thread-multi
    uname='linux midgard 2.4.20-28.9 #1 thu dec 18 13:45:22 est 2003 i686 i686 i386 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
  Built under linux
  Compiled at Jan 8 2004 21:52:16
  @INC:
    /usr/local/lib/perl5/5.8.2/i686-linux-thread-multi
    /usr/local/lib/perl5/5.8.2
    /usr/local/lib/perl5/site_perl/5.8.2/i686-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.8.2
    /usr/local/lib/perl5/site_perl
and 5.6.1:
[dmandel@midgard dmandel]# perl5.6.1 -V
Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
  Platform:
    osname=linux, osvers=2.4.21-1.1931.2.393.entsmp, archname=i386-linux
    uname='linux bugs.devel.redhat.com 2.4.21-1.1931.2.393.entsmp #1 smp thu aug 14 14:47:21 edt 2003 i686 unknown '
    config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dcccdlflags=-fPIC -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Uusethreads -Uuseithreads -Uuselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Di_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Dinc_version_list=5.6.0/i386-linux 5.6.0'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='gcc', ccflags ='-fno-strict-aliasing -I/usr/local/include',
    optimize='-O2 -march=i386 -mcpu=i686',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.96 20000731 (Red Hat Linux 7.3 2.96-113)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Characteristics of this binary (from libperl):
  Compile-time options:
  Built under linux
  Compiled at Aug 18 2003 16:08:31
  @INC:
    /usr/lib/perl5/5.6.1/i386-linux
    /usr/lib/perl5/5.6.1
    /usr/lib/perl5/site_perl/5.6.1/i386-linux
    /usr/lib/perl5/site_perl/5.6.1
    /usr/lib/perl5/site_perl/5.6.0/i386-linux
    /usr/lib/perl5/site_perl/5.6.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.6.1/i386-linux
    /usr/lib/perl5/vendor_perl/5.6.1
    /usr/lib/perl5/vendor_perl
Thank you for your time. Sincerely, Danny Mandel

Edited by Chady -- added readmore tag.

Replies are listed 'Best First'.
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by hardburn (Abbot) on Jan 20, 2004 at 16:41 UTC

    According to your -V output on your 5.8 perl, you're running a threaded version. That slows everything down, even in programs that don't use threads. Your only options are to recompile perl without threads, get a precompiled version from your vendor without threads, or live with it.

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated
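
Whether a given perl binary is a threaded build can also be checked from inside Perl, without reading the whole -V dump. The following is a minimal sketch using the core Config module; the helper name thread_build_info is made up for this illustration, and the three keys are the same ones that appear in the -V output above.

```perl
use strict;
use warnings;
use Config;

# Collect the thread-related build options (names as shown by perl -V).
# thread_build_info is a hypothetical helper, not part of any module.
sub thread_build_info {
    return map { $_ => (defined $Config{$_} ? $Config{$_} : 'undef') }
        qw(usethreads useithreads usemultiplicity);
}

my %info = thread_build_info();
printf "%-16s %s\n", $_, $info{$_} for sort keys %info;
```

Running it under each installed perl shows at a glance which builds have threads enabled.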

Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by Aragorn (Curate) on Jan 20, 2004 at 19:13 UTC
    In addition to what hardburn said, I'd like to add that benchmarking is done more accurately with the Benchmark module than with the time command.

    Arjen

Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by Anonymous Monk on Jan 20, 2004 at 19:40 UTC
    You're not benchmarking regex performance. You're benchmarking the time it takes perl to start up plus the regex. 100 iterations is too few for an accurate measurement.
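
One way to address both points is to time the substitution inside a single perl process and let it run for a minimum amount of CPU time instead of a fixed 100 passes. This is only a sketch: it uses countit from the core Benchmark module, and the one-second budget is an assumed value. As in the original one-liner, the string is modified in place, so the first pass removes every "i" and all later passes time a failing match.

```perl
use strict;
use warnings;
use Benchmark qw(countit timestr);

# Same data as the original one-liner: the alphabet repeated 100 times.
my $x = join('', 'a' .. 'z') x 100;

# Run the substitution for at least this much CPU time inside one perl
# process, so interpreter start-up is not part of the measurement.
my $min_cpu_seconds = 1;    # assumed value; raise it for steadier numbers
my $t = countit($min_cpu_seconds, sub { $x =~ s/(.*?)I/$1/isge });

printf "%d passes: %s\n", $t->iters, timestr($t);
```

Comparing the passes-per-second figure between the two perls gives a number that does not depend on start-up cost.
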
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by ysth (Canon) on Jan 20, 2004 at 19:19 UTC
    IMO the time differences you are seeing can't be explained by slowdown due to threading. Using a shorter test string "aibicid", I do see some differences in the output generated by use re "debug"; between 5.6.x and 5.8.x, but don't know enough to interpret the output.

    I'd encourage you to report this as a bug. Even if (as I expect will be the case) you are then told that the slowdown is due to necessary bug fixes, someone may come up with a better way.
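
For anyone who wants to repeat the comparison ysth describes, a minimal sketch follows. It runs the same substitution on the short string "aibicid" with the re pragma's debug mode switched on; the compiled program and the execution trace go to STDERR.

```perl
use strict;
use warnings;
use re 'debug';    # dump the compiled regex and its execution trace to STDERR

my $x = 'aibicid';             # the shorter test string from the reply above
$x =~ s/(.*?)I/$1/isge;
print "$x\n";                  # prints "abcd": every "i" has been removed
```

Redirecting STDERR to a file under each perl version and diffing the two files is one way to see where the traces diverge.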

Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by mce (Curate) on Jan 21, 2004 at 13:05 UTC
    Hi All,
    I did a quick benchmark test on his code using non-threaded 5.8.0 and 5.6.1, and indeed there is a serious performance degradation.

    It is the combination of the i and the g regex modifiers that causes the performance issue. If you remove one of them, the performance is the same as in 5.6.1. I didn't dig deeper into this, but there must be an explanation for it.

    ps: this is the code I used

    #!/usr/local/bin/perl5.8.0 or #!/usr/local/bin/perl5.6.1
    use Benchmark;
    $main::x=join("",(a..z))x100;
    &timethis(100,\&test);
    sub test {
        $main::x =~ s/(.*?)I/$1/isge;
    }
    Update: By popular demand, here are the benchmark results.
    Perl 5.8.0
    timethis 1000: 48 wallclock secs (46.76 usr + 0.02 sys = 46.78 CPU) @ 21.38/s (n=1000)
    Perl 5.6.1
    timethis 1000: 1 wallclock secs ( 0.70 usr + 0.00 sys = 0.70 CPU) @ 1428.57/s (n=1000)
    When debugging with use re "debug", there are some differences (I only post the last few lines). In 5.8.0, it gives
    Setting an EVAL scope, savestack=12
    24 <opqrstuvwx> <yz> | 1: OPEN1
    24 <opqrstuvwx> <yz> | 3: MINMOD
    24 <opqrstuvwx> <yz> | 4: STAR
    Setting an EVAL scope, savestack=12
    failed...
    Setting an EVAL scope, savestack=12
    25 <opqrstuvwxy> <z> | 1: OPEN1
    25 <opqrstuvwxy> <z> | 3: MINMOD
    25 <opqrstuvwxy> <z> | 4: STAR
    Setting an EVAL scope, savestack=12
    failed...
    Setting an EVAL scope, savestack=12
    26 <opqrstuvwxyz> <> | 1: OPEN1
    26 <opqrstuvwxyz> <> | 3: MINMOD
    26 <opqrstuvwxyz> <> | 4: STAR
    Setting an EVAL scope, savestack=12
    failed...
    Match failed
    whilst in 5.6.1, it returns
    SANY can match 1 times out of 1...
    23 <opqrstuvw> <xyz> | 6: CLOSE1
    23 <opqrstuvw> <xyz> | 8: EXACTF <I>
    failed...
    SANY can match 1 times out of 1...
    24 <opqrstuvwx> <yz> | 6: CLOSE1
    24 <opqrstuvwx> <yz> | 8: EXACTF <I>
    failed...
    SANY can match 1 times out of 1...
    25 <opqrstuvwxy> <z> | 6: CLOSE1
    25 <opqrstuvwxy> <z> | 8: EXACTF <I>
    failed...
    SANY can match 1 times out of 1...
    26 <opqrstuvwxyz> <> | 6: CLOSE1
    26 <opqrstuvwxyz> <> | 8: EXACTF <I>
    failed...
    SANY can match 0 times out of 1...
    failed...
    Match failed
    But, since I am not a regex expert, I have no clue what this means :-)
    ---------------------------
    Dr. Mark Ceulemans
    Senior Consultant
    BMC, Belgium
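
The observation about the i and g modifiers can be checked with a side-by-side run. The sketch below uses cmpthese from the core Benchmark module; the variant names and the one-second budget per variant are assumptions made for this illustration. Each variant modifies its own copy of the string in place, and the three variants do not perform the same amount of work (without /i the pattern never matches this lowercase data), so the useful comparison is each row under 5.6.1 against the same row under 5.8.x, not the rows against each other.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $text = join('', 'a' .. 'z') x 100;

# One private copy of the data per variant, modified in place across
# iterations in the same way as $main::x in the script above.
my %data = map { $_ => $text } qw(isge sge ise);

my %variants = (
    isge => sub { $data{isge} =~ s/(.*?)I/$1/isge },   # original modifiers
    sge  => sub { $data{sge}  =~ s/(.*?)I/$1/sge  },   # without /i
    ise  => sub { $data{ise}  =~ s/(.*?)I/$1/ise  },   # without /g
);

# A negative count asks Benchmark to run each variant for at least that
# many CPU seconds instead of a fixed number of iterations.
my $rows = cmpthese(-1, \%variants);
```
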
Re: serious regex performance degradation after upgrade to perl 5.8 from 5.6
by gmpassos (Priest) on Jan 20, 2004 at 19:06 UTC
    I was looking at the different timings, and saw that Perl 5.8.x is about 4 times slower than Perl 5.6.x.

    I could be wrong, but in Perl 5.8 UTF-8 will make strings allocate 4 bytes for each character, and the regex engine will need to handle that too when scanning the string.

    From POD, perlunicode:

    UTF-8 is a variable-length (1 to 6 bytes, current character allocations require 4 bytes)...
    And from bytes:
    As an example, when Perl sees $x = chr(400), it encodes the character in UTF-8 and stores it in $x. Then it is marked as character data, so, for instance, length $x returns 1. However, in the scope of the bytes pragma, $x is treated as a series of bytes - the bytes that make up the UTF8 encoding - and length $x returns 2:
    So, this code:
    $x = chr(400);
    print 'Length: ', length $x, qq~\n~;
    {
        use bytes;
        print 'Length (bytes): ', length $x, qq~\n~;
    }
    Has the output:
    Length: 1
    Length (bytes): 2

    So, to see if just a string 4 times bigger can make the regex 4 times slower, run the same test with a bigger string and compare with the results in this node.

    But note that the regex engine in Perl 5.8.x is much more complex than in Perl 5.6.x, because it has to handle the different encodings that Perl supports. Maybe you should look for a pragma that disables UTF-8 handling in regexes (I haven't found one), rather than trying to recompile Perl.

    Graciliano M. P.
    "Creativity is the expression of the liberty".
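
The UTF-8 idea can be tested directly rather than by varying the string size: Perl marks strings it stores internally as UTF-8 with a flag, and that flag can be inspected. A small sketch, assuming Perl 5.8.1 or later, where utf8::is_utf8 is available without loading anything:

```perl
use strict;
use warnings;

my $ascii = join('', 'a' .. 'z') x 100;   # the string used in the benchmark
my $wide  = chr(400);                     # the example from the bytes documentation

# Report whether each string carries Perl's internal UTF-8 flag.
for my $pair ([benchmark => $ascii], ['chr(400)' => $wide]) {
    my ($name, $str) = @$pair;
    printf "%-10s utf8 flag: %s\n", $name, utf8::is_utf8($str) ? 'on' : 'off';
}
```

The flag is on for chr(400) and off for the benchmark string, which is built only from the ASCII letters a to z.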

      So, to see if just a string 4 times bigger can make the regex 4 times slower

      What a load of crap - This has to be seriously one of the worst answers that I have seen posted on this site, dressed up as determinate rationale ...

      The OP would do better to follow the avenues of investigation offered by other posters, namely increasing the test sample size, employing a better test framework (Benchmark), considering the difference between threaded and unthreaded versions of Perl (the performance difference can be quite significant, even where threads are not employed), and following up with perl5-porters.

        So gmpassos' post was incorrect -- your response however seems unnecessarily aggressive; particularly when posting anonymously care needs to be taken to avoid appearing abusive.

        The second sentence of gmpassos' message begins by saying that he might be wrong; he offered a suggestion, it might not have been right but there's no reason to believe his intention was anything other than to try to help.

        edit: fixed tpyo

