Improve processing time for string substitutions

valavanp has asked for the wisdom of the Perl Monks concerning the following question:

Re: Improve processing time for string substitutions by Corion (Patriarch) on Apr 16, 2007 at 13:54 UTC
Like I already told you in the CB:

```perl
my $re = join "|", map { quotemeta $_ } keys %entitylist;
$re = qr/(?:$re)/;
# ...
$string =~ s/($re)/$entitylist{ $1 }/ge;
```

You might want to take a look at perlre for the `/e` switch.
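A minimal, self-contained sketch of this single-pass approach. The two-entry `%entitylist` and the sample `$string` are assumptions that mirror the examples in the question, since the original data is not shown. It drops the `/e` and the extra `(?:...)`, which the replies below discuss.

```perl
use strict;
use warnings;

# Hypothetical sample mapping, modelled on the question's "&ge" / "&le" examples.
my %entitylist = (
    '&ge' => '&#x2265',
    '&le' => '&#x2264',
);

# One alternation of all keys; quotemeta escapes any regex metacharacters (here "&").
my $re = join '|', map { quotemeta($_) } keys %entitylist;
$re = qr/$re/;

my $string = 'a &ge b and b &le c';

# A single pass over the string; the replacement is interpolated per match, so no /e.
$string =~ s/($re)/$entitylist{$1}/g;

print "$string\n";
```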
Re^2: Improve processing time for string substitutions by Anno (Deacon) on Apr 16, 2007 at 15:46 UTC
Is there a reason for the non-capturing parens in "`$re = qr/(?:$re)/;`"? What good does the /e switch do in the regex "`s/($re)/$entitylist{ $1 }/ge;`"? Anno
Re^3: Improve processing time for string substitutions by ikegami (Patriarch) on Apr 16, 2007 at 15:56 UTC
No (`qr//` already wraps the pattern in non-capturing parens) and none (the replacement part is interpolated anew after each match, so the hash lookup works without `/e`).
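A small sketch of the `/e` point, using an assumed toy mapping: a plain hash lookup gives the same result with or without `/e`, while `/e` only becomes necessary when the replacement is real code, such as a `sprintf` call.

```perl
use strict;
use warnings;

my %entitylist = ( a => 1, b => 2 );    # toy mapping, assumed for illustration
my $re = join '|', map { quotemeta($_) } keys %entitylist;

my ($plain, $with_e) = ('abc', 'abc');

# Without /e the replacement is a double-quoted string, interpolated after each match.
$plain =~ s/($re)/$entitylist{$1}/g;

# With /e the replacement is evaluated as Perl code; here that code is the same lookup.
$with_e =~ s/($re)/$entitylist{$1}/ge;

# /e matters once the replacement is an expression, e.g. a sprintf of the character code.
(my $hex = 'abc') =~ s/($re)/sprintf '%02X', ord $1/ge;

print "$plain $with_e $hex\n";
```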
Re^3: Improve processing time for string substitutions by Corion (Patriarch) on Apr 16, 2007 at 15:56 UTC
The `/e` switch is an error on my part. It is a leftover from when I thought about doing the hex conversion manually with a `sprintf` call on the right-hand side. I always build my regular expressions with non-capturing parentheses when I plan on assembling them later. This prevents embarrassing bug hunts when I change how the target RE is built, possibly by repeating one ("atomic") building block: leaving out the parentheses causes hard-to-track misbehaviour with input on the seam of the two blocks.
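A short sketch of the "seam" problem, with an assumed `ge|le` building block embedded in a larger `&NAME;` pattern. Kept as a plain string, the alternation leaks into the surrounding pattern; compiled with `qr//`, it is wrapped in a non-capturing group and stays contained.

```perl
use strict;
use warnings;

# One building block, kept two ways (sample alternation assumed for illustration).
my $alt     = 'ge|le';      # plain string
my $grouped = qr/ge|le/;    # qr// wraps it in a non-capturing group

# 'xle;' is not a valid "&NAME;" entity, so ideally neither pattern should match it.
# The plain string yields /^&ge|le;$/, i.e. "^&ge" OR "le;$", so it matches anyway.
my $naive   = 'xle;' =~ /^&$alt;$/     ? 'match' : 'no match';

# The qr// version keeps the alternation inside its group, so the anchors apply to it.
my $guarded = 'xle;' =~ /^&$grouped;$/ ? 'match' : 'no match';

print "plain string: $naive\nqr// object:  $guarded\n";
```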
Re^4: Improve processing time for string substitutions by Anno (Deacon) on Apr 16, 2007 at 16:06 UTC
Re: Improve processing time for string substitutions by kyle (Abbot) on Apr 16, 2007 at 13:47 UTC
Off the top of my head:

```perl
my $entity = join '|', keys %entitylist;
$string =~ s/($entity)/$entitylist{$1}/g;
```

This way you only scan the whole string once instead of once for each entity.

Update: I've gotten a private message saying that this needs a `/e` switch to work. I submit that this is not the case.

```perl
my %entitylist = ( a => 1, b => 2 );
my $string = 'abc';
my $entity = join '|', keys %entitylist;
$string =~ s/($entity)/$entitylist{$1}/g;
print $string, "\n";
__END__
12c
```

(I actually tested this before my original post, but I only pasted in the relevant portion.)
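Both versions above join the keys in whatever order `keys` returns them. A small sketch, with an assumed two-entry mapping, of why that can matter when one entity name is a prefix of another: the leftmost alternative that matches wins. Sorting the keys longest-first is an addition here, not part of either post.

```perl
use strict;
use warnings;

# Assumed mapping in which one key ("&or") is a prefix of another ("&ordf").
my %entitylist = ( '&or' => '&#x2228', '&ordf' => '&#xAA' );

my $string = '&ordf';

# Force the short key first to show the hazard, then fix it by sorting longest-first.
my $short_first = join '|', map { quotemeta($_) }
    sort { length $a <=> length $b } keys %entitylist;
my $long_first  = join '|', map { quotemeta($_) }
    sort { length $b <=> length $a } keys %entitylist;

(my $bad  = $string) =~ s/($short_first)/$entitylist{$1}/g;    # "&or" wins, "df" is left over
(my $good = $string) =~ s/($long_first)/$entitylist{$1}/g;     # "&ordf" is matched whole

print "$bad\n$good\n";
```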
Re: Improve processing time for string substitutions by liverpole (Monsignor) on Apr 16, 2007 at 13:47 UTC
Hi valavanp,

Do the contents of the file have to be in one `$string`? If you can read the lines of the file into an array instead, it should take less processing time, as the regex won't have to scan (and modify) one huge single line each time.
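A minimal sketch of the line-at-a-time variant, assuming no entity is split across a line break. In-memory filehandles and the sample mapping stand in for the real files and `%entitylist`, which are not shown. Whether this actually beats a single pass over one string would need to be measured.

```perl
use strict;
use warnings;

my %entitylist = ( '&ge' => '&#x2265', '&le' => '&#x2264' );    # assumed sample mapping
my $re = join '|', map { quotemeta($_) }
    sort { length $b <=> length $a } keys %entitylist;
$re = qr/$re/;

# In-memory filehandles stand in for the real input and output files (hypothetical data).
my $input = "x &ge y\ny &le z\n";
open my $in, '<', \$input or die "open input: $!";
my $output = '';
open my $out, '>', \$output or die "open output: $!";

# Substitute one line at a time instead of on one large string.
while (my $line = <$in>) {
    $line =~ s/($re)/$entitylist{$1}/g;
    print {$out} $line;
}
close $in;
close $out;

print $output;
```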
Re: Improve processing time for string substitutions by Krambambuli (Curate) on Apr 16, 2007 at 14:35 UTC
I'm having difficulty understanding the exact form of the entities. Is it just my browser/setup, or is something wrong with the formatting?

I'd be eager to know what Benchmark would show for the various speedup suggestions. If the entity content could be isolated by splitting first on '>' and then on '<', then replacing each piece from the hash and re-joining the modified parts might even beat the regex-based approach. Curious.
Re^2: Improve processing time for string substitutions by jhourcle (Prior) on Apr 16, 2007 at 17:40 UTC
They're unescaped HTML entities; the question reads: `I want to convert the entities like &ge, &le, into hexa values &#x2265, &#x2264.` But I'll also point out that the actual entity should end in a semicolon (which prevents issues such as '`&or;`' matching '`&ordf;`').
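A sketch of semicolon-terminated matching combined with the `sprintf` hex conversion mentioned earlier in the thread. The `%codepoint` table (entity name to Unicode code point) is an assumption with only a few entries; unknown entities are left untouched.

```perl
use strict;
use warnings;

# Assumed table of entity names to Unicode code points (only a few entries).
my %codepoint = (
    ge   => 0x2265,
    le   => 0x2264,
    or   => 0x2228,
    ordf => 0x00AA,
);

my $text = 'a &ge; b, p &or; q, 1&ordf; &unknown;';

# Requiring the trailing ";" means "&or" cannot match the start of "&ordf;".
# /e is needed here because the replacement calls sprintf to build "&#x...;".
$text =~ s{&(\w+);}{
    exists $codepoint{$1} ? sprintf('&#x%X;', $codepoint{$1}) : "&$1;"
}ge;

print "$text\n";
```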
Re^3: Improve processing time for string substitutions by Krambambuli (Curate) on Apr 16, 2007 at 21:16 UTC
Well then, something along those lines might fit. The results are interesting:

```
Benchmark: timing 1000 iterations of regexish, with_splitting...
      regexish: 59 wallclock secs (58.16 usr + 0.02 sys = 58.18 CPU) @ 17.19/s (n=1000)
with_splitting:  0 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU) @ 50000.00/s (n=1000)
            (warning: too few iterations for a reliable count)
```
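The split-based code from this post is not available, so the sketch below pairs a regex version with a hypothetical split-based one (`with_splitting` here is a reconstruction for illustration, not Krambambuli's code). It checks that both produce identical output before timing them with `Benchmark`'s `cmpthese`. The mapping and input text are assumed; timings will vary with machine and data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample mapping and input; the thread's real data is not shown.
my %map  = ( ge => '&#x2265;', le => '&#x2264;' );
my $text = join ' ', ('a &ge; b &le; c &amp; d') x 1000;

my $re = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %map;
$re = qr/&($re);/;

# Regex variant: one pass, semicolon-terminated entity names only.
sub regexish {
    my ($s) = @_;
    $s =~ s/$re/$map{$1}/g;
    return $s;
}

# Hypothetical split-based variant: split on "&", then look up the text
# before the next ";" in each piece; unknown names are restored unchanged.
sub with_splitting {
    my ($s) = @_;
    my ($head, @parts) = split /&/, $s, -1;
    for my $part (@parts) {
        my $semi = index $part, ';';
        my $name = $semi >= 0 ? substr($part, 0, $semi) : undef;
        $part = defined $name && exists $map{$name}
            ? $map{$name} . substr($part, $semi + 1)
            : "&$part";
    }
    return join '', (defined $head ? $head : ''), @parts;
}

# Only compare speeds once both variants are known to agree.
die "outputs differ\n" unless regexish($text) eq with_splitting($text);

cmpthese(-1, {
    regexish       => sub { regexish($text) },
    with_splitting => sub { with_splitting($text) },
});
```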

