Re^5: Speeding up named capture buffer access

Re^6: Speeding up named capture buffer access by SBECK (Chaplain) on Dec 01, 2009 at 17:30 UTC
I'm more interested in elegance (and maintainability) than in the optimization. I rewrote all of the regexps in Date::Manip to use named buffers, and I'm not interested in going back to numbered buffers. As an example, in one place in Date::Manip, I match a set of related regular expressions that match various date strings, and there are 23 different possibilities containing 65 different matches between them (NOT all in the same order). Manually counting all of the match positions, while doable, basically renders that code static and unmaintainable: even a simple change to the regexps turns into a tedious, error-prone job of renumbering. I think that's the worst case... but there are several other cases that are almost as bad. That said, I want as much optimization as I can get within that constraint, and that's the basis for my question.
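Since the thread is about the cost of named-buffer access, here is one way to measure it on a given perl. This is a minimal sketch using the core Benchmark module; the pattern and the `via_named`/`via_numbered` subs are hypothetical, and each timed call includes the match itself plus sub-call overhead, not the capture access alone. Timings depend on the perl version and machine, so no results are shown.

```perl
use strict;
use warnings;
use 5.010;                      # named captures and %+
use Benchmark qw(cmpthese);

my $str      = '2009-12-01';
my $named    = qr/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/;
my $numbered = qr/(\d{4})-(\d{2})-(\d{2})/;

# Same pattern, two ways of reading the captures back out.
# Note: each call times the match as well as the access.
sub via_named    { return unless $str =~ $named;    return ($+{y}, $+{m}, $+{d}) }
sub via_numbered { return unless $str =~ $numbered; return ($1, $2, $3) }

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    named    => sub { my @f = via_named() },
    numbered => sub { my @f = via_numbered() },
});
```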
Re^7: Speeding up named capture buffer access by ikegami (Patriarch) on Dec 01, 2009 at 17:56 UTC
I don't know how it compares for speed (probably slower, due to the sub calls), but here's an alternative.

```perl
use strict;
use warnings;
use re 'eval';  # Should be scoped better.

# rc(-N) returns the text of the Nth-from-last capture group matched so far.
# Inside (?{ }), $_ is the string being matched, and @- in numeric context
# is one more than the number of the last group that has matched.
sub rc($) {
    my $ofs = @- + shift;
    return substr($_, $-[$ofs], $+[$ofs] - $-[$ofs]);
}

sub compile_pat { qr/$_[0]/ }

my @s_months = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
my $s_months = compile_pat join '|', @s_months;
local our %s_months = map { $s_months[$_] => $_+1 } 0..$#s_months;

# Each alternative leaves [ year, month, day ] in $^R.
my @pats = (
    qr/ (\d{4})-(\d{2})-(\d{2})   (?{[ rc(-3), rc(-2), rc(-1) ]})/x,
    qr/ (\d{2})($s_months)(\d{4}) (?{[ rc(-1), $s_months{rc(-2)}, rc(-3) ]})/x,
);
my $pat = compile_pat join '|', @pats;

for (qw( 2009-12-01 01Dec2009 01-12-2009 )) {
    local our ($y, $m, $d);
    if (/$pat(?{ ($y, $m, $d) = @{$^R} })/) {
        printf("%s => %04d-%02d-%02d\n", $_, $y, $m, $d);
    } else {
        printf("%s => [No match]\n", $_);
    }
}
```

Bonus: `$pat` can be calculated once and stored in a file.
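The heart of the alternative above is that each branch ends in a `(?{ ... })` block, and after a successful match `$^R` holds the value of the last such block that ran, so the match itself reports which branch fired. A stripped-down sketch of just that mechanism, without the `rc` offset arithmetic; the `iso`/`dmy` tags, `which_format`, and the simplified month pattern are hypothetical:

```perl
use strict;
use warnings;
use 5.010;   # for the // operator

# After a successful match, $^R holds the value returned by the last
# (?{ }) block that ran, i.e. the one in the alternative that matched.
my $re = qr/
    ^(?:
        \d{4}-\d{2}-\d{2}         (?{ 'iso' })   # 2009-12-01
      | \d{2}[A-Z][a-z]{2}\d{4}   (?{ 'dmy' })   # 01Dec2009 (month simplified)
    )$
/x;

sub which_format {
    my ($str) = @_;
    return $str =~ $re ? $^R : undef;
}

for my $s (qw( 2009-12-01 01Dec2009 01-12-2009 )) {
    printf "%s => %s\n", $s, which_format($s) // '[No match]';
}
```

Because these code blocks are written literally in the `qr//` rather than interpolated from strings, this version should not need `use re 'eval'`.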
Re^8: Speeding up named capture buffer access by SBECK (Chaplain) on Dec 01, 2009 at 18:08 UTC
This will take a bit of work to put in, and I'm not sure what the performance will be like, but it's worth at least trying.
Re^9: Speeding up named capture buffer access by ikegami (Patriarch) on Dec 01, 2009 at 18:18 UTC
Re^10: Speeding up named capture buffer access by SBECK (Chaplain) on Dec 01, 2009 at 19:29 UTC
Re^7: Speeding up named capture buffer access by JadeNB (Chaplain) on Dec 01, 2009 at 19:34 UTC
> As an example, in one place in Date::Manip, I match a set of related regular expressions that match various date strings, and there are 23 different possibilities containing 65 different matches between them (NOT all in the same order)

You've mentioned several times the need to work around the fact that you don't know which of many alternatives matched. Would it be possible, instead of

```perl
$string =~ /$re1|$re2/ and ( $h, $m, $s ) = ...
```

to do

```perl
$string =~ $re1 and ( $h, $m, $s ) = ...
  or $string =~ $re2 and ( $h, $m, $s ) = ...
```

and just have to worry about the order for individual regexes (rather than trying to find one order that works for all regexes)? Or does that also fall afoul of the maintainability requirement? Note that this approach means that introducing one new regex involves one simple counting problem, rather than one big counting problem that could interfere with all the old counts.
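JadeNB's suggestion can be kept table-driven, so that each regex sits next to its own mapping from capture positions to fields. A minimal sketch using the date formats discussed in this thread; `@formats` and `parse_ymd` are hypothetical names, not Date::Manip code:

```perl
use strict;
use warnings;

# One row per format: the regex, plus a sub that maps *its* captures
# (in positional order) onto (year, month, day).
my @formats = (
    [ qr/^(\d{4})-(\d{2})-(\d{2})$/, sub { ($_[0], $_[1], $_[2]) } ],  # 2009-12-01
    [ qr{^(\d{2})/(\d{2})/(\d{4})$}, sub { ($_[2], $_[0], $_[1]) } ],  # 12/01/2009
);

# Try each regex in turn; the first one that matches wins.
sub parse_ymd {
    my ($str) = @_;
    for my $row (@formats) {
        my ($re, $to_ymd) = @$row;
        if (my @caps = $str =~ $re) {
            return $to_ymd->(@caps);
        }
    }
    return;    # no format matched
}

for my $s (qw( 2009-12-01 12/01/2009 01Dec2009 )) {
    my @ymd = parse_ymd($s);
    print "$s => ", (@ymd ? join('-', @ymd) : '[No match]'), "\n";
}
```

Adding a format means adding one row and counting the groups of that one regex only; the cost is that every call site has to go through a helper like `parse_ymd` instead of a single combined match.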
Re^8: Speeding up named capture buffer access by SBECK (Chaplain) on Dec 01, 2009 at 20:18 UTC
That's how I had it originally... and when you've got 23 different possibilities, it adds unnecessary complexity. There are already 23 possibilities wherever I create the regular expressions, but with that approach there are 23 possibilities wherever I use them as well. Worse, some of the regular expressions are used in multiple places. When I modify a regular expression, I'd like to do it in one place (wherever the regexp is created) and not have to worry about some other place or places (wherever it's used). As it stands now, I can add a new way to express a date in one place (the routine where I create all my regexps), and it'll automatically go into effect in the various places it might be used. Not a big problem, of course... but I'm a huge fan of Larry's principle of laziness.
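The "define once, use anywhere" requirement is what named captures make possible: Perl 5.10 and later allow the same name in different alternatives, and `$+{name}` returns the leftmost group of that name that actually captured. A minimal sketch, assuming three hypothetical numeric date formats; `@date_formats`, `looks_like_date`, and `to_iso` are illustrative names only:

```perl
use strict;
use warnings;
use 5.010;   # named captures and %+ need Perl 5.10+

# The one place where formats are defined (hypothetical examples).
# Every format uses the same capture names, in whatever order it needs.
my @date_formats = (
    qr{(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})},    # 2009-12-01
    qr{(?<m>\d{2})/(?<d>\d{2})/(?<y>\d{4})},    # 12/01/2009
    qr{(?<d>\d{2})\.(?<m>\d{2})\.(?<y>\d{4})},  # 01.12.2009
);

# Combine them once into a single anchored alternation.
my $date_re = do {
    my $alt = join '|', @date_formats;
    qr/^(?:$alt)$/;
};

# Two independent call sites; neither knows which format matched.
sub looks_like_date {
    my ($str) = @_;
    return $str =~ $date_re ? 1 : 0;
}

sub to_iso {
    my ($str) = @_;
    return unless $str =~ $date_re;
    return sprintf '%04d-%02d-%02d', $+{y}, $+{m}, $+{d};
}

for my $s (qw( 2009-12-01 12/01/2009 01.12.2009 Dec2009 )) {
    printf "%-10s => %s\n", $s, looks_like_date($s) ? to_iso($s) : '[No match]';
}
```

Supporting a new format means adding one `qr` to `@date_formats`; neither consumer changes, even when the new format captures its fields in a different order.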
Re^9: Speeding up named capture buffer access by BrowserUk (Patriarch) on Dec 01, 2009 at 20:42 UTC