Re: Style question: regex versus string builtin function
by shmem (Chancellor) on Oct 02, 2007 at 07:53 UTC
|
If $DELIMITER was dynamic and could contain a regex, I'd use a m//, otherwise index.
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
| [reply] [d/l] |
Re: Style question: regex versus string builtin function
by johngg (Canon) on Oct 02, 2007 at 08:57 UTC
|
If $DELIMITER was static and was being tested for more than once in the code I might consider making a compiled regex.
my $rxDELIMITER = qr{\Q$DELIMITER\E};
...
if ( $line =~ $rxDELIMITER ) { ...
I probably reach for regexen too quickly without even considering the use of index. I suspect I'm not the only one with a bit of a blind spot there.
Cheers, JohnGG | [reply] [d/l] [select] |
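For comparison, here is a minimal sketch of the same containment test done with index and with a quoted regex. Only $DELIMITER and $rxDELIMITER come from the post above; the '|' value, the sub names and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Assumed value for illustration; '|' is a regex metacharacter.
my $DELIMITER = '|';

# Precompiled, quoted pattern, as in the post above.
my $rxDELIMITER = qr{\Q$DELIMITER\E};

# Literal substring test: index returns -1 when the needle is absent.
sub contains_index {
    my ($line) = @_;
    return index( $line, $DELIMITER ) >= 0 ? 1 : 0;
}

# Same test through the quoted regex.
sub contains_regex {
    my ($line) = @_;
    return $line =~ $rxDELIMITER ? 1 : 0;
}

# Without \Q the '|' is an alternation of two empty branches,
# which matches any string at all.
sub contains_unquoted {
    my ($line) = @_;
    return $line =~ /$DELIMITER/ ? 1 : 0;
}

for my $line ( 'a|b', 'no delimiter here' ) {
    printf "%-20s index=%d quoted=%d unquoted=%d\n",
        "'$line'", contains_index($line), contains_regex($line),
        contains_unquoted($line);
}
```

The unquoted variant reports a match on both lines, which is the failure mode that \Q (or index) avoids.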
Re: Style question: regex versus string builtin function
by throop (Chaplain) on Oct 02, 2007 at 11:47 UTC
|
Use index. Even after using \Q, there are other odd cases lurking. From perlreref:
If 'pattern' is an empty string, the last successfully matched
regex is used.
Also: You cannot include a literal $ or @ within a \Q sequence. An unescaped $ or @ interpolates the corresponding variable, while escaping will cause the literal string \$ to be matched. You'll need to write something like m/\Quser\E\@\Qhost/.
The real 'style' question here, though, is Which form is most maintainable, most understandable when somebody looks at it two years from now?
And this use of the $DELIMITER is going to be rather opaque in either case. Therefore, the most important element of style here is a generous set of comments, explaining why $DELIMITER was broken out as a separate variable (or constant).
throop
Update: lidden's point is well taken; even a zero-width assertion like \Q keeps the pattern from being empty. But see the discussion that follows. | [reply] |
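A small sketch of the $/@ caveat quoted above, under the assumption that the text to find is the literal string user@host. The variable names are hypothetical. As far as I can tell, the restriction only bites when the literal is typed into the pattern itself; a value held in a variable is quoted by \Q without being interpolated a second time.

```perl
use strict;
use warnings;

my $address = 'user@host';                  # single quotes: no interpolation
my $text    = 'mail from user@host today';

# \Q applied to a variable quotes the variable's contents.
my $via_variable = $text =~ /\Q$address\E/ ? 1 : 0;

# Typing the literal into the pattern needs the escaped '@'
# outside the \Q...\E spans, as in the quoted documentation.
my $via_literal = $text =~ m/\Quser\E\@\Qhost/ ? 1 : 0;

# index needs no quoting at all.
my $via_index = index( $text, $address ) >= 0 ? 1 : 0;

print "$via_variable $via_literal $via_index\n";    # prints "1 1 1"
```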
|
But 'pattern' is not an empty string after using \Q.
| [reply] [d/l] |
|
Silly enough, this still counts as empty. In general I think the way empty regexes work is just bad design. It should only trigger if the regex is empty at the literal code level, not after all kinds of expansion have been done on the stuff between the delimiters.
| [reply] |
|
use Test::More 'tests' => 5;
ok( 'foo' =~ //, 'empty regex matches' );
ok( 'foo' =~ /foo/, '/foo/ matches' );
ok( !('bar' =~ //), 'repeated match of foo' );
ok( !('bar' =~ /\Q/), 'repeated match with \\Q' );
my $empty = '';
ok( !('bar' =~ /\Q$empty/), 'interpolated empty string same as \\Q' );
| [reply] [d/l] [select] |
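For anyone who wants to sidestep the empty-pattern rule entirely, here is a small sketch of a literal containment check built on index. The helper name contains_literal and the decision to treat an empty needle as "not found" are assumptions for illustration, not something proposed in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: literal containment test that handles an empty
# needle explicitly instead of leaving it to the match operator.
sub contains_literal {
    my ( $haystack, $needle ) = @_;

    # Policy choice (an assumption): an empty or undefined needle never matches.
    return 0 unless defined $needle && length $needle;

    return index( $haystack, $needle ) >= 0 ? 1 : 0;
}

print index( 'bar', '' ), "\n";               # 0: index finds '' at position 0
print contains_literal( 'bar', '' ), "\n";    # 0: rejected by the policy above
print contains_literal( 'a,b', ',' ), "\n";   # 1
```

Whether an empty delimiter should count as "always present" or "never present" depends on the caller; the point is only that with index the choice is written out rather than inherited from an earlier match.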
|
print "\U\Qfoo.bar";
__END__
FOO\.BAR
lodin | [reply] [d/l] [select] |
Re: Style question: regex versus string builtin function
by thospel (Hermit) on Oct 02, 2007 at 11:33 UTC
|
I'd definitely go for the regex. If I later would read that code, I'd have to think for half a second about the index code to see that it's not a "where is this needle", but that it's a "does the needle exist anywhere", while a regex immediately gives that kind of association. If index is faster, that is an implementation detail. If we care, we should just fix the perl optimization code to make them equivalent. But by default, clarity, not speed, is the goal of writing code. | [reply] |
|
that it's a "does the needle exist anywhere", while a regex immediately gives that kind of association
g, that doesn't work for me.
Regexes are inherently more complex to use than the index function. There are the various regular expression dialects, there are the modifiers, and there are the global variables upon which they may trample.
But, like others, I tend to reach for the match operator.
Be well,
rir
| [reply] [d/l] |
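To make the "global variables" point concrete, here is a minimal sketch with a made-up input line. It assumes some earlier code still relies on $1 when the delimiter check runs in the same scope.

```perl
use strict;
use warnings;

my $line = 'key=value,other';    # hypothetical input

$line =~ /^(\w+)=/ or die "unexpected input\n";
my $before = $1;                                 # 'key'

my $found_by_index = index( $line, ',' ) >= 0;   # no regex involved
my $after_index    = $1;                         # still 'key'

my $found_by_regex = $line =~ /,/;               # successful match, no capture groups
my $after_regex    = $1;                         # now undef

printf "after index: %s\n", defined $after_index ? $after_index : 'undef';
printf "after regex: %s\n", defined $after_regex ? $after_regex : 'undef';
```

Copying captures into lexicals right after the match, as done with $before here, avoids the problem whichever test is used.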
Re: Style question: regex versus string builtin function
by lima1 (Curate) on Oct 02, 2007 at 11:57 UTC
|
I use index when I need the match position, otherwise a regex. And it seems that index is NOT faster. Even code like
my $pos;
if ( $line =~ $regex ) {
    $pos = length $`;
}
which gets the match position with a regex is slightly faster (but much uglier of course):
Update: For better ways of getting the match position, see How do I retrieve the position of the first occurrence of a match?.
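As a rough illustration of the alternatives, the sketch below gets the same position three ways: from the prematch length as in the snippet above, from $-[0] (the start offset of the last successful match, see perlvar), and from index. The sample string and pattern are assumptions.

```perl
use strict;
use warnings;

my $line  = 'alpha::beta';    # hypothetical input
my $regex = qr/::/;           # hypothetical pattern

my ( $pos_prematch, $pos_offset );
if ( $line =~ $regex ) {
    $pos_prematch = length $`;    # length of the text before the match
    $pos_offset   = $-[0];        # start offset of the whole match
}

my $pos_index = index( $line, '::' );    # -1 if absent

print "$pos_prematch $pos_offset $pos_index\n";    # prints "5 5 5"
```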
Benchmark code:
Benchmark results:
                    Rate index regex_pos regex regex_compiled_pos regex_compiled
index              450/s    --      -38%  -39%               -40%           -41%
regex_pos          728/s   62%        --   -2%                -3%            -5%
regex              741/s   65%        2%    --                -1%            -3%
regex_compiled_pos 749/s   66%        3%    1%                 --            -2%
regex_compiled     763/s   70%        5%    3%                 2%             --
| [reply] [d/l] [select] |
Re: Style question: regex versus string builtin function
by apl (Monsignor) on Oct 02, 2007 at 09:45 UTC
|
I'd definitely use index. It's the simplest tool for this problem. | [reply] [d/l] |
Re: Style question: regex versus string builtin function
by graff (Chancellor) on Oct 02, 2007 at 13:04 UTC
|
... what if $DELIMITER contains a regex metachar?
What if the intention is that metacharacters in the variable should be used as such? What to use depends on what the intention is.
For cases where "TMTOWTDI" really applies, the choice of approach is not likely to matter all that much (except to those who are compelled to optimize). For cases where literal-vs.-metachar handling means a difference between success vs. error (or ability vs. inability to do a task), one tool will be better than the other, and whichever one is right, you still have to provide some safeguards and checks to try to handle all contingencies as best you can. | [reply] |
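One way to spell out the intention, sketched under assumptions: a hypothetical make_matcher helper that takes a flag saying whether the delimiter is meant literally or as a pattern, and that checks the input before use. Neither the helper nor the flag comes from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a code ref that tests a line for the
# delimiter, taken either literally (index) or as a regex.
sub make_matcher {
    my ( $delimiter, $as_pattern ) = @_;

    die "empty delimiter\n"
        unless defined $delimiter && length $delimiter;

    # Literal intent: plain substring search, metacharacters mean nothing.
    return sub { index( $_[0], $delimiter ) >= 0 ? 1 : 0 }
        unless $as_pattern;

    # Pattern intent: compile once and trap syntax errors up front.
    my $rx = eval { qr/$delimiter/ };
    die "bad delimiter pattern '$delimiter': $@" unless defined $rx;

    return sub { $_[0] =~ $rx ? 1 : 0 };
}

my $literal = make_matcher( 'a.c', 0 );
my $pattern = make_matcher( 'a.c', 1 );

print $literal->('abc'), $pattern->('abc'), "\n";    # prints "01"
```

The same delimiter string gives different answers depending on the declared intent, which is why the flag is passed explicitly instead of being guessed from the contents.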
Re: Style question: regex versus string builtin function
by talexb (Chancellor) on Oct 02, 2007 at 17:33 UTC
|
While I don't doubt that index is faster, I like your first solution better, simply because it's more Perl-ish. You're seeing if a particular delimiter appears on a line.
The alternative would (for me) require I look up how index works -- it's a logical function to have in a language, I just don't think I've ever used it, so I'm not sure what the parameters are or what it returns.
That's just my preference.
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
| [reply] [d/l] [select] |