PerlMonks  

Bring back the smartmatch operator (but with sane semantics this time)!

by smls (Friar)
on Jun 10, 2014 at 16:52 UTC ( [id://1089424]=perlmeditation )

No I'm not kidding, please hear me out... :)

History of smartmatch in Perl

Smartmatching was invented for Perl 6 where it turned out to be a very useful and well-loved¹ feature, but the attempt to backport it to Perl 5.10 in 2009 did not turn out so great (and it was consequently deprecated again in Perl 5.18). Among the Perl 6 community the commonly accepted explanation for that is, from what I heard:

  • Perl lacks a strong & fine-grained type system (which in Perl 6, where it exists, adds much sanity to the concept of dynamic dispatch)
  • Perl lacks composable related features like Junctions (which allow the Perl 6 smartmatching rules to be simpler and less arbitrary than the rules that would be needed to facilitate the same set of use-cases in Perl)

These limitations are hard to circumvent, but I don't think that means Perl should have no smartmatching at all, it just means it should have less ambitious / more focused smartmatching.

I wasn't around at the time, but it looks to me as if the Perl 5.10+ smartmatching was designed with these goals:

  1. Support all use-cases that the Perl 6 smartmatch operator supports
  2. Use it as an opportunity to sneak in useful new comparison/searching operations into the core, without having to invent separate operator names for them, and without having to justify them individually

...and was thus doomed to failure.

Some later proposals for re-designing the smartmatch operator (like this 2011 post by brian d foy) tend to avoid mistake no. 1, but still fall into the second trap.

If there are comparison/searching operations that are deemed worthy of being added to Perl (say, "deep comparison" of two arrays, or checking whether an array contains a given scalar), then each of them should get its own operator. That's the normal Perl way: One operator per type of operation (that's why we have both == and eq for example).

How smartmatch should be designed

Smartmatching explicitly breaks with the conventional "one meaning per operator" rule by dynamically deciding what operation to perform based on its arguments. This means it should be carefully designed around use-cases where you actually need to dynamically decide what operation to perform. Operations that you would likely never want to mix-and-match, have no business being part of the smartmatch operator, even if they would be useful to have in core by themselves.

So, what are those use-cases where you actually need dynamic smartmatching? I can think of two major ones:

  1. When you want to avoid writing out  ($_ <operator> ...)  in a given/when construct, for the purpose of brevity/elegance:

    use v6;
    given $username {
        when 'root'             { dostuff }
        when /^guest\d*$/       { die "You're not allowed to do stuff." }
        when any(<http apache>) { authenticate :web; dostuff }
        default                 { authenticate :local; dostuff }
    }

    Of course it is only elegant when the meaning is self-evident without consulting a manual, so this use-case only makes sense for commonly used & unambiguous comparison operations.

  2. When you want your code to test things against a "filter/pattern/rule" that is passed in from the outside, and you don't want to restrict it to just one way of filtering (e.g. only by string comparison, or only by regex, or only by callback etc.)

    For example, consider the Perl 6 built-in function dir, which lists the contents of a directory in the filesystem. It takes an optional 'test' argument, against which it promises to smartmatch each filename and only return the matching ones. Since smartmatch is built into the language, Perl 6 programmers need no further documentation to understand that parameter; they know they can use anything that would be valid as the right-hand-side argument of ~~ as the test, for example:

    use v6;
    dir '/some/directory', test => /\.txt$/;            # a regex
    dir '/some/directory', test => none('.', '..');     # a junction²
    dir '/some/directory', test => &validate_filename;  # a coderef

    The result is a very flexible but still elegant and predictable API that is easy to imitate in your own functions/modules that want to allow their users to "match" or filter stuff: Just use smartmatch as your filter implementation!
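For comparison, here is a minimal Perl 5 sketch of the same API idea without any smartmatch operator. The names matches_test() and list_dir() are made up for this illustration (they are not an existing module), and the set of supported test types is an assumption: undef, a plain string, a qr// regex, or a code reference. The default test mimics the none('.', '..') junction mentioned in footnote 2.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the operation from the type of $test only.
sub matches_test {
    my ($value, $test) = @_;
    return defined $value ? 0 : 1       if !defined $test;          # undef: "is it undefined?"
    return $test->($value) ? 1 : 0      if ref $test eq 'CODE';     # callback
    return $value =~ $test ? 1 : 0      if ref $test eq 'Regexp';   # qr// object
    return $value eq $test ? 1 : 0      if !ref $test;              # plain scalar: string equality
    die "Unsupported test type: " . ref($test) . "\n";
}

# Hypothetical stand-in for Perl 6's dir(): return the entries that pass the test.
sub list_dir {
    my ($path, %opt) = @_;
    # Default test: everything except '.' and '..'
    my $test = exists $opt{test}
        ? $opt{test}
        : sub { $_[0] ne '.' && $_[0] ne '..' };

    opendir(my $dh, $path) or die "Cannot open $path: $!";
    my @entries = grep { matches_test($_, $test) } readdir $dh;
    closedir $dh;

    my @sorted = sort @entries;
    return @sorted;
}
```

A caller would then write list_dir('/some/directory', test => qr/\.txt$/) or pass a string or coderef instead, without list_dir() needing a separate parameter for each kind of filter.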

We can make new Perl 5 smartmatching rules useful for those use-cases, while still keeping them sane and predictable, by adhering to these two principles:

  1. Decide what operation to perform, based on the type of the right-hand-side argument (and nothing else!)
    (Put another way, this means that  LHS ~~ RHS  can always be expressed in words as the question "Does LHS fit the constraint/template defined by RHS?")

  2. Blindly coerce the left-hand-side argument to the type that the chosen operation requires, just as normal Perl operators like eq also coerce their arguments.
    (So, for example, @foo ~~ /foo/ would be the same as @foo =~ /foo/, even though that may not be useful, rather than doing anything special just because it's an array!)

Sensible smartmatch rules

With that in mind, we can start to think about the kind of right-hand-side "things" that it should be possible to smartmatch against.

The following are no-brainers imo:

if RHS is an...                  (example)            then  LHS ~~ RHS  should do...
undefined scalar                 $x ~~ undef          !defined(LHS)
simple scalar                    $x ~~ 'foo'          LHS eq RHS
regex (literal or reference)     $x ~~ /foo/          LHS =~ RHS
code reference                   $x ~~ sub { ... }    RHS->(LHS)
an object that overloads ~~      $x ~~ $object        call the overload method, with LHS as argument

The 'simple scalar' case is not as elegant as one might wish it to be; ideally it would be able to dynamically decide between string or numeric comparison like it does in Perl 6, but I don't think that is possible to do safely in Perl (its type system being what it is), so we need to take what we can get.
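The five rules in the table above could be sketched as an ordinary Perl function, roughly as follows. This is only an illustration under stated assumptions: smart_match() is a hypothetical name, a MATCH method stands in for an overloaded ~~ (so that the sketch does not depend on the deprecated operator), and the small Num class is an invented example of an object that opts into numeric comparison.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical function: only the type of the RHS decides the operation,
# and the LHS is coerced blindly, as with ordinary operators.
sub smart_match {
    my ($lhs, $rhs) = @_;

    return defined $lhs ? 0 : 1        if !defined $rhs;                        # undefined scalar
    return $lhs =~ $rhs ? 1 : 0        if ref $rhs eq 'Regexp';                 # regex
    return $rhs->MATCH($lhs) ? 1 : 0   if blessed($rhs) && $rhs->can('MATCH');  # object (assumed MATCH convention)
    return $rhs->($lhs) ? 1 : 0        if ref $rhs eq 'CODE';                   # code reference
    return $lhs eq $rhs ? 1 : 0        if !ref $rhs;                            # simple scalar

    # Everything else (array refs, hash refs, ...) is an error by design.
    die "smart_match: unsupported right-hand side (" . ref($rhs) . ")\n";
}

{
    # Invented example class: forces numeric comparison against its value.
    package Num;
    sub new   { my ($class, $n) = @_; return bless { n => $n }, $class }
    sub MATCH { my ($self, $lhs) = @_; return $lhs == $self->{n} }
}
```

With this sketch, smart_match('5.0', 5) compares as strings and fails, while smart_match('5.0', Num->new(5)) compares numerically and succeeds, which shows the cost of the string-only 'simple scalar' rule.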

The following two rules also tend to be pretty useful in Perl 6, and it might make sense to add them to our hypothetical new Perl smartmatch, but I'm unsure about them because range literals and typename barewords are not usually treated as first-class "things" in Perl, so it might feel strange:

if RHS is a...     (example)                      then  LHS ~~ RHS  should do...
bareword           $node ~~ XML::LibXML::Node     ref(LHS) eq "RHS"
range literal      $age ~~ 0..17                  interpret LHS as a number, and check if it is within the range

Lastly, the lack of junctions in Perl could be partially remedied by interpreting an array/list on the right-hand-side like an any() junction:

if RHS is an...    (example)                       then  LHS ~~ RHS  should do...
array or list      $switch ~~ qw(yes true on 1)    (grep { LHS ~~ $_ } RHS) >= 1

Of course, a better solution would be to add junctions to Perl together with re-adding smartmatch... :)
(Perl6::Junction already exists on CPAN, but it relies on at least one awful hack due to the fact that it is non-core).
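The any()-style interpretation of a list on the right-hand side can be approximated today with a small helper. The sketch below is an assumption-laden illustration, not the API of Perl6::Junction or any other module: match_any() is a hypothetical name, and each alternative is assumed to be a plain string, a qr// regex, or a code reference.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $lhs matches at least one alternative.
# The operation is chosen per alternative, from the alternative's type only.
sub match_any {
    my ($lhs, @alternatives) = @_;
    for my $alt (@alternatives) {
        my $hit = ref $alt eq 'Regexp' ? $lhs =~ $alt      # regex match
                : ref $alt eq 'CODE'   ? $alt->($lhs)      # callback
                :                        $lhs eq $alt;     # string equality
        return 1 if $hit;
    }
    return 0;    # no alternative matched (also the result for an empty list)
}
```

For example, match_any($switch, qw(yes true on 1)) plays the role of $switch ~~ qw(yes true on 1) from the table, and match_any($topic, 'foo', 'bar', qr/baz/) mixes strings and a regex.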

Anyway, the above rules would be more or less a subset of both Perl 6 smartmatching and the deprecated Perl 5.10+ smartmatching, but without the craziness of the latter.

And that's it; all cases not handled by these rules should generate a runtime error.
I don't think any other special cases need to be added - in particular, all the arbitrary behaviors that Perl 5.10+ smartmatching added for when one or both arguments were arrays/hashes, only served to confuse people and made the operator "not safe to use" in practice. Let's not repeat that mistake.

PS: In case you want to get a "feel" for what this kind of smart-matching is like in practice, check out Toby Inkster's match::simple module which implements very similar rules to what is discussed here (but suffers from some unavoidable limitations due to the fact that it is not in core).

---

1) Among the small but passionate fan base of Perl 6 :)
2) This particular junction is in fact used as the default when the 'test' argument is omitted.

Replies are listed 'Best First'.
Re: Bring back the smartmatch operator (but with sane semantics this time)!
by moritz (Cardinal) on Jun 10, 2014 at 19:58 UTC

    At first glance, I like your proposal.

    One thing that will surprise people is that it contains no provision for numeric comparison, so by the numeric/string duality, numbers will be compared as strings.

    I have my doubts about the cases you're not sure about; distinguishing barewords from strings is impossible in non-strict mode (which I rarely use, but which is still a big part of perl, whether you like it or not).

    Ranges become flipflops in scalar context and a list in list context, so doing a range comparison seems pretty much inconsistent with the rest of the language.

    Comparing lists with any-semantics requires the right-hand side of the ~~ to be evaluated in list context, which IMHO would be a bit surprising.

      One thing that will surprise people is that it contains no provision for numeric comparison, so by the numeric/string duality, numbers will be compared as strings.

      Yes, I agree that this is the biggest weakness of the suggested rules.

      The problem of surprise could be partially mitigated by emitting a warning when a numeric literal is used as the RHS of a smartmatch (or as a when-expression), but of course numbers coming in through variables would still silently use string comparison.

      As for there being no way to employ numeric comparison, in theory people could create a Num class that overloads ~~ and forces numeric comparison, but such a solution likely won't be worth it to users of smartmatch or given/when (since the whole point of those features is to make things simpler).
      So I guess the case of numeric comparisons simply won't be covered, or rather, people will be forced to write it out verbosely:

      when ($_ == 5) { ... }
      $foo ~~ sub { shift == 5 }

      Not ideal, but also not necessarily a deal-breaker.

      Of course, I would love to be proved wrong regarding separate string/number dispatch not being possible to do safely in Perl... :)

      I have my doubts about the cases you're not sure about; distinguishing barewords from strings is impossible in non-strict mode (which I rarely use, but which is still a big part of perl, whether you like it or not).

      Ranges become flipflops in scalar context and a list in list context, so doing a range comparison seems pretty much inconsistent with the rest of the language.

      Comparing lists with any-semantics requires the right-hand side of the ~~ to be evaluated in list context, which IMHO would be a bit surprising.

      All good points. I guess it's better to leave those rules out then.

      As for junction semantics, it occurred to me that the problems/warts of the existing Junction module(s) on CPAN would probably go away once ~~ as described above (but without the three "bonus" rules) is in core and those classes have been updated to overload it (rather than relying on hacks like overloading == for non-standard things).

      So, support for  $topic ~~ any('foo', 'bar', qr/baz/)  would then be provided by those modules and would not need to be hacked into the smartmatch operator itself.

      And if, at a later time, one of those junction modules makes it into core too, then all the better.

      One thing that will surprise people is that it contains no provision for numeric comparison, so by the numeric/string duality, numbers will be compared as strings.

      Internally, Perl knows when to implicitly convert a number to a string and vice versa, so a new smart match should be able to know when it's dealing with numbers.

      Following the proposed "RHS decides the op", then when the RHS is an actual number, it could coerce the LHS to a number.¹

      Overall, I like this proposal. It is far better than losing smart match (and, especially, given/when).

      ---

      ¹ In theory it could check both sides for numbers, but I would rather let the RHS blindly impose numericy on the LHS for consistency.

        Following the proposed "RHS decides the op", then when the RHS is an actual number, it could coerce the LHS to a number.

        It's the "when the RHS is an actual number" part that's problematic. When do you consider something a number? The fact that I happen to know that both Mojo::JSON and JSON::Tiny recently adjusted their heuristics for what to consider a number (and at least Mojo::JSON had been around for quite some time before that) doesn't make me very optimistic that there is a reasonable way to answer that question in Perl 5, precisely because of the automagic coercion / SV-NV duality.

      If we had to live with only having string compare on scalar values, it could be worked around by nesting given within the default clause of another given:

      given (+ $x) {
          when (+ $n) { ... }
          when (+ $m) { ... }
          default {
              given ($x) { ... }
          }
      }

      But, having dug in to Perl Guts more than I should have, I think that disallowing tied (and probably other magic) variables on the RHS, a built-in smart match would be able to reliably determine if the RHS is a number.

        But, having dug in to Perl Guts more than I should have, I think that disallowing tied (and probably other magic) variables on the RHS, a built-in smart match would be able to reliably determine if the RHS is a number.

        The problem is that Perl doesn't expose the concept "this scalar is a number" to the user (by design). Thus making a decision based on whether a scalar is a number is nearly always wrong.

        A piece of code that already makes such a decision is the code that decides whether to warn on a numeric operation:

        $ perl -wE 'say 0 + "1.2"'
        1.2
        $ perl -wE 'say 0 + "1.2.3"'
        Argument "1.2.3" isn't numeric in addition (+) at -e line 1.
        1.2

        So, what do I keep complaining about? The real problem is dual vars. Those aren't just a rare corner case that should be avoided; for example, they are what boolean operators return:

        $ perl -wE 'say 0 + !1' # no warning
        0
        $ perl -wE 'my $false = !1; say "<<$false>>"' # empty string!
        <<>>

        So it's a number, but it's also an empty string. Should smart-matching against that be numeric or string comparison? My intuitive reaction is "string comparison", because $false doesn't round-trip when converted to a number and then to a string.

        But you can also construct valid cases where round-tripping to a number is the wrong criterion; an example is if a user-supplied string is never used as a number, but happens to look like a number. You certainly don't want those values to try to coerce your own strings to numbers (and warn).

        So, however you decide whether a scalar is a number or a string for the purpose of comparison, I can point out a case where your decision is a big WTF. Which is precisely the reason that we have separate == and eq operators.

        Whatever will be done about that, the string/number duality will remain a weakness of any Perl 5 smartmatch proposal.
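The dual-var problem described in this reply can be reproduced in a few lines. The sketch below is illustrative only: round_trips_as_number() is a hypothetical name for the "does it survive number-then-string conversion" heuristic mentioned above, not something Perl or Scalar::Util provides, and it is shown precisely because it gives questionable answers.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

my $false      = !1;                   # Perl's built-in false value, a dual var
my $as_string  = "$false";             # the empty string
my $as_number  = 0 + $false;           # numeric 0, without an "isn't numeric" warning
my $round_trip = "" . (0 + $false);    # the string "0", which no longer equals ""

# Hypothetical heuristic: treat a value as a number only if converting it
# to a number and back to a string reproduces the original string.
sub round_trips_as_number {
    my ($value) = @_;
    no warnings 'numeric';             # coercing arbitrary strings would warn otherwise
    return "" . (0 + $value) eq "$value" ? 1 : 0;
}
```

Under this heuristic the built-in false value and dualvar(5, "five") count as strings, but so does the perfectly numeric-looking "1.0", while a user-supplied string such as "42" that was never meant as a number counts as a number. Each answer is defensible, and each is surprising to someone.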

Re: Bring back the smartmatch operator (but with sane semantics this time)!
by Anonymous Monk on Jun 10, 2014 at 20:52 UTC

    I haven't missed smartmatch much, just about the only thing I ever used it for was its distributive property, i.e. $foo ~~ ['a','b','c'], which can be replaced by Quantum::Superpositions or Perl6::Junction. I like the idea of being more specific with your conditions, e.g. $a eq $b, $a == $b, $a->($b), etc. - I find it more easily readable, which is better for later maintenance too. On the other hand, I also think implicit matching against $_ is a good idea (shorter code can be considered more readable too), as long as it's still obvious what the condition is.

    I have missed the built-in "switch" (given/when) quite a bit, especially when as a statement modifier as well as given's ability to return the last value evaluated - i.e. my $x = do { given ($in) { "abc" when !defined; "def" when /d/; ... } };

    I think the problem is that a truly perlish "switch" needs some kind of "smart matching" to be successful. And the problem with ~~ is not just in the implementation, but also that it allows for so much flexibility that people have different ideas on how it's "supposed" to interpret its arguments. Your post is one example, and for completeness, here's my view: a basic "switch" should be able to automatically apply defined, eq, ==, and =~ to the cases, and some way to specify multiple conditions per case (in the case of smartmatch, that would be its distributive property).

    So, a while back I did some simple evaluations of some of the "switch" alternatives out there. Here are my notes from that evaluation (no particular order):

    • Switch
      • uses source filters, buggy, has been removed from core
    • Switch::Plain
      • fairly simple and straightforward
      • separate sswitch / nswitch statements
      • requires Perl >= v5.14
    • Switcheroo
      • more powerful than Switch::Plain
      • quite a few dependencies, including Parse::Keyword which is marked as "DEPRECATED"
      • requires Perl >= v5.14 (requires Perl keyword API)
    • Switch::Perlish
      • uses native perl syntax
      • includes its own "smart matching"
      • has been failing tests since Perl 5.13.6

    I personally like Switch::Plain and Switcheroo, I'm a little turned off by the simplicity of the former and the dependencies of the latter (matter of taste, I know), and both require at least Perl v5.14, which isn't nice for backwards-compatible code.

    So, I find myself still writing if-elsif chains and spelling all my conditions out, while sometimes wishing for the core given/when back, at least with the simple cases I use it for...
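One way to get part of the way there in plain Perl, with every operator still spelled out, is to use a single-element for loop as a topicalizer. The sketch below mirrors the conditions of the do { given ... } example earlier in this reply; classify() and the returned strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical example: return a label for the first condition that holds.
sub classify {
    my ($in) = @_;
    for ($in) {                        # aliases $_ to $in, much like given would
        return 'abc'   if !defined;    # defined() defaults to $_
        return 'def'   if /d/;         # regex match against $_
        return 'five'  if $_ eq '5';   # explicit string comparison
        return 'other';                # default case
    }
}
```

The conditions stay explicit (defined, =~, eq), and the implicit $_ keeps each case short, at the price of writing return (or last) by hand in every branch.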

Re: Bring back the smartmatch operator (but with sane semantics this time)!
by davido (Cardinal) on Jun 11, 2014 at 18:16 UTC

    I mentioned this in another deeper-level followup in this thread, but I want to try to explain my position.

    Perl's type system is its operators. This is why we don't have PHP's "===" and "==" operators; they try to guess at programmer's intent. Perl doesn't guess; whenever possible it reacts to known intent, which is different from guessing at intent. Our scalar containers are typeless; it's up to the programmer to decide what type to imply at any point in the program. Resorting to heuristics to decide what type a container may be is pointless; containers in Perl are typeless. What has to happen, is operators have to know how to provide a type context.

    Smartmatch works fine in a language where containers have type, and operators react to the type. It doesn't work well in a language where containers don't have type, and operators specify the type. It ends up either having a huge list of rules and heuristics that people have to consult the manual to recall, or a small set of rules and heuristics that reduces its value while still leading to less clear code than the type-specifying operators.

    In short, Perl operators are chosen by programmers to impose type. Perl reacts by providing the correct type to the operators. If you try to invert that paradigm in a language where containers have no type, you end up with messy heuristics or complex rules, or incomplete operator implementations (as in a reduced-functionality smartmatch).


    Dave

      Perl has fairly clear types like "scalar", "array", and "hash", so it could be useful for smart match to do different things when given an array or an array ref compared to when given a non-reference scalar. But even that case is not 100% clean (mostly due to overloading where you can have an object implemented as a reference to a hash that wants to behave like a scalar string or like a reference to a virtual array or such).

      But you are very right when it comes to trying to distinguish between "number" and "string".

      Smart match only looking at the "type" of one argument can improve the problem significantly. So I could see "when( 14 )" noticing that a literal number was given in the source code and so a numeric comparison should be used. But that clean line quickly becomes muddier in the face of code refactoring to things like "my $const = 14; ... when( $const )".

      You either let that level of refactoring break smart match or you don't let smart match try to distinguish between strings and numbers (or else you doom smart match).

      The only 100% clean case is having "when( @array )" behave differently than "when( $scalar )" (and similar).

      - tye        

Re: Bring back the smartmatch operator (but with sane semantics this time)!
by ikegami (Patriarch) on Jun 12, 2014 at 16:30 UTC
Re: Bring back the smartmatch operator (but with sane semantics this time)!
by Anonymous Monk on Jun 10, 2014 at 22:11 UTC
    being experimental is "deprecated" now?

      In this case, yes.

      The old behavior is deprecated (a.k.a. no longer recommended for use due to expected removal in the future), but the operator is marked as "experimental" because they hope to come up with a new behavior for it rather than removing the operator completely.

      From perl5180delta: "It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended."

Re: Bring back the smartmatch operator (but with sane semantics this time)!
by Anonymous Monk on Jun 11, 2014 at 19:43 UTC

    A lot of the discussion so far has hinged on how to tell strings from numbers (eq vs. ==), so I'm surprised no-one has mentioned Scalar::Util's looks_like_number() yet... the only downsides I see so far are that it might not be as fast as some other heuristic, and that sometimes I might want to actually compare two numbers as strings explicitly. (Okay one more... looks_like_number recognizes a few strings as numbers, such as "Inf", "Infinity", "NaN" and "0 but true". And it doesn't recognize "1_000" as a number... hmmm. See Perl_grok_number() in numeric.c)
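To make the trade-offs of that idea concrete, here is a small sketch. looks_like_number() is the real Scalar::Util function; heuristic_equal() is a hypothetical helper, written only to show what "let the RHS decide between == and eq" would look like if that function were used as the heuristic.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: numeric comparison if the RHS looks like a number,
# string comparison otherwise. The LHS is coerced blindly.
sub heuristic_equal {
    my ($lhs, $rhs) = @_;
    if (looks_like_number($rhs)) {
        no warnings 'numeric';         # silence "isn't numeric" for a non-numeric LHS
        return $lhs == $rhs ? 1 : 0;
    }
    return $lhs eq $rhs ? 1 : 0;
}

# Print how a few candidate strings are classified.
for my $candidate ('42', '1e3', 'Inf', 'NaN', '0 but true', '1_000', 'abc') {
    printf "%-14s %s\n", "'$candidate'",
        looks_like_number($candidate) ? 'number' : 'string';
}
```

With this helper, heuristic_equal('1.0', '1') succeeds, which is probably what people want, but heuristic_equal('abc', '0') succeeds too, because 'abc' is coerced to 0 once the RHS has been classified as a number.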

      The string/number heuristic is responsible for some of the oddest behaviour of smart match. Let's take for example the use of $x ~~ @array as an "in" operator.

      sub as_bool { $_[0] ? "true" : "false" }
      my @greetings = qw( Hi Bye );
      say as_bool("Hi" ~~ @greetings);    # true
      say as_bool("Hiya" ~~ @greetings);  # false

      OK, everything is working as expected so far. Now let's do something a little more complex...

      push @greetings, 0, 1;
      say as_bool(0 ~~ @greetings);       # true
      say as_bool(1 ~~ @greetings);       # true
      say as_bool(2 ~~ @greetings);       # false
      say as_bool("Hiya" ~~ @greetings);  # true ?!?!
      use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name

        Not having memorized the smartmatch table, that looked pretty strange to me at first, but looking at the table, at least it's well-defined... and when I tried it I got a "Argument "Hiya" isn't numeric in smart match" warning, which (although it's not much) is better than it silently matching. But the problem of when to match with == vs. eq remains...
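The surprising match can be traced without using ~~ at all. As far as the documented 5.10.1+ table goes, an array on the right is searched element by element, and an element that is a number triggers numeric equality. The sketch below spells out that one step by hand; it is an illustration of the effect, not a reimplementation of the operator.

```perl
use strict;
use warnings;

my @greetings = ('Hi', 'Bye', 0, 1);

# The numeric branch, spelled out for the element 0:
# "Hiya" is coerced to the number 0, so the comparison is true.
my $numeric_hit = do { no warnings 'numeric'; 'Hiya' == 0 };

# The same membership test with an explicit string operator:
# grep in scalar context returns the number of matching elements.
my $in_by_eq = grep { 'Hiya' eq $_ } @greetings;
```

Here $numeric_hit is true and $in_by_eq is 0, which is the whole disagreement between == and eq in miniature. The "isn't numeric" warning quoted above is the only hint that the numeric branch was taken.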
