PerlMonks  

Re: Why split function treats single quotes literals as regex, instead of a special case?

by perlfan (Vicar)
on Aug 14, 2020 at 16:51 UTC [id://11120732]


in reply to Why split function treats single quotes literals as regex, instead of a special case?

>Am I missing something?

Yes: this is Perl, not Python.

>Why?

I can assert that, contextually, splitting into individual characters for split //, $string is a lot more meaningful than splitting on nothing and returning just the original $string. The big surprise actually happens for users (like me) who don't realize that the first parameter of split is a regular expression. But that surprise quickly turns into joy.
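A quick illustration of the behaviour described above (the sample string 'abc' is an arbitrary choice): an empty pattern matches at every position between characters, so split // returns one field per character.

```perl
use strict;
use warnings;

# The empty pattern matches between every pair of characters,
# so split // breaks the string into its individual characters.
my @chars = split //, 'abc';
print join(':', @chars), "\n";    # prints a:b:c
```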

>In general, split function should behave differently if the first argument is string and not a regex.

Should? That's pretty presumptuous. You'll notice that Perl has FAR fewer built-in functions (particularly string functions) than PHP, JavaScript, or Python. This is because they've all been generalized away into regular expressions. You must also understand that the primary design philosophy is more closely related to spoken linguistics than to written code. The implication here is that humans are lazy and don't want to learn more words than they need to communicate; not true of all humans, of course, but true enough for 99% of them. This is also reflected in the Huffmanization of most Perl syntax. This refers to Huffman coding, which assigns the shortest codes to the most frequently used symbols (characters, words, etc.). I mean, Perl isn't APL, but it certainly gets this idea from it.
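To illustrate the claim that string functions have been "generalized away into regular expressions", here is a minimal sketch of three common string helpers written as plain regexes. The sub names trim, starts_with and count_of are hypothetical, not Perl built-ins, and the sample inputs are assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace with one substitution.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: anchored match; \Q...\E escapes metacharacters in $prefix.
sub starts_with {
    my ($s, $prefix) = @_;
    return $s =~ /^\Q$prefix\E/ ? 1 : 0;
}

# Hypothetical helper: count occurrences via a global match in list context.
sub count_of {
    my ($s, $needle) = @_;
    my $n = () = $s =~ /\Q$needle\E/g;
    return $n;
}

print trim("  hello  "), "\n";                # prints hello
print starts_with("perlmonks", "perl"), "\n"; # prints 1
print count_of("a.b.c", "."), "\n";           # prints 2
```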

The balkanization of built-in functions that are truly special cases of a general case is against any philosophical underpinnings that Perl follows. I am not saying it's perfect, but it is highly resistant to becoming a tower of babble. If that's your interest (not accusing you of being malicious), there are more fruitful avenues to attack Perl, most notably object orientation and threading. But you'll have pretty much zero success convincing anyone who has been around Perl for a while that the approach to split is incorrect.

Oh, also: a string (as you're calling it) is a regular expression in the purest sense of the term. It's best described as a concatenation of a finite number of symbols in a fixed order. For some reason a lot of people think this regex magic is only present in patterns that lack a fixed beginning, a fixed end, or both. In your case the pattern just happens to have both. That doesn't make it any less of a regular expression, though.
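A small sketch of the point above, with sample strings chosen for illustration: a quoted separator with no metacharacters behaves exactly like a literal, but a quoted '.' is still the regex "any character". Escaping it with a backslash or \Q...\E restores the literal reading.

```perl
use strict;
use warnings;

# A quoted pattern with no metacharacters acts like a literal separator.
my @csv = split ',', 'a,b,c';              # ('a', 'b', 'c')

# But '.' is still a regex: it matches every character, so every field
# is empty, and split discards trailing empty fields, leaving nothing.
my @dots = split '.', 'a.b.c';             # ()

# Escaping the metacharacter gives the "literal string" reading back.
my @literal = split /\./, 'a.b.c';         # ('a', 'b', 'c')
my $sep = '.';
my @quoted = split /\Q$sep\E/, 'a.b.c';    # same result, via \Q...\E (quotemeta)

printf "%d %d %d %d\n", scalar @csv, scalar @dots, scalar @literal, scalar @quoted;
# prints 3 0 3 3
```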

Re^2: Why split function treats single quotes literals as regex, instead of a special case?
by likbez (Sexton) on Aug 14, 2020 at 19:29 UTC
    The balkanization of built-in functions that are truly special cases of a general case is against any philosophical underpinnings that Perl follows. I am not saying it's perfect, but it is highly resistant to becoming a tower of babble. If that's your interest (not accusing you of being malicious), there are more fruitful avenues to attack Perl

    I respectfully disagree. Perl philosophy states that there should be shortcuts for special cases if they are used often. That's the idea behind suffix conditionals (return if (index($line,'EOL')>-1);) and bash-style if statements (($debug) && say $line;).
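A runnable version of the two idioms mentioned above. The sub lines_before_eol, its name and the sample input are hypothetical, added only to give the snippets a context; the two conditional forms themselves come from the post. (Perl's documentation calls the postfix form a "statement modifier".)

```perl
use strict;
use warnings;
use feature 'say';

my $debug = 1;

# Hypothetical sub: collect lines until one contains 'EOL'.
sub lines_before_eol {
    my @out;
    for my $line (@_) {
        # Postfix conditional (statement modifier).
        return @out if index($line, 'EOL') > -1;
        # Shell-style short circuit: say runs only when $debug is true.
        ($debug) && say $line;
        push @out, $line;
    }
    return @out;
}

my @kept = lines_before_eol('one', 'two', 'EOL here', 'three');
```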

    You are also missing the idea. My suggestion is that we can enhance the power of Perl by treating a single-quoted string differently from a regex in split, and do this without adding to balkanization.

    Balkanization of built-ins is what Python ended up with by having two different functions (str.split and re.split). Perl can avoid this by providing the same functionality with a single function. That's the idea.

    And my point is that this particular change requires minimal work in the interpreter, as it already treats ' ' in a special way (the AWK way).
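For reference, this is the existing special case being discussed, as documented for split in perlfunc: a pattern given as the single-space string ' ' emulates awk, while the regex / / splits on each individual space. The sample string is an assumption chosen to show the difference:

```perl
use strict;
use warnings;

my $text = "  foo  bar ";

# The string ' ' triggers awk emulation: leading whitespace is dropped
# and runs of whitespace act as one separator (as if the pattern were /\s+/).
my @awk = split ' ', $text;        # ('foo', 'bar')

# The regex / / splits on every single space and keeps the empty fields
# between them; only the trailing empty field is removed.
my @literal = split / /, $text;    # ('', '', 'foo', '', 'bar')

print scalar(@awk), ' ', scalar(@literal), "\n";    # prints 2 5
```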

    So this is a suggestion for improving the language, not for balkanization, IMHO. And it is intuitively logical, as people understand (and expect) a difference in behavior between single-quoted literals and regexes in split. So, in a way, the current situation can be viewed as a bug, which became a feature.
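The literal-string behaviour proposed here can be approximated without changing split, by quoting the separator with \Q...\E. split_literal below is a hypothetical helper, not a built-in. Passing a LIMIT of -1 is a design choice to keep trailing empty fields (as Python's str.split does), since Perl's split drops them by default:

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl built-in): split on a separator taken
# literally, roughly what the proposal asks split to do for quoted strings.
sub split_literal {
    my ($sep, $str) = @_;
    # \Q...\E quotes regex metacharacters; LIMIT -1 keeps trailing empty fields.
    return split /\Q$sep\E/, $str, -1;
}

my @parts = split_literal('.', 'a.b..c.');    # ('a', 'b', '', 'c', '')
print join('|', @parts), "\n";                # prints a|b||c|
```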

      >So, in a way, the current situation can be viewed as a bug, which became a feature.

      To be fair, this is a lot of Perl. But I can't rightfully assert that this behavior was unintentional; in fact, it appears to be very intentional (e.g., awk emulation).

      >You also are missing the idea.

      My understanding is that you wish for "strings" (versus "regexes") to invoke the awk behavior of trimming leading white space. Is that right? I'm not here to judge your suggestion, but I can easily think of several reasons why adding another special case to split is not a great idea.

      All I can say is you're the same guy who was looking for the trim method in Perl. If that's not a red flag for being okay with balkanization, I don't know what is.

      Finally, I must reiterate: a "string" is a regular expression. The single-quoted whitespace is most definitely a special exception, since it is also a regular expression. You're recommending removing not just one regex from the pool of potential regexes, but an entire class of them available via quoting, i.e., fixed-length strings in a fixed order. I'm not sure how a suggestion to make all quoted things non-regexes would really work, because then how would you decide whether something is a "regex" or not? (Maybe use a regex? xD)
