PerlMonks  

Re: Why split function treats single quotes literals as regex, instead of a special case?

by jwkrahn (Abbot)
on Aug 14, 2020 at 03:33 UTC ([id://11120706])


in reply to Why split function treats single quotes literals as regex, instead of a special case?

The single space character is a special case for split; anything else is treated as a regular expression, be it a string, a function call, etc.
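Here is a minimal sketch of the "anything else is a regex" rule, assuming a reasonably modern Perl 5. The helper `delim()` is hypothetical and exists only to show that a pattern returned by a function call is compiled the same way as a literal string:

```perl
use strict;
use warnings;

# Any pattern other than a single-space string is compiled as a regex.
# Here '.' matches every character, so every field is empty, and split
# discards trailing empty fields, leaving an empty list.
my @dot = split '.', 'a.b.c';

# The pattern can also come from a function call. delim() is a
# hypothetical helper used only for illustration. The string '|' is
# compiled as an alternation of two empty patterns, which matches the
# empty string, so the input is split between every character.
sub delim { return '|' }
my @bar = split delim(), 'x|y';     # ("x", "|", "y")

# quotemeta() escapes the metacharacter, so this split is literal.
my @lit = split quotemeta(delim()), 'x|y';   # ("x", "y")

print scalar(@dot), " fields from split '.'\n";
print join(',', @bar), "\n";
print join(',', @lit), "\n";
```

If a literal split on a string held in a variable is what you want, wrapping it in `quotemeta` (or `\Q...\E` inside a regex) is the usual way to stop its characters from being read as metacharacters.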

Regular expressions are also treated a bit differently than regular expressions in qr//, m// and s///.


Re^2: Why split function treats single quotes literals as regex, instead of a special case?
by AnomalousMonk (Archbishop) on Aug 14, 2020 at 04:38 UTC

    The single space character is a special case for split ...
    I.e., per split:
    As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.
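As a short illustration of the documented behavior, here are the ' ' special case, a plain /\s+/, and / / (which the docs exclude from the special case) applied to the same input. The sample string and the `show()` display helper are assumptions made for this sketch:

```perl
use strict;
use warnings;

# show() is a hypothetical display helper: it quotes each field and
# makes tabs and newlines visible.
sub show {
    return join ', ', map { my $f = $_; $f =~ s/\t/\\t/g; $f =~ s/\n/\\n/g; qq{"$f"} } @_;
}

my $s = "  foo \t bar\nbaz";

# ' ': leading whitespace is removed, then the pattern acts like /\s+/.
my @awk = split ' ', $s;       # ("foo", "bar", "baz")

# /\s+/ alone: same separator, but the leading run yields an empty field.
my @ws = split /\s+/, $s;      # ("", "foo", "bar", "baz")

# / /: an ordinary regex matching exactly one space character.
my @space = split / /, $s;     # ("", "", "foo", "\t", "bar\nbaz")

print "' '   : ", show(@awk),   "\n";
print "/\\s+/: ", show(@ws),    "\n";
print "/ /   : ", show(@space), "\n";
```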
    You also write:
    Regular expressions are also treated a bit differently than regular expressions in qr//, m// and s///.
    I don't understand this statement. Can you elaborate?


    Give a man a fish:  <%-{-{-{-<

      The regular expression // works differently in split than elsewhere:

      $ perl -le'
          my $x = "1234 abcd 5678";
          print $& if $x =~ /[a-z]+/;
          print $& if $x =~ //;
          print map qq[ "$_"], split /[a-z]+/, $x;
          print map qq[ "$_"], split //, $x;
      '
      abcd
      abcd
       "1234 " " 5678"
       "1" "2" "3" "4" " " "a" "b" "c" "d" " " "5" "6" "7" "8"

      Also, the line anchors /^/ and /$/ don't require the /m option to match lines in a string.

        The regular expression // works differently in split than elsewhere...

        I think I'd consider this just another special-case fixup prior to running split rather than a true difference in the function of m//:

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
        "my $x = qq{1234 abcd 5678};
         dd split //, $x;
         dd split /\b|\B/, $x;
        "
        (1 .. 4, " ", "a" .. "d", " ", 5 .. 8)
        (1 .. 4, " ", "a" .. "d", " ", 5 .. 8)
        This is probably just a matter of emphasis and interpretation.

        ... line anchors /^/ and /$/ don't require the /m option to match lines in a string.

        Checking the docs, I recalled seeing this discussed before, but it's another one of those very specialized special cases that evaporate from my memory with time. However, it's not true for the /$/ case (at any rate, the docs say nothing about special-casing it):

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
        "my $x = qq{1234 \n abcd \n 5678};
         dd split /^/, $x;
         dd split /$/, $x;
        "
        ("1234 \n", " abcd \n", " 5678")
        "1234 \n abcd \n 5678"
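A minimal sketch extending the test above (the sample text is an assumption chosen for illustration): split /^/ gives the same fields as an explicit /^/m, while /$/ only splits at line ends once /m is spelled out.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree";

# split treats /^/ as if it were /^/m, so it splits after each newline.
my @caret   = split /^/,  $text;   # ("one\n", "two\n", "three")
my @caret_m = split /^/m, $text;   # same fields, with /m spelled out

# /$/ gets no such treatment: without /m it can only match at the end
# of the string (or before a final newline), so nothing is split off.
my @dollar   = split /$/,  $text;  # ("one\ntwo\nthree")

# With /m, $ matches before each newline; each newline then starts
# the following field.
my @dollar_m = split /$/m, $text;  # ("one", "\ntwo", "\nthree")

printf "%-5s %d field(s)\n", $_->[0], scalar @{ $_->[1] }
    for ['/^/', \@caret], ['/^/m', \@caret_m], ['/$/', \@dollar], ['/$/m', \@dollar_m];
```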



        The regular expression // works differently in split than elsewhere

        I think it is actually the other way around: in most contexts, an empty pattern in m// is special (instead of matching the empty string, it reuses the last successfully matched pattern), while in split, // is literally the empty regex, which matches the zero-length empty string.
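Here is a sketch of that distinction, reusing the string from the earlier one-liner. It relies on the documented empty-pattern rule for m//. It is meant as an illustration only and does not fully cover how that rule is scoped (the "last successful pattern" is tracked per dynamic scope):

```perl
use strict;
use warnings;

my $x = "1234 abcd 5678";

# This successful match makes /[a-z]+/ the last successfully matched pattern.
$x =~ /[a-z]+/ or die "expected a match";

# In m//, an empty pattern reuses the last successfully matched pattern,
# so $& is "abcd" again rather than an empty string.
my $reused = ($x =~ //) ? $& : undef;

# In split, // is the genuinely empty regex: it matches the empty string
# between every pair of characters, so each character becomes a field.
my @chars = split //, $x;

print "m// matched: '$reused'\n";
print scalar(@chars), " fields from split //\n";
```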
