http://qs321.pair.com?node_id=592186


in reply to Understanding Split and Join

chromatic has pointed out that split treats an empty pattern normally, not as a directive to reuse the last successfully matching pattern, as m// and s/// do.
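A minimal sketch of that difference, for anyone who wants to try it. The strings and variable names are made up for illustration:

```perl
use strict;
use warnings;

# A successful match, so that there is a "last successful pattern" (/b/).
my $matched = "abc" =~ /b/;

# In a match, an empty pattern reuses /b/, so it fails against "xyz"
# even though a genuinely empty pattern would match any string.
my $reused = ("xyz" =~ //) ? 1 : 0;

# In split, the empty pattern really is empty: it splits between characters.
my @chars = split //, "abc";

print "m// matched 'xyz': $reused\n";
print "split // gave: @chars\n";
```

If split reused /b/ here, @chars would hold "a" and "c" rather than three single characters.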

A pattern that split treats specially but m// and s/// treat normally is /^/. Normally, ^ only matches at the beginning of a string. Given the /m flag, it also matches after newlines in the interior of the string. It's common to want to break a string up into lines without removing the newlines as splitting on /\n/ would do. One way to do this is @lines = /^(.*\n?)/mg. Another, perhaps more straightforward, is @lines = split /^/m. Without the /m, the ^ should match only at the beginning of the string, so the split should return only one element, containing the entire original string. Since this is useless, and splitting on /^/m instead is common, /^/ silently becomes /^/m.
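Here is a small sketch comparing the approaches mentioned above, assuming a made-up three-line string in $text:

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# Splitting on the newline itself removes the newlines.
my @by_newline = split /\n/, $text;

# A bare /^/ is treated by split as /^/m, so both keep the newlines.
my @by_caret   = split /^/,  $text;
my @by_caret_m = split /^/m, $text;

# The list-context match from the text above, applied to $text instead of $_.
my @by_match = $text =~ /^(.*\n?)/mg;

printf "%d pieces, newlines removed\n", scalar @by_newline;
printf "%d pieces, newlines kept\n",    scalar @by_caret;
```

All four arrays end up with three elements; only @by_newline loses the trailing "\n" on each one.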

This only applies to a pattern consisting of just ^; even the apparently equivalent /^(?#)/ or /^ /x are treated normally and don't split the string at all.
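To check this on your own perl, something like the following probe should do. It is only a sketch: the sample string is invented, and I have not verified that the distinction holds on every perl release, so it prints the counts rather than assuming them.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# The bare /^/ that split special-cases.
my @plain   = split /^/,      $text;

# The two look-alike patterns from the post above.
my @comment = split /^(?#)/,  $text;
my @spaced  = split /^ /x,    $text;

printf "/^/      : %d piece(s)\n", scalar @plain;
printf "/^(?#)/  : %d piece(s)\n", scalar @comment;
printf "/^ /x    : %d piece(s)\n", scalar @spaced;
```

A count of 1 means the string was not split at all; a count of 3 means the pattern was handled like /^/m.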

Re^2: Understanding Split and Join
by ferreira (Chaplain) on Dec 30, 2006 at 19:34 UTC
    Both exceptions, the special treatment of // and /^/ by split, are documented in split. Both may deserve a quick mention in the tutorial for the benefit of the unaware. The last remark by ysth, about the non-equivalence of /^(?#)/ and /^ /x with // for split purposes, is a subtle point. It gets subtler when you consider that / /x, / # /x and even / (?#)/x get the same treatment as // when passed to this function. Looks like a case to be fixed either in the docs or in the code of the Perl interpreter itself (if not barred by compatibility issues).
      Looks like a case to be fixed either in the docs or in the code of the Perl interpreter itself
      I'm not sure what you mean by "fixed". split doesn't have the special logic for // that match and substitution do, but even those operations don't have special logic for / /x, / # /x or / (?#)/x.
        I think I have misunderstood that behavior:
        $ perl -e '@a = split //, 'abc'; print "@a"'
        a b c
        $ perl -e '@a = split / /x, 'abc'; print "@a"'
        a b c
        $ perl -e '@a = split / # /x, 'abc'; print "@a"'
        a b c
        $ perl -e '@a = split / (?#) /x, 'abc'; print "@a"'
        a b c

        The logic of split does not need to be special for this to work. These are genuinely empty patterns, and split understands that they are meant to split the string into characters, since they match empty delimiters. But you were saying that the same is not automatic for patterns equivalent to /^/. Sorry for the confusion I caused.