http://qs321.pair.com?node_id=11119478

I have tried to understand the documentation of the Perl function split. Doing so, I have realized that after more than 30 years of using Perl I still have not understood some of its basic properties. Here are some of "my new findings" and questions.

Perl subroutine

From Perl documentation:

The Perl model for function call and return values is simple: all functions are passed as parameters one single FLAT LIST of scalars, and all functions likewise return to their caller one single FLAT LIST of scalars.
A LIST value is an unnamed list of temporary scalar values that may be passed around within a program from any list-generating function to any function or construct that provides a list context.

LIST, list value constructor and list value

A list value can be created by using a list value constructor. It is a number of arguments separated by Comma Operators.

Comma Operator:

"In list context, it's just the list argument separator, and inserts both its arguments into the list. These arguments are evaluated from left to right."

Conclusions:

FLAT LIST

A Perl list (= list value) is a sequence of scalar values. A list is not a scalar. Not even the empty list is a scalar.

This means that all list values are flat!?

The flattening is probably part of the comma operator and is done for all lists (= list values).
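
A minimal sketch of this flattening, assuming nothing beyond core Perl (the variable names are made up for illustration). Nested parentheses, arrays and empty lists all end up in one flat list:

```perl
use strict;
use warnings;

my @empty = ();
my @pair  = ( 'a', 'b' );

# Nested parentheses and arrays are interpolated into one flat list.
my @flat = ( 1, ( 2, 3 ), @empty, (), @pair, ( (), () ) );
print scalar(@flat), " elements: @flat\n";    # 5 elements: 1 2 3 a b

# Three empty lists still make one empty list.
my @nothing = ( (), (), () );
print scalar(@nothing), "\n";                 # 0
```

The empty array and the empty lists contribute no elements at all, so nothing marks the place where they stood.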

"The null list is represented by (). Interpolating it in a list has no effect. Thus ((),(),()) is equivalent to (). Similarly, interpolating an array with no elements is the same as if no array had been interpolated at that point."

Conclusions:

Positional arguments and parameters

It took me a long time to understand that my problem was a subroutine returning an empty list inside the argument list of a call to another subroutine.

Here I use the following terminology: in a subroutine call the arguments are passed/bound to the parameters (formal arguments) in the subroutine definition. The arguments for a call are evaluated, and the resulting values are passed to the corresponding parameters.

I have always thought that the commas in a call to a subroutine separate the argument list into a sequence of values, each corresponding to a parameter: call('P1', 'P2', 'P3').

There can be positional parameters in a subroutine definition. But not every positional argument becomes a positional parameter!?

"Any arguments passed in show up in the array @_." This can be false!? A Positional argument can be flattened away.

use strict;
use warnings;
use 5.010;
use Data::Dump qw(dump dd ddx);

sub p2   { 'P2' }
sub nop1 { }
sub nop2 { return }
sub test { say dump @_ }

test( 'P1', 'P2', 'P3' );
test( 'P1', p2,   'P3' );
test( 'P1', nop1, 'P3' );
test( 'P1', nop2, 'P3' );
__DATA__
Output:
("P1", "P2", "P3")
("P1", "P2", "P3")
("P1", "P3")
("P1", "P3")

Conclusions:

Split does not behave like a subroutine

In perlfunc it is stated: "Here are Perl's functions (including things that look like functions, like some keywords and named operators) arranged by category"

What is split? A keyword, a named operator or ...?

It is also stated:

Any function in the list below may be used either with or without parentheses around its arguments. (The syntax descriptions omit the parentheses.) If you use parentheses, the simple but occasionally surprising rule is this: It looks like a function, therefore it is a function.

However split does not behave like a Perl subroutine. So the statement "therefore it is a function" does not mean that split behaves like a Perl subroutine.

The evaluation of the first argument to split is delayed. This is different from normal subroutines. Is this indicated by the slashes in /PATTERN/?

This script shows that an array cannot be used to supply the arguments to split. This also differs from normal subroutines.

use strict;
use warnings;
use 5.010;

my $pat     = ':';
my $str     = 'O:K';
my $pat_ref = \$pat;
my $str_ref = \$str;
my @par     = ( $pat, $str );

my $rv = split $pat, $str;             # OK
$rv = split $pat, $$str_ref;           # OK
$rv = split $$pat_ref, $$str_ref;      # OK
$rv = split( $par[0], $par[1] );       # OK

# W1: Use of uninitialized value $_ in split
$rv = split @par;                      # NOK W1
$rv = split(@par);                     # NOK W1
$rv = split( @par[ 0, 1 ] );           # NOK W1
$rv = split( ( @par[ 0, 1 ] ) );       # NOK W1
$rv = split( map { $_ } @par );        # NOK W1
$rv = split( ( map { $_ } @par ) );    # NOK W1
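
One possible workaround, sketched under the assumption that the arguments arrive as a flat list: unpack the list inside a small wrapper and hand split each value as its own scalar expression. The helper name split_list is hypothetical, not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts a flat list (pattern, string, optional limit)
# and passes each element to split as a separate scalar argument.
sub split_list {
    my ( $pat, $str, $limit ) = @_;
    return split $pat, $str, $limit // 0;    # a limit of 0 behaves like no limit
}

my @par    = ( ':', 'O:K' );
my @fields = split_list(@par);
print "@fields\n";    # O K
```

Because split_list is an ordinary subroutine without a prototype, @par is flattened into @_ as usual, and the explicit unpacking restores the one-expression-per-argument form that split expects.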

Conclusions:

Is this a bug?

#!/usr/bin/perl -w
use strict;
use warnings;
use 5.010;
use Data::Dump qw(dump dd ddx);

ddx (1,'a', ());
ddx scalar (1,'a');
ddx scalar (1,'a', ());
my $var = (1,'a', ());
ddx $var;
__DATA__
output:
Useless use of a constant ("a") in void context at pm_empty_list.pl line 11.
Useless use of a constant ("a") in void context at pm_empty_list.pl line 12.
# pm_empty_list.pl:9: (1, "a")
# pm_empty_list.pl:10: "a"
# pm_empty_list.pl:11: undef
# pm_empty_list.pl:13: undef

I had expected this

# pm_empty_list.pl:11: undef
# pm_empty_list.pl:13: undef

to be

# pm_empty_list.pl:11: "a"
# pm_empty_list.pl:13: "a"

Re: Split does not behave like a subroutine
by LanX (Saint) on Jul 18, 2020 at 07:00 UTC
    > Is this a bug?

    Perl doesn't have a list data type but list operators.

    The scalar comma operator returns the last element.

    An empty list in scalar context is undef.°

    HTH

    Update

    > The differences between split and a normal subroutine should be clarified.

    a default subroutine has no prototype, split has.

    (Still, there are abnormal functions without a simple prototype.)

    Update

    Sorry, split is one of those built-ins without a reproducible prototype, hence it's more complex:

    DB<4> x prototype "CORE::push"
    0  '\\@@'
    DB<5> x prototype "CORE::split"
    0  undef

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    °) you seem to believe that naked parentheses impose list context and create a list value, but they only group for precedence!

    ... see List's terminology and Parentheses in Perl for details.

      Thanks LanX for the answer! I have got a lot to think over.

      To understand this:

      > The scalar comma operator returns the last element.

      I must understand in which list it is the last element.

      I have to understand what (LIST) means in the documentation.

      I believe that for Perl you can say:

      • The return value context ( scalar or list ) controls how the return value is created.
      • The side effects (everything but the return value) of calling something do not depend on the return value context.

      From List value constructors

      (In the text it is very difficult to understand which parts apply to use in list context, scalar context or both.)

      LVC1: List values are denoted by separating individual values by commas (and enclosing the list in parentheses where precedence requires it): (LIST)

      LVC2: In a context not requiring a list value, the value of what appears to be a list literal is simply the value of the final element

      LVC3: LISTs do automatic interpolation of sublists. That is, when a LIST is evaluated, each element of the list is evaluated in list context, and the resulting list value is interpolated into LIST just as if each individual element were a member of LIST.

      LVC4: The null list is represented by (). Interpolating it in a list has no effect.

      From Comma Operator

      CO1: Binary "," is the comma operator.

      CO2: In scalar context it evaluates its left argument, throws that value away, then evaluates its right argument and returns that value.

      CO3: In list context, it's just the list argument separator, and inserts both its arguments into the list. These arguments are also evaluated from left to right.

      My own definitions

      Several (possible) lists can be involved:

      • LVC_arg: The list of arguments in the List value constructor [LVC0]
      • LVC_par: The resulting parameter list value
      • LVC_list_value: the list value which is the output from the constructor in list context
      • LVC_scalar_value: the scalar value which is the output from the constructor in scalar context

      My first try at: the execution of the syntactic construct (LIST)

      The elements in LVC_arg are evaluated from left to right according to [CO2] or [CO3]. [CO2] and [CO3] give the same execution order and the same side effects. The only difference is how the return value is created.

      The execution of an element in LVC_arg is recursive according to [LVC3]. During this, the LVC_list_value is created. Note that this execution of the elements is done in list context.

      When (LIST) is called in scalar context, the LVC_list_value is not used. Instead, the LVC_scalar_value is created and returned. The LVC_scalar_value is the value of the last element in the last recursion. The execution of the last element is done in scalar context. An execution of the empty list, (), in list context is a "no operation" [LVC4]. The execution of an empty list in scalar context results in undef [?].

        Sorry that's too complicated for me, TL;DR

        > I have to understand what (LIST) means in the documentation.

        According to the docs cited in

        List's terminology and Parentheses in Perl

        <updated>

        Uppercase "LIST" in perldocs is an argument placeholder for any code ...

        • compiled in "list context"
        • delivering a "list value" at run time

        For instance from the docs for print ...

        print LIST; means that ...

        • print 1,2,3;

          the two commas are compiled to create a "list value" 1,2,3 for print

        • print @a;

          the array @a will be compiled to pass its contents as a "list value" (i.e. not its length, as it would in scalar context)

        • print func();

          the sub &func will be called in "list context" at run time and return a "list value" from its last statement (compare wantarray)

        </updated>
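
A small sketch of how a sub can observe the context it is called in, using the built-in wantarray. The sub name context and the variable names are hypothetical, chosen only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical sub that reports the context of its own call.
sub context {
    return wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
}

my @in_list   = context();                  # list assignment
my $in_scalar = context();                  # scalar assignment
my @via_join  = ( join '', context() );     # argument of a list operator (LIST)
my $len       = length context();           # length takes one scalar argument

print "$in_list[0] $in_scalar $via_join[0] $len\n";
```

The call inside join sees list context because join is documented as taking a LIST, while the call inside length sees scalar context.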

        > To understand this:

        > > The scalar comma operator returns the last element.

        > I must understand in which list it is the last element.

        Comma separated lists (sic)

        Please compare (updated)

        # --- list assignments
        @a   = (1,2,3);    # list comma
        @a   = (1..3);     # list range
        ($a) = @a;         # deconstruction => $a==1

        # --- scalar assignments
        $a = (1,2,3);      # scalar comma
        $a = (1..3);       # flip flop operator ... Oo oops!
        $a = @a;           # scalar @a == length

        The parens on the RHS only do grouping here; they do not create a list. Hence the context is propagated to the operators , and ..

        @a = and (...)= are "list assignment" imposing "list context".

        The range 1..3 returns the list value 1,2,3 in list context only.

        You have to think of the whole statement as an op-tree...

        • propagating context down
        • passing "values" up

        With ...

        • operators changed by upper context
        • operators (sometimes) creating down context

        If that's not clear enough please show a short code exemplifying your problem.

        HTH

        Cheers Rolf

        update

        you may want to play around yourself with B::Deparse and B::Concise to experiment around with the snippets shown

        perl -MO=Deparse,-p -e "CODE"

        and

        perl -MO=Concise -e "CODE"
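
The same inspection can also be done from inside a script. This is only a sketch: it assumes the core module B::Deparse with its new and coderef2text methods, and the anonymous sub is a made-up example. The exact text printed may differ between Perl versions, so no output is shown.

```perl
use strict;
use warnings;
use B::Deparse;

# '-p' asks B::Deparse to print the parentheses Perl inferred.
my $deparse = B::Deparse->new('-p');

# Hypothetical sub under inspection; any code reference works here.
my $code = sub { my @fields = split ':', $_[0]; return scalar @fields };

my $text = $deparse->coderef2text($code);
print $text, "\n";
```

Reading the deparsed text is a quick way to see how Perl grouped the arguments of a built-in such as split.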

        The docs were written by different persons, and are hence heterogeneous and not consistent.

        (kind of "there is more than one way to word it" ... well )

        Even perlglossary which originates from the Camel-Book seems to have been patched afterwards.

        For a trained mathematician used to an axiomatic approach it's a painful experience ... >(

        So I'd strongly recommend to

        • use perlglossary for definitions
        • use tests to prove a thesis (runtime)
        • use B::Deparse and B::Concise for visualization (compilation)

        In an ideal world any other perldoc should be corrected.

        I never acquired the Camel book. I know it's vast, but it should be canonical.

        It's a long time since I had a look into chromatic's "Modern Perl", but it left a very consistent impression. So it may be a good source; I'm not sure whether he clarifies "lists" there.

        update

        Modern::Perl#Lists

        confusing too.

        Cheers Rolf

Re: Split does not behave like a subroutine
by haukex (Archbishop) on Jul 18, 2020 at 08:26 UTC
    > Sometimes there are contradictions. What applies to a list (list value), perhaps does not apply to a list (list value constructor)

    Can you show some examples?

    > In the documentation often list is used for both list value and list value constructor. This make it difficult to understand the documentation. ... The list concept and for all flattening should be described more clearly in the Perl documentation. ... The risk to lose a Positional argument should be explicit clarified

    I don't disagree that Perl's concept of lists can take some getting used to, and clarifying documentation is always useful. I should note though that, while Perl's documentation is often used as a reference, it's not always perfect, and I would suggest also looking into The Camel and similar books (e.g. Learning Perl, Modern Perl) to see if they help explain it better. In any case, documentation patches are usually a good thing; in my own experience, one might have to revise them a couple of times after feedback from P5P, but that should only help improve them.

    > Here I use: In a subroutine call the arguments are passed/bound to the parameters (formal argument) in the subroutine definition. The arguments for a call are evaluated, and the resulting values are passed to the corresponding parameters. I have always thought, that the commas in a call to a subroutine, separate the argument list in sequence of values each corresponding to a parameter. call('P1', 'P2', 'P3'). There can be Positional parameters in a subroutine definition.

    To me this seems to be the core of your question, and I have to say that what you write here is actually not Perl's concept. You quoted it yourself:

    > The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities ...

    You're saying that in test( 'P1', nop1, 'P3' );, nop1 is a positional parameter, but the way Perl sees it is after evaluation. You may also note that perlsub makes no mention of "positional" (except in the section on the still-experimental signatures).

    > In conservative programming, there should be no subroutine calls in the argument list!?

    It's something to be careful with, definitely; there have even been serious security issues related to this. There are workarounds though: Perl's scalar can be used to force a single scalar value, and there are extensions like PerlX::Maybe for pairs of arguments (which are like named parameters, though), and of course there's parameter validation using e.g. Type::Params. (I hesitate to mention it, because they should be used very sparingly and only when one knows what one is doing, but Prototypes can also be used to force scalar context on arguments.)
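
A minimal sketch of the scalar workaround mentioned above. The sub names nop and count are hypothetical and exist only for this illustration:

```perl
use strict;
use warnings;

sub nop   { return }              # empty list in list context, undef in scalar context
sub count { return scalar @_ }    # number of arguments actually received

my $without = count( 'P1', nop(),           'P3' );    # 2: the empty list vanished
my $with    = count( 'P1', scalar( nop() ), 'P3' );    # 3: undef holds the slot

print "$without $with\n";    # 2 3
```

Forcing scalar context makes the call contribute exactly one value, so the arguments after it keep their positions in @_.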

    > Split look like a subroutine but does not behave like one.

    Correct, there are a few Perl functions that are exceptions and parsed differently from the rest. As LanX mentioned, they can generally be identified by prototype returning undef. Examples are the EXPR forms of map and grep and similar functions (the BLOCK forms can be created using the & prototype). Note that split does behave as if there were an implicit qr// on the pattern.
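
A short sketch of what treating the pattern as a regular expression means in practice, assuming a plain string is passed as the first argument (variable names are illustrative only):

```perl
use strict;
use warnings;

my $str = 'a.b.c';

my @as_regex   = split '.', $str;               # '.' acts as the regex /./
my @as_literal = split quotemeta('.'), $str;    # escaped: a literal dot
my @with_qr    = split qr/\./, $str;            # explicit regex object

print scalar(@as_regex), " | @as_literal | @with_qr\n";    # 0 | a b c | a b c
```

With '.' every character is a separator, so all fields are empty and split drops them; escaping the dot gives the result most people expect.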

    > The meaning of the slashes in /PATTERN/ should be explained.

    I'm not sure about this one, since it's just a regular expression like any other (perlop, perlretut, perlre). (Okay, one exception that I can think of off the top of my head: the empty pattern // is treated differently.)

      > Correct, there are a few Perl functions that are exceptions and parsed differently from the rest. As LanX mentioned, they can generally be identified by prototype returning undef.

      I think it's unfortunate that undef means ...

      • special parsing for CORE::builtins.
      • default LIST for other subs

      Cheers Rolf

      UPDATE

      Am I wrong or is there no difference between

      • no prototype
      • prototype (@)

      hence both are indistinguishable, equally allowing a call to func(LIST)?

        > I think it's unfortunate that undef means special parsing for CORE::builtins. / default LIST for other subs

        Yes, I agree, it's unfortunate. However, from a quick check it seems that all Perl builtins that accept a plain list explicitly have a @ prototype (like die, unlink, chown), and only those with special parsing return undef for their prototype.
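
Such a quick check could look like the following sketch. It relies only on the built-in prototype function; the selection of built-ins is taken from the text above, and the hash name is made up:

```perl
use strict;
use warnings;

# Collect the prototype string (or undef) for a few built-ins.
my %proto = map { $_ => prototype("CORE::$_") } qw(die unlink chown split);

for my $name ( sort keys %proto ) {
    my $shown = defined $proto{$name} ? "'$proto{$name}'" : 'undef';
    print "$name => $shown\n";
}
```

The plain-list built-ins should report a @ prototype, while split should report undef, in line with the debugger output shown earlier in the thread.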

        > Am I wrong or is there no difference between no prototype / prototype (@)

        I think that's true, yes.
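
A sketch comparing the two, at least for calls of the form func(LIST). The sub names plain and with_proto are hypothetical:

```perl
use strict;
use warnings;

sub plain          { return scalar @_ }    # no prototype
sub with_proto (@) { return scalar @_ }    # explicit (@) prototype

my @y = ( 'i', 'j' );

my $n_plain = plain( 'r', @y, 's' );         # 4: @y is flattened
my $n_proto = with_proto( 'r', @y, 's' );    # 4: @y is flattened here too

print "$n_plain $n_proto\n";    # 4 4
```

The only visible difference in this sketch is what prototype(\&plain) and prototype(\&with_proto) return: undef for the first and '@' for the second.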

      Thanks haukex for the answer!

      > Sometimes there are contradictions. ... Can you show some examples?

      My original problem was to understand the way from the subroutine call in the script source code (the list of expressions) to the list of parameters (formal argument) in the definition of the subroutine.

      > LISTs do automatic interpolation of sublists. That is, when a LIST is evaluated, each element of the list is evaluated in list context, and the resulting list value is interpolated into LIST just as if each individual element were a member of LIST (from List value constructors)

      > Like the flattened incoming parameter list, the return list is also flattened on return. (from perlsub)

      > The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities --but you may always use pass-by-reference instead to avoid this. Both call and return lists may contain as many or as few scalar elements as you'd like. (from perlsub)

      Based on those quotes, learning a little about the Perl interpreter and using B::Deparse and B::Concise, I have learnt a more perlish way of thinking.

      • LIST is a list of expressions in the source code.
      • When LIST is evaluated the result is stored on the argument stack in the interpreter. During this evaluation the result is flattened.
      • This is the argument list, the input to the subroutine. This list contains references to the argument values.
      • The argument values can be accessed from the subroutine definition by using @_ and $_[n]. @_ is the list of argument values. $_[n] is the value of argument n.

      I have reread the parts of the documentation that cover the subroutine call several times and still have problems understanding everything. I do not understand it well enough to say that there are contradictions.

      Probably I am still thinking that the text describes what you see in the source and not the result after evaluation.

      The word "list" is often used. Often it is not clear to which list it refers. The word list is also part of the term "list value". Is "list values" many "list value" or is it the values in a list?

      Interface to split

      How or where can you find the restrictions on the arguments to split?

      This split $_[0], $_[1], $_[2]; and this split $_[0], @_[1,2]; should have given the same result ("a", "b", "c")!

      use strict;
      use warnings;
      use 5.010;
      use Path::Tiny qw( path );
      use Data::Dump qw(dump dd ddx);
      use B::Concise qw(set_style add_callback walk_output);

      sub concise {
          my $case = shift;
          my $fh;
          say '';
          my $walker = B::Concise::compile( '-src', '-basic', $case );
          $fh = path( 'split_call_' . $case )->openw_utf8;
          #walk_output($fh);
          $walker->();
          say '';
          $walker = B::Concise::compile( '-src', '-exec', $case );
          $walker->();
      }

      my $pat   = ':';
      my $str   = 'a:b:c';
      my $limit = -1;
      my @par   = ( $pat, $str, $limit );

      if (1) {
          sub splitC {
              my @rv = split $_[0], $_[1], $_[2];
              return \@rv;
          }
          ddx splitC(@par);
          concise('splitC');
      }

      if (1) {
          sub splitC1 {
              my @rv = split $_[0], @_[1,2];
              return \@rv;
          }
          ddx splitC1(@par);
          concise('splitC1');
      }
      __DATA__
      Part of output:
      # split_call.pl:34: ["a", "b", "c"]
      main::splitC:
      8 <1> leavesub[1 ref] K/REFC,1 ->(end)
      - <@> lineseq KP ->8
      # 30: my @rv = split $_[0], $_[1], $_[2];
      1 <;> nextstate(main 3 split_call.pl:30) v:*,&,{,x*,x&,x$,$,fea=1 ->2
      4 </> split(/":"/ => @rv:3,4)[t5] vK/LVINTRO,ASSIGN,LEX ->5
      3 <|> regcomp(other->4) sK ->9
      - <1> ex-aelem sK/2 ->3
      - <1> ex-rv2av sKR/STRICT,1 ->-
      2 <#> aelemfast[*_] s ->3
      - <0> ex-const s ->-
      - <1> ex-aelem sK/2 ->a
      - <1> ex-rv2av sKR/STRICT,1 ->-
      9 <#> aelemfast[*_] s/key=1 ->a
      - <0> ex-const s ->-
      - <1> ex-aelem sK/2 ->4
      - <1> ex-rv2av sKR/STRICT,1 ->-
      a <#> aelemfast[*_] s/key=2 ->4
      - <0> ex-const s ->-
      # 31: return \@rv;
      5 <;> nextstate(main 4 split_call.pl:31) v:*,&,{,x*,x&,x$,$,fea=1 ->6
      - <@> return K ->-
      - <0> pushmark s ->6
      7 <1> srefgen sK/1 ->8
      - <1> ex-list lKRM ->7
      6 <0> padav[@rv:3,4] lRM ->7
      # split_call.pl:45: [-1]
      main::splitC1:
      i <1> leavesub[1 ref] K/REFC,1 ->(end)
      - <@> lineseq KP ->i
      # 41: my @rv = split $_[0], @_[1,2];
      b <;> nextstate(main 10 split_call.pl:41) v:*,&,{,x*,x&,x$,$,fea=1 ->c
      e </> split(/":"/ => @rv:10,11)[t5] vK/LVINTRO,ASSIGN,LEX,IMPLIM ->f
      d <|> regcomp(other->e) sK ->j
      - <1> ex-aelem sK/2 ->d
      - <1> ex-rv2av sKR/STRICT,1 ->-
      c <#> aelemfast[*_] s ->d
      - <0> ex-const s ->-
      o <@> aslice sK ->p
      j <0> pushmark s ->k
      - <1> ex-list lK ->m
      - <0> ex-pushmark s ->k
      k <$> const[IV 1] s ->l
      l <$> const[IV 2] s ->m
      n <1> rv2av[t4] sKR/STRICT,1 ->o
      m <#> gv[*_] s ->n
      p <$> const[IV 0] s ->e
      # 42: return \@rv;
      f <;> nextstate(main 11 split_call.pl:42) v:*,&,{,x*,x&,x$,$,fea=1 ->g
      - <@> return K ->-
      - <0> pushmark s ->g
      h <1> srefgen sK/1 ->i
      - <1> ex-list lKRM ->h
      g <0> padav[@rv:10,11] lRM ->h

        > The word "list" is often used. Often it is not clear to which list it refers.

        I don't normally think about lists in a way as complicated as you are here, but for initial learning that's probably fine (Update: to clarify: it's fine as a learning method). Personally, I think about "lists" in Perl as there being only one kind of list, and it's a somewhat loose term that can refer to argument lists (which the subroutine ends up seeing as an array), list value constructors, and return values. AFAICT, your description in your node seems to fit "lists" in general pretty well, so unfortunately I'm not quite sure what your specific question is here?

        sub foo {
            my @x = ("a", @_, "b");    # interpolation
            return "x", @x, "y";       # interpolation
        }
        my @y = ("i", "j");
        my @z = foo("r", @y, "s");     # interpolation
        # @z is ("x", "a", "r", "i", "j", "s", "b", "y")

        # also: comma operator in scalar context via "return"
        my $x = foo("u", "v");
        # $x is "y" !!

        > This list contains references to the argument values.

        Just to nitpick this, note that these are not "references" in the sense of hard references described in perlref. They are more commonly referred to as aliases.
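
A minimal sketch of that aliasing. The sub names shout and copy are hypothetical; the first writes through @_, the second works on a copy:

```perl
use strict;
use warnings;

sub shout { $_[0] = uc $_[0]; return }            # writes through the alias
sub copy  { my ($s) = @_; $s = uc $s; return }    # modifies a private copy only

my $name = 'perl';

copy($name);
my $after_copy = $name;     # still 'perl'

shout($name);
my $after_shout = $name;    # now 'PERL'

print "$after_copy $after_shout\n";    # perl PERL
```

Copying @_ into lexicals at the top of a sub, as copy does, is the usual way to avoid changing the caller's variables by accident.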

        > How or where can you find the restrictions on the arguments to split?

        There is a huge caveat to the "arguments to subs are flattened": Prototypes, which I suggest you read up on. These change the way the function call is parsed, and this can include forcing arguments that would normally be flattened / interpolated, like in your case @_[1,2], to in fact be taken as if they have an implicit scalar on them.

        Builtin functions can indeed sometimes be a little confusing in this respect, because they are parsed as with prototypes, even if the prototypes are never stated explicitly for many functions. In fact, the use of prototypes is otherwise generally discouraged because of their often confusing effects on how function calls are parsed, but in the Perl core I believe they have historic significance. The $ prototype is probably the closest equivalent to what's going on with split:

        sub mysplit1 { ... }
        sub mysplit2 ($$;$) { ... }
        my @x = ("a:b:c", -1);
        mysplit1(":", @x);     # arguments to mysplit1 are flattened
        mysplit2(":", @x);     # parsed like mysplit2(":", scalar(@x)) !!!
        &mysplit2(":", @x);    # prototype is ignored, arguments are flattened!
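
The [-1] result reported for splitC1 earlier in the thread can be reproduced with the following sketch, assuming that split evaluates its second argument in scalar context and that an array slice in scalar context yields its last element. The variable names are made up:

```perl
use strict;
use warnings;

my @args = ( ':', 'a:b:c', -1 );

# An array slice in scalar context yields its last element.
my $last = scalar( @args[ 1, 2 ] );    # -1

my @explicit = split $args[0], $args[1], $args[2];    # three fields: a b c
my @sliced   = split $args[0], @args[ 1, 2 ];         # one field: "-1"

print "$last | @explicit | @sliced\n";    # -1 | a b c | -1
```

In the second call the slice collapses to -1, which is then used as the string to split, and no third argument is left over to serve as the limit.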

        Minor typo fixes.

        Sorry, IMHO you are over-complicating this.

        Your definition of LIST is too narrow; it's not only used for function(LIST), and "comma" is not the only list constructor.

        You can find LIST in docs for other cases too.

        For me LIST means a piece of code which ...

        • is compiled in list context
        • returns a list value
        nothing more.

        for instance map {BLOCK} LIST ...

        • map { uc } qw/a b c/
        • map { uc } "a" .. "c"
        • map { uc } grep {...} ...
        • map { uc } "a", "b", "c"
        • map { uc } @a
        only the last two list constructors allow "list interpolation/flattening", since it's a feature of the "comma operator" which has two variants , and '=>'.

        correction: since it's a feature of the naked "list context" w/o operator, what comma does is just propagating the list context down the tree, hence @a=@b,@c is just @a = (@b),(@c)
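
A runnable sketch of these LIST forms, with one extra line mixing several sublists. The variable names are illustrative only:

```perl
use strict;
use warnings;

my @a = qw(a b c);

my @from_qw    = map { uc $_ } qw/a b c/;
my @from_range = map { uc $_ } 'a' .. 'c';
my @from_comma = map { uc $_ } 'a', 'b', 'c';
my @from_array = map { uc $_ } @a;

# Several sublists in one LIST: all of them end up in a single flat list.
my @mixed = map { uc $_ } @a, 'd', ( 'e', 'f' );

print "@mixed\n";    # A B C D E F
```

In every case map receives one flat list value; it cannot tell which elements came from the array, the literals or the parenthesized group.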

        FWIW: I use "interpolation" primarily for vars in strings like in print "$a $b";

        But glossary lists both variants

        • interpolation

          The insertion of a scalar or list value somewhere in the middle of another value, such that it appears to have been there all along. In Perl, variable interpolation happens in double-quoted strings and patterns, and list interpolation occurs when constructing the list of values to pass to a list operator or other such construct that takes a LIST.

        Cheers Rolf

List's terminology and Parentheses in Perl
by LanX (Saint) on Jul 18, 2020 at 12:03 UTC
    You have cited perldocs, unfortunately without linking to the references.

    List Terminology

    The term "list value" confused me; in such cases I recommend looking into perlglossary#LIST ff.

    NB: Perlglossary is from an appendix from the Camel (IIRC), while the perldocs were written by different authors.

    • LIST

      A syntactic construct representing a comma-separated list of expressions, evaluated to produce a list value. Each expression in a LIST is evaluated in list context and interpolated into the list value.

    • list

      An ordered set of scalar values.

    • list context

      The situation in which an expression is expected by its surroundings (the code calling it) to return a list of values rather than a single value. Functions that want a LIST of arguments tell those arguments that they should produce a list value. See also context.

    • list operator

      An operator that does something with a list of values, such as join or grep. Usually used for named built-in operators (such as print, unlink, and system) that do not require parentheses around their argument list.

    • list value

      An unnamed list of temporary scalar values that may be passed around within a program from any list-generating function to any function or construct that provides a list context.

    Parentheses

    I think one source of confusion (besides @arrays) is the syntactic meaning of parentheses ...(...)... in different constructs.

    • ()

      is an empty list

    • bareword(...)

      expects a LIST of arguments to a sub bareword °

    • (...)[subscript/slice]

      picks elements from a list

    • (...) =

      is a list assignment enforcing list context on the RHS updated

    ... those constructs impose a list context inside

    But ...

    • "naked" (...) doesn't

      those parens are just for precedence

    ... neither do they impose their own context nor do they create a "list value".
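
The different roles of parentheses can be tried out with a sketch like this one. The sub name three is hypothetical:

```perl
use strict;
use warnings;

sub three { return ( 'x', 'y', 'z' ) }

my @none   = ();                # () is the empty list
my $second = ( three() )[1];    # (...)[subscript]: list slice, gives 'y'
my ($head) = three();           # (...) = : list assignment, gives 'x'
my $count  = () = three();      # list assignment evaluated in scalar context, gives 3
my $tail   = ( three() );       # naked parens only group: scalar context, gives 'z'

print scalar(@none), " $second $head $count $tail\n";    # 0 y x 3 z
```

Only the last line leaves the call in scalar context, which is why it yields the final element of the comma expression rather than a list.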

    Examples (updated)

    please see Examples for "LIST", "list context", "list value", "list assigment", "list operator"

    Cheers Rolf

    °) well modulo sub-prototype