PerlMonks  

Splitting in while loop

by tel2 (Pilgrim)
on Oct 07, 2021 at 04:19 UTC

tel2 has asked for the wisdom of the Perl Monks concerning the following question:

Beloved Monks,

Any ideas why my code below doesn't split the email addresses into lines of 1 address per line, each in single quotes?

while (split(/[, ]+/, <DATA>)) {
    chomp;
    print "'$_'\n";
}
__DATA__
me@here.com
those@there.com others@there.com
you@there.com,them@there.com
I'm getting this output:
''
''
''
But I was wanting/expecting this output:
'me@here.com'
'you@there.com'
'them@there.com'
'those@there.com'
'others@there.com'
How can it be changed to generate my expected output?

If someone has a completely different approach, I'm open to it, but I'd like to understand what I'm doing wrong, too, please.

Note: The real code will have the email addresses in a separate file, but I've put them under __DATA__ here for convenience.

Thanks.
Tel2

Replies are listed 'Best First'.
Re: Splitting in while loop
by haukex (Archbishop) on Oct 07, 2021 at 09:03 UTC

    The usual "don't use a regex when a real parser exists" applies here too. Single quotes are technically a valid character in email addresses, as are commas and spaces when quoted. The following uses Email::Address to correctly parse and split such (admittedly very unusual and not recommended) addresses.

    use warnings;
    use strict;
    use Email::Address;

    while (<DATA>) {
        for my $addr (Email::Address->parse($_)) {
            print $addr->address, "\n";
        }
    }
    __DATA__
    'me'@here.com, "West, Casey" <casey@localhost>
    "those,foo"@there.com others@there.com
    you@there.com,them@there.com
    "Hello, World"@example.com

    Outputs:

    'me'@here.com
    casey@localhost
    "those,foo"@there.com
    others@there.com
    you@there.com
    them@there.com
    "Hello, World"@example.com
      Single quotes are technically a valid character in email addresses

      For those of you who think this is just pedantry I can assure you that it is not.

      Many years ago, when defensive programming was less widespread than now, a customer of $WORK did indeed have such an email address. It followed the pattern of jack.o'malley@bigcorp.com and caused no end of trouble for various systems which poor, hapless Jack was required to use. It even showed up one or two areas of $WORK's systems where such a potential injection was either mismanaged or misreported. These days Jack should have no trouble with systems from responsible coders, but there are still plenty of slapdash operators out there who will struggle with this, even now.

      If you persist in using home-grown regexen to parse email addresses (or HTML or XML or SQL or ...) then you should be aware that sooner or later it will come back to bite you.


      🦛

        ... email address. It followed the pattern of jack.o'malley@bigcorp.com ...

        Hmm, let me guess: The first name was Robert. ;-)

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Points taken, thanks guys, but in my case I'm just using the single quotes as a quick hack to visually show the limits of each address in the output, e.g. that there are no leading/trailing spaces, etc. In the rare event that I did end up with this kind of thing being output:
        'jack.o'malley@bigcorp.com'
        it wouldn't matter.
Re: Splitting in while loop
by tybalt89 (Monsignor) on Oct 07, 2021 at 05:00 UTC

    Or:

    #!/usr/bin/perl
    use strict;
    use warnings;

    while( <DATA> ) {
        for ( split /[, \n]+/ ) {
            print "'$_'\n";
        }
    }
Re: Splitting in while loop
by tybalt89 (Monsignor) on Oct 07, 2021 at 04:53 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (split(/[, ]+/, <DATA>)) {
        chomp;
        print "'$_'\n";
    }
    __DATA__
    me@here.com
    those@there.com others@there.com
    you@there.com,them@there.com

    Outputs:

    Use of uninitialized value $_ in scalar chomp at ./pm11137286.pl line 8, <DATA> line 1.
    Use of uninitialized value $_ in concatenation (.) or string at ./pm11137286.pl line 9, <DATA> line 1.
    ''
    Use of uninitialized value $_ in scalar chomp at ./pm11137286.pl line 8, <DATA> line 2.
    Use of uninitialized value $_ in concatenation (.) or string at ./pm11137286.pl line 9, <DATA> line 2.
    ''
    Use of uninitialized value $_ in scalar chomp at ./pm11137286.pl line 8, <DATA> line 3.
    Use of uninitialized value $_ in concatenation (.) or string at ./pm11137286.pl line 9, <DATA> line 3.
    ''
    Use of uninitialized value in split at ./pm11137286.pl line 9, <DATA> line 3.

    You probably want:

    #!/usr/bin/perl
    use strict;
    use warnings;

    for (map { split /[, ]+/ } <DATA>) {
        chomp;
        print "'$_'\n";
    }
    __DATA__
    me@here.com
    those@there.com others@there.com
    you@there.com,them@there.com
      Nice work, thanks for both your solutions, tybalt89!

      Do you understand why mine wasn't working?

        G'day tel2,

        "... why mine wasn't working?"

        I believe you're working on the erroneous assumption that split(/[, ]+/, <DATA>) will assign its result to $_. That assignment to $_ only occurs in a select number of cases. From "perlsyn: Compound Statements":

        "If the condition expression of a while statement is based on any of a group of iterative expression types then it gets some magic treatment. The affected iterative expression types are readline, the <FILEHANDLE> input operator, ... If the condition expression is one of these expression types, then the value yielded by the iterative operator will be implicitly assigned to $_."

        In your code, the while condition is true three times, once for each DATA line, so the loop body runs three times; however, $_ is not set in any of those iterations. On the fourth evaluation, there are no more DATA lines and the loop ends.

        Always add

        use strict;
        use warnings;

        to the top of your code. In this instance, although it wouldn't give a detailed explanation of the problem as I've provided, it would have hinted at a starting point for investigating the issue.
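
        The special-casing quoted above can be sketched as follows. This is only an illustration, not code from the thread: the in-memory sample string and the helper name addresses_from are assumptions made so that the example runs on its own. The bare readline in the condition fills $_, and the split moves into the loop body.

        ```perl
        use strict;
        use warnings;

        # Hypothetical sample input, kept in a string so the sketch is self-contained.
        my $input = "me\@here.com\n"
                  . "those\@there.com others\@there.com\n"
                  . "you\@there.com,them\@there.com\n";

        # Hypothetical helper: return every address found on a filehandle.
        sub addresses_from {
            my ($fh) = @_;
            my @found;
            local $_;                        # do not clobber the caller's $_
            # A bare readline in the while condition is special-cased:
            # each line is implicitly assigned to $_.
            while (<$fh>) {
                chomp;                       # chomp defaults to $_
                push @found, split /[, ]+/;  # split defaults to $_ as well
            }
            return @found;
        }

        open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
        print "'$_'\n" for addresses_from($fh);
        ```

        Run against the sample string, this prints the five addresses, one per line, each in single quotes.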

        — Ken

        Further to jwkrahn's post:   tel2: Note that implicit split to @_ has been deprecated for a long time:

        Win8 Strawberry 5.8.9.5 (32) Thu 10/07/2021 3:29:50
        C:\@Work\Perl\monks
        >perl -Mstrict -Mwarnings
        my $data = <<'EOD';
        me@here.com
        those@there.com others@there.com
        you@there.com,them@there.com
        EOD

        open my $fh, '<', \$data or die "opening: $!";

        while (split(/[, ]+/, <$fh>)) {
            chomp @_;
            print "'$_'\n" for @_;
        }
        ^Z
        Use of implicit split to @_ is deprecated at - line 14.
        'me@here.com'
        'those@there.com'
        'others@there.com'
        'you@there.com'
        'them@there.com'


        Give a man a fish:  <%-{-{-{-<

        while (split(/[, ]+/, <DATA>)) { ... }

        Only a while-loop condition expression like

        while (<DATA>) { ... }
        assigns implicitly to $_ (see tybalt89's example) (update: but see kcott's post for a more complete discussion of $_ assignment special-casing). An arbitrary expression like split(...) does not. Had you had warnings enabled, Perl would have at least hinted at this problem.


        Give a man a fish:  <%-{-{-{-<

        while (split(/[, ]+/, <DATA>))

        Is interpreted by perl as:

        while ( @_ = split( /[, ]+/, <DATA> ) )
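
        On newer perls the picture is slightly different: perlfunc documents that split in scalar context returns the number of fields, and that @_ was overwritten only prior to Perl 5.11. A small sketch, assuming a perl of 5.12 or later and a made-up sample line, of what the condition then evaluates to and of the explicit alternative:

        ```perl
        use strict;
        use warnings;

        # Hypothetical one-line sample.
        my $line = "you\@there.com,them\@there.com";

        # In scalar context split yields only the number of fields
        # (assumes perl 5.12+); the fields themselves are discarded.
        my $count = split /[, ]+/, $line;
        print "fields: $count\n";

        # Capturing the list explicitly keeps the fields on any perl version.
        my @fields = split /[, ]+/, $line;
        print "'$_'\n" for @fields;
        ```

        Either way, the loop condition is only a truth test on split's return value; nothing is assigned to $_.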
Re: Splitting in while loop
by LanX (Saint) on Oct 07, 2021 at 13:16 UTC
    Summary

    1. RULE: while doesn't assign to any variables (as foreach does); it's just a boolean check, like if (CONDITION) { DO LOOP }
    2. Some operations assign automatically to some vars (side effect) and return true on success. E.g. regexes setting $1 etc. can be combined with a boolean check.
    3. Some, like <FH>, do this only inside a while(<FH>) condition, as an exceptional DWIM case.
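
    Point 2 can be illustrated with a /g match driving the loop. This is a rough sketch only: the sample line, the helper name scan_addresses and the character class are assumptions for illustration, not a recommended way to parse email addresses.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample line.
    my $line = "you\@there.com,them\@there.com others\@there.com";

    # Hypothetical helper: collect comma/whitespace-separated tokens.
    sub scan_addresses {
        my ($text) = @_;
        my @found;
        # In scalar context each successful //g match sets $1 and advances
        # pos($text); the condition turns false when no further match exists.
        while ($text =~ /([^,\s]+)/g) {
            push @found, $1;
        }
        return @found;
    }

    print "'$_'\n" for scan_addresses($line);
    ```

    Here while still only tests the truth of the match; $1 is set by the regex engine as a side effect, not by the loop.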

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      In non-slurp mode, this works via explicit assignment from the diamond operator (<>), and just the same if you open a file handle, $FH:
      # test.pl
      while (my $line = <>) {
          chomp $line;
          print qq{$line\n}; # yes I know, pointless use of chomp
      }

      > perl test.pl < test.pl
      # test.pl
      while (my $line = <>) {
          chomp $line;
          print qq{$line\n}; # yes I know, pointless use of chomp
      }
      Similarly, you could do something destructive with pop or shift, like,
      while (my $item = pop @my_array) {
          # do stuff with $item
      }
        while (my $line = <>) ... Similarly, ... while (my $item = pop @my_array)
        use warnings;
        use strict;

        while (my $line = <DATA>) {
            chomp $line;
            print "<$line>\n";
        }

        my @array = ("Foo","0","Bar");
        while (my $item = pop @array) {
            print "[$item]\n";
        }

        __DATA__
        Hello
        0

        World

        Outputs:

        <Hello>
        <0>
        <>
        <World>
        [Bar]

        I'd call that a pretty serious caveat. As per I/O Operators, the first is equivalent to while ( defined( my $line = <> ) ), and readline returns undef at EOF, while the second one will stop at any false value, and arrays can contain undefs too.
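
        One way around that caveat, shown here only as a sketch with the same sample array, is to test the array itself rather than the value that was popped:

        ```perl
        use strict;
        use warnings;

        my @array = ("Foo", "0", "Bar");
        my @seen;

        # Test the array, not the popped value, so that "0", "" or undef
        # elements do not end the loop early.
        while (@array) {
            my $item = pop @array;
            push @seen, $item;
            print "[$item]\n";
        }
        ```

        This visits all three elements, including the "0" that stopped the earlier loop.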
