PerlMonks  

Extract delimited words from string

by Anonymous Monk
on Dec 06, 2022 at 09:44 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
how can I parse strings like this below to extract the words delimited by quotation marks?
50 0 "R0 G255 B0 A255" "Solid" 118 1 "R0 G0 B0 A255" "R0 G0 B0 A255" 0
70 0 "R0 G255 B255 A255" "Solid" 118 1 "R12 G12 B12 A255" "R12 G12 B12 A255" 0
Thanks for your help!

Replies are listed 'Best First'.
Re: Extract delimited words from string
by hippo (Bishop) on Dec 06, 2022 at 10:47 UTC
    perl -F'"' -lE 'say $F[$_ * 2 + 1] for 0 .. $#F/2'

    works for all the test cases. Amend as required.


    🦛

      Thanks, how would this translate into a Perl program? For example, in a loop over each line of the file:
      while (@lines) { my $line = shift @lines; my @string_pieces = ... instruction here }

        If you don't fancy reading the excellent perlrun to find out what the flags do, then just use B::Deparse to find out:

        $ perl -MO=Deparse -F'"' -lE 'say $F[$_ * 2 + 1] for 0 .. $#F/2'
        BEGIN { $/ = "\n"; $\ = "\n"; }
        use feature 'current_sub', 'bitwise', 'evalbytes', 'fc', 'postderef_qq', 'say', 'state', 'switch', 'unicode_strings', 'unicode_eval';
        LINE: while (defined($_ = readline ARGV)) {
            chomp $_;
            our @F = split(/"/u, $_, 0);
            say $F[$_ * 2 + 1] foreach (0 .. $#F / 2);
        }
        -e syntax OK

        (Updated to match the original code)


        🦛
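For the "loop on each line" case asked about above, here is a minimal sketch of the same split-on-quote idea as an ordinary program. The helper name `quoted_fields` and the sample `@lines` are my own assumptions, not part of the one-liner, and I pass a limit of -1 to `split` so that a trailing empty field is kept, which differs slightly from the deparsed code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text found between
# each pair of double quotes in one line.
sub quoted_fields {
    my ($line) = @_;

    # Splitting on the quote character leaves unquoted text at the even
    # indexes and quoted text at the odd indexes.
    my @pieces = split /"/, $line, -1;
    return @pieces[ grep { $_ % 2 } 0 .. $#pieces ];
}

# Assumed sample data; in a real program these would come from the file.
my @lines = (
    '50 0 "R0 G255 B0 A255" "Solid" 118 1 "R0 G0 B0 A255" "R0 G0 B0 A255" 0',
);

for my $line (@lines) {
    my @string_pieces = quoted_fields($line);
    print "$_\n" for @string_pieces;
}
```

This only holds while the quotes in each line are balanced and never escaped; an unpaired quote shifts every later field to the wrong index.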

        A dedicated module is better, but if you're shy of that, another approach:

        c:\@Work\Perl\monks>perl
        use strict;
        use warnings;

        my @lines = (
          '50 0 "R0 G255 B0 A255" "Solid" 118 1 "R0 G0 B0 A255" "R0 G0 B0 A255" 0',
          '70 0 "R0 G255 B255 A255" "Solid" 118 1 "R12 \"G12\" B12 A255" "R12 G12 B12 A255" 0',
          );

        my $rx_dq = qr{ " [^"\\]* (?: \\. [^"\\]*)* " }xms;

        for my $line (@lines) {
            my @d_quoted = $line =~ m{ $rx_dq }xmsg;
            print "@d_quoted \n";
            }
        ^Z
        "R0 G255 B0 A255" "Solid" "R0 G0 B0 A255" "R0 G0 B0 A255"
        "R0 G255 B255 A255" "Solid" "R12 \"G12\" B12 A255" "R12 G12 B12 A255"


        Give a man a fish:  <%-{-{-{-<
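If the surrounding quotes are not wanted, the same pattern can be given a capture group. This is only a sketch: the names `$rx_dq_body` and `quoted_bodies` are hypothetical, and the unescaping step assumes that a backslash simply protects the next character.

```perl
use strict;
use warnings;

# Same shape as the pattern above, plus a capture group around the body so
# that the delimiting quotes are left out of the result.
my $rx_dq_body = qr{ " ( [^"\\]* (?: \\. [^"\\]* )* ) " }xms;

# Hypothetical helper: return the unescaped body of every quoted string.
sub quoted_bodies {
    my ($line) = @_;
    my @bodies = $line =~ m{ $rx_dq_body }xmsg;
    s/\\(.)/$1/gs for @bodies;    # \" becomes " and \\ becomes \
    return @bodies;
}

print "$_\n" for quoted_bodies('1 "R12 \"G12\" B12 A255" "Solid" 0');
```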

Re: Extract delimited words from string
by LanX (Saint) on Dec 06, 2022 at 13:36 UTC
    You'd be better off using a dedicated module for that, but just for fun:

    use v5.12;
    use warnings;

    while ( <DATA> ) {
        chomp;
        say "'$_'";
        say $3 while          # print 3rd capture
          $_ =~
          /(^|\s)             # starts with line's start or whitespace
           ("?)               # capture optional double-quote inside (otherwise empty string)
           ([^\2]*?)          # capture anything non-quote
           \2                 # end with same quote
           (?=(\s|$))         # look-ahead at following whitespace or line's end
          /xg                 # g= loop _globally_ over all matches x=allow multiline extended regex
    }

    __DATA__
    50 0 "R0 G255 B0 A255" "Solid" 118 1 "R0 G0 B0 A255" "R0 G0 B0 A255" 0
    70 0 "R0 G255 B255 A255" "Solid" 118 1 "R12 G12 B12 A255" "R12 G12 B12 A255" 0

    output:

    '50 0 "R0 G255 B0 A255" "Solid" 118 1 "R0 G0 B0 A255" "R0 G0 B0 A255" 0'
    50
    0
    R0 G255 B0 A255
    Solid
    118
    1
    R0 G0 B0 A255
    R0 G0 B0 A255
    0
    '70 0 "R0 G255 B255 A255" "Solid" 118 1 "R12 G12 B12 A255" "R12 G12 B12 A255" 0'
    70
    0
    R0 G255 B255 A255
    Solid
    118
    1
    R12 G12 B12 A255
    R12 G12 B12 A255
    0

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

    update

    added comments
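A note on the pattern above: as far as I know, inside a bracketed character class `\2` is read as an octal character escape rather than a backreference, so `[^\2]*?` behaves like a lazy "almost any character", and it is the lazy quantifier together with the look-ahead that stops the match at the closing quote. A minimal sketch of the same tokenizing idea without a backreference, using a hypothetical helper named `tokens` and assuming quoted strings never contain an escaped quote:

```perl
use strict;
use warnings;

# Hypothetical helper: split a line on whitespace, treating a double-quoted
# run as a single token and dropping its quotes.
sub tokens {
    my ($line) = @_;
    my @tokens;

    # First alternative: a quoted string (body in $1).
    # Second alternative: a run of non-whitespace characters (in $2).
    while ( $line =~ / "([^"]*)" | (\S+) /xg ) {
        push @tokens, defined $1 ? $1 : $2;
    }
    return @tokens;
}

print "$_\n" for tokens('50 0 "R0 G255 B0 A255" "Solid" 118 1');
```

Unlike the original, this version does not require whitespace around a quoted string, so it is a related technique rather than an exact equivalent.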

Re: Extract delimited words from string
by Discipulus (Canon) on Dec 06, 2022 at 10:01 UTC
    Hello, if this is real example data, you can use a regex. What have you tried and searched for so far?

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      I thought that Text::Balanced module could help, but I got lost in the documentation...

        Yes, Text::Balanced will work (and it is a core module since 5.7.3).

        Code:

        #!/usr/bin/env perl

        use strict;
        use warnings;

        use Text::Balanced qw/ extract_multiple extract_quotelike /;

        my @data = (
            q{50 0 "R0 G255 B0 A255" "Solid" 118 1 "R0 G0 B0 A255" "R0 G0 B0 A255" 0},
            q{70 0 "R0 G255 B255 A255" "Solid" 118 1 "R12 G12 B12 A255" "R12 G12 B12 A255" 0},
        );

        my @extracted;
        foreach my $str (@data) {
            @extracted = extract_multiple( $str, [ \&extract_quotelike, ], );
            print q{Input: }, $str, qq{\n};
            print q{Output: }, qq{\n};
            print qq{\t};
            print join qq{\n\t}, grep { !(m/^\s*$/) and length $_ > 0 } @extracted;
            print qq{\n};
        }

        Output:

        Input: 50 0 "R0 G255 B0 A255" "Solid" 118 1 "R0 G0 B0 A255" "R0 G0 B0 A255" 0
        Output:
            50 0
            "R0 G255 B0 A255"
            "Solid"
            118 1
            "R0 G0 B0 A255"
            "R0 G0 B0 A255"
            0
        Input: 70 0 "R0 G255 B255 A255" "Solid" 118 1 "R12 G12 B12 A255" "R12 G12 B12 A255" 0
        Output:
            70 0
            "R0 G255 B255 A255"
            "Solid"
            118 1
            "R12 G12 B12 A255"
            "R12 G12 B12 A255"
            0

        Hope that helps.
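If only the quoted pieces are wanted, one option is the optional fourth argument of `extract_multiple`, which the Text::Balanced documentation describes as a flag for skipping unmatched text. A minimal sketch under that assumption; the helper name `quoted_only` is hypothetical and the final substitution is my own addition for stripping the quotes:

```perl
use strict;
use warnings;

use Text::Balanced qw/ extract_multiple extract_quotelike /;

# Hypothetical helper: keep only the quoted fields of one line.
sub quoted_only {
    my ($str) = @_;    # work on a copy of the caller's string

    # Third argument undef: no limit on the number of fields.
    # Fourth argument true: text that no extractor matched is skipped.
    my @quoted = extract_multiple( $str, [ \&extract_quotelike ], undef, 1 );

    s/\A"|"\z//g for @quoted;    # strip the surrounding double quotes
    return @quoted;
}

print "$_\n" for quoted_only('50 0 "R0 G255 B0 A255" "Solid" 118 1');
```

Note that `extract_quotelike` also recognises Perl quote-like operators such as `q{...}` or `m/.../`, so unquoted text that happens to look like one of those would be picked up as well.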

Re: Extract delimited words from string
by tybalt89 (Monsignor) on Dec 08, 2022 at 22:21 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11148600
    use warnings;

    my @lines = split /\n/, <<END;
    50 0 "R0 G255 B0 A255" "Solid" 118 1 "R0 G0 B0 A255" "R0 G0 B0 A255" 0
    70 0 "R0 G255 B255 A255" "Solid" 118 1 "R12 G12 B12 A255" "R12 G12 B12 A255" 0
    END

    my @strings = map /"(.*?)"/g, @lines;

    print "$_\n" for @strings;

    Outputs:

    R0 G255 B0 A255
    Solid
    R0 G0 B0 A255
    R0 G0 B0 A255
    R0 G255 B255 A255
    Solid
    R12 G12 B12 A255
    R12 G12 B12 A255
Re: Extract delimited words from string
by Anonymous Monk on Dec 06, 2022 at 14:20 UTC
    Thank you both, quite complex stuff for me!
      > quite complex stuff for me!

      see Re: Extract delimited words from string again, I added more comments to better explain the extended regex.

      update

      There is a difference between hippo's solution and mine: he literally extracts only "words delimited by quotation marks", as requested:

      > how can I parse strings like this below to extract the words delimited by quotation marks?

      My more complex solution assumes you actually wanted to split on whitespace first, with the quotes being optional.

      Hard to tell what you really wanted...

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery
