Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: solution wanted for break-on-spaces (w/specifics)

by AnomalousMonk (Archbishop)
on Oct 24, 2021 at 07:59 UTC ( [id://11137943]=note: print w/replies, xml ) Need Help??


in reply to solution wanted for break-on-spaces (w/specifics)

Building on kcott's approach (and his test cases and their underlying assumptions), here's a regex-based solution. I've added a few test cases of my own, but their validity is questionable because I don't fully understand perl-diddler's requirements. No attempt has been made to compare performance.

Win8 Strawberry 5.8.9.5 (32) Sun 10/24/2021 3:14:25 C:\@Work\Perl\monks >perl use strict; use warnings; use Test::More; use Test::NoWarnings; sub pp { local $" = '| |'; "|@{$_[0]}|"; } # for output pretty-print +ing my @tests = ( q{all '- and "-quotes properly balanced}, [ q{This is simple.}, [ q{This}, q{is}, q{simple.} + ] ], [ q{ This is simple. }, [ q{This}, q{is}, q{simple.} + ] ], [ q{This is "so very simple".}, [ q{This}, q{is}, q{"so very simple" +.} ] ], [ q{This "is so" very simple.}, [ q{This}, q{"is so"}, q{very}, q{si +mple.} ] ], [ q{This 'isn\'t nice.'}, [ q{This}, q{'isn\'t nice.'} + ] ], [ q{This "isn\"t nice."}, [ q{This}, q{"isn\"t nice."} + ] ], [ q{This 'isn\\\\'t nice.'}, [ q{This}, q{'isn\\\\'t}, q{nice.'} + ] ], [ q{This "isn\\\\"t nice."}, [ q{This}, q{"isn\\\\"t}, q{nice."} + ] ], [ q{This 'is not unnice.'}, [ q{This}, q{'is not unnice.'} + ] ], [ q{This "is not unnice."}, [ q{This}, q{"is not unnice."} + ] ], [ q{a "bb cc" d}, [ q{a}, q{"bb cc"}, q{d} + ] ], q{UNbalanced '- and "-quotes at absolute end of string}, [ q{This is "so very simple}, [ q{This}, q{is}, q{"so very simple} ] + ], [ q{This 'isn\'t nice.}, [ q{This}, q{'isn\'t nice.} ] + ], [ q{This "isn\"t nice.}, [ q{This}, q{"isn\"t nice.} ] + ], [ q{This 'isn\\\\'t nice.}, [ q{This}, q{'isn\\\\'t}, q{nice.} ] + ], [ q{This "isn\\\\"t nice.}, [ q{This}, q{"isn\\\\"t}, q{nice.} ] + ], [ q{This 'is not unnice.}, [ q{This}, q{'is not unnice.} ] + ], [ q{This "is not unnice.}, [ q{This}, q{"is not unnice.} ] + ], 'what about these questionable cases?', [ q{is this"really so"simple now?}, [ q{is}, q{this"really so"simple +}, q{now?} ] ], [ q{is this"really so" now?}, [ q{is}, q{this"really so"}, + q{now?} ] ], [ q{is "really so"simple now?}, [ q{is}, q{"really so"simple}, + q{now?} ] ], [ q{is this'really so'simple now?}, [ q{is}, q{this'really so'simple +}, q{now?} ] ], [ q{is this'really so' now?}, [ q{is}, q{this'really so'}, + q{now?} ] ], [ q{is 'really so'simple now?}, [ q{is}, q{'really so'simple}, + q{now?} ] ], ); my @additional = qw(Test::NoWarnings); # each of these adds 1 test plan 'tests' => (scalar grep { ref eq 'ARRAY' } @tests) + @additional ; # an escape \ escapes ANY character. my $rx_dq = qr{ " [^\\"]* (?: \\. [^\\"]*)* (?: " | \z) }xms; my $rx_sq = qr{ ' [^\\']* (?: \\. [^\\']*)* (?: ' | \z) }xms; my $rx_q = qr{ $rx_dq | $rx_sq }xms; # match quoted or non-space substrings. alt order critical! # my $rx_extract = qr{ $rx_q \S* | \S+ }xms; # for non-questionable c +ases my $rx_extract = qr{ [^'"\s]* $rx_q [^'"\s]* | \S+ }xms; VECTOR: for my $ar_vector (@tests) { if (not ref $ar_vector) { note $ar_vector; next VECTOR; } my ($string, $ar_expected) = @$ar_vector; my @got = $string =~ m{ $rx_extract }xmsg; is_deeply \@got, $ar_expected, "|$string| -> " . pp $ar_expected; } # end for VECTOR ^Z 1..25 # all '- and "-quotes properly balanced ok 1 - |This is simple.| -> |This| |is| |simple.| ok 2 - | This is simple. | -> |This| |is| |simple.| ok 3 - |This is "so very simple".| -> |This| |is| |"so very simple".| ok 4 - |This "is so" very simple.| -> |This| |"is so"| |very| |simple. +| ok 5 - |This 'isn\'t nice.'| -> |This| |'isn\'t nice.'| ok 6 - |This "isn\"t nice."| -> |This| |"isn\"t nice."| ok 7 - |This 'isn\\'t nice.'| -> |This| |'isn\\'t| |nice.'| ok 8 - |This "isn\\"t nice."| -> |This| |"isn\\"t| |nice."| ok 9 - |This 'is not unnice.'| -> |This| |'is not unnice.'| ok 10 - |This "is not unnice."| -> |This| |"is not unnice."| ok 11 - |a "bb cc" d| -> |a| |"bb cc"| |d| # UNbalanced '- and "-quotes at absolute end of string ok 12 - |This is "so very simple| -> |This| |is| |"so very simple| ok 13 - |This 'isn\'t nice.| -> |This| |'isn\'t nice.| ok 14 - |This "isn\"t nice.| -> |This| |"isn\"t nice.| ok 15 - |This 'isn\\'t nice.| -> |This| |'isn\\'t| |nice.| ok 16 - |This "isn\\"t nice.| -> |This| |"isn\\"t| |nice.| ok 17 - |This 'is not unnice.| -> |This| |'is not unnice.| ok 18 - |This "is not unnice.| -> |This| |"is not unnice.| # what about these questionable cases? ok 19 - |is this"really so"simple now?| -> |is| |this"really so"simple +| |now?| ok 20 - |is this"really so" now?| -> |is| |this"really so"| |now +?| ok 21 - |is "really so"simple now?| -> |is| |"really so"simple| |n +ow?| ok 22 - |is this'really so'simple now?| -> |is| |this'really so'simple +| |now?| ok 23 - |is this'really so' now?| -> |is| |this'really so'| |now +?| ok 24 - |is 'really so'simple now?| -> |is| |'really so'simple| |n +ow?| ok 25 - no warnings


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^2: solution wanted for break-on-spaces (w/specifics)
by perl-diddler (Chaplain) on Oct 26, 2021 at 16:36 UTC
    Re: "No attempt has been made to compare performance. " Absolutely! I thought about rolling that in, but the question was already long and complex. I totally agree that should be measured and factored in, however, a few saying around that -- "premature optimization is a bane". Similar to priorities on code development: 1) get something working, 2) then look at other issues (like perf, etc).

    Of the solutions I've seen, both seem like they wouldn't be too different as they use similar methodology. A multi-state parser might be faster than an RE, but maybe not if written in interpreted perl code. One might have to go to 'XS' to gain speed that way.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11137943]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (6)
As of 2024-04-24 10:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found