use strict;
use warnings;
use feature 'say';
# use Regexp::Common;
# ^^^ Not actually used. I was lazy and just peeked at $RE{quoted}
# to build the "$quoted" expression below, modifying it slightly
# (see the "$" alternative) to satisfy the third clause.
# The 2nd test case below checks how that works; there doesn't
# seem to be a similar one among your 18.
my $quoted = qr/
(?:(?|
(?:(?<!\\)\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)|
(?:(?<!\\)\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
))
/x;
my $re = qr/(?:$quoted|[^ ])+\K(?: |$)/;
my @tests = (
q(This 'isn\'t nice.'),
q(This 'isn\'t nice.),
q(This \"isnt unnice.\"),
);
for my $t ( @tests ) {
    say "[$_]" for split $re, $t;
}
__END__
[This]
['isn\'t nice.']
[This]
['isn\'t nice.]
[This]
[\"isnt]
[unnice.\"]
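What \K does for the split pattern above may be easier to see on a stripped-down case. The sketch below is not the expression from this post: both patterns and the variable names are made up for illustration, and the "quoted" part handles only plain double quotes with no escapes. The token is matched first, and \K then discards it from the match, so split treats only the trailing space (or end of string) as the separator.

```perl
use strict;
use warnings;

# Hypothetical, simplified patterns: a token is a run of "..." groups
# or non-space characters; the separator is one space or end of string.
my $with_keep    = qr/(?:"[^"]*"|[^ ])+\K(?: |$)/;
my $without_keep = qr/(?:"[^"]*"|[^ ])+(?: |$)/;

my $input = 'a "b c" d';

# With \K the token is matched but not counted as part of the separator.
my @kept = split $with_keep, $input;

# Without \K the token itself is swallowed by the separator, so only
# empty fields remain, and split drops trailing empty fields.
my @lost = split $without_keep, $input;

print "[$_]\n" for @kept;
print scalar(@lost), " field(s) without \\K\n";
```

With \K the fields come out as a, "b c" and d; without it nothing is left to return.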
Update, 10 minutes later: aargh, I added a negative look-behind to cover your 14th case (and added my third test case). There may be more cases to cover. Also, it's trickier than I thought: 6 (and 7) are split into 3 groups, but the wrong ones. I'll look into that later. False alarm? We'll see, later still :)
Next-morning update. As LanX pointed out, a negative look-behind for just a single backslash isn't enough. So, to save this answer (I like how the "keep" \K construct helps in a regexp for split; it's kind of interesting), it may be easier to revert $quoted to the form borrowed from $RE{quoted} and tweak $re instead:
my $quoted = qr/
(?:(?|
(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)|
(?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
))
/x;
my $re = qr/
(?:
(?:\\\\)+
|
(?:\\[^ ])
|
$quoted
|
[^ ]
)+ \K
(?:
\ | $
)
/x;
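Why a look-behind for a single backslash falls short can be checked in isolation. The snippet below is only an illustration with made-up variable names. The second pattern, which allows an even run of backslashes before the quote, is one possible way to express the idea and is not the fix used in this post, where escapes are consumed in $re instead.

```perl
use strict;
use warnings;

my $bs  = chr 92;                # a single backslash, spelled out to avoid escaping confusion
my $str = ($bs x 2) . '"a b"';   # two backslashes (an escaped backslash), then a real opening quote

# Naive check: a quote counts unless a backslash sits right before it.
my $naive_pos = $str =~ /(?<!\\)(")/ ? $-[1] : -1;

# Illustrative alternative: allow an even run of backslashes before the quote.
my $aware_pos = $str =~ /(?<!\\)(?:\\\\)*(")/ ? $-[1] : -1;

print "naive: $naive_pos, parity-aware: $aware_pos\n";
```

The naive pattern skips the opening quote at offset 2, because a backslash precedes it, and lands on the closing quote at offset 6. The second pattern finds the opening quote at offset 2.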
I hope the revised $re works now; my 1st attempt at this "update" was broken (it's shown below, though you'd better not look: nothing interesting there. Sorry for the mess). Beyond that, it's unclear whether to split on an escaped space, or how to treat several spaces in a row.
my $quoted = qr/
(?:(?|
(?:
(?:[^\\\'\ ]*(?:\\[^\ ][^\\\'\ ]*)*)
\"
)
(?:
[^\\\"]*
(?:
\\
.
[^\\\"]*
)*
)
(?:\"|$)
|
(?:(?:[^\\\' ]*(?:\\[^ ][^\\\' ]*)*)\')(?:[^\\\']*(?:\\.[^\\\']*)*
+)(?:\'|$)
))
/x;
And a later (final?) update: sigh... damn lack of practice. So this:
my $quoted = qr/
(?:(?|
(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)
|
(?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
))
/x;
my $re = qr/
(?:
(?:\\.)+
|
$quoted
|
[^ \\"']+
)* \K
(?:
\ | $
)+
/x;
# and later:
my $got = [ split $re, $str ];
passes all tests in LanX's later answer except #2 and is somewhat optimized.
About test #2: the consensus is that "the brief is unclear". Must split-like behaviour generate an empty leading field for #2? The expression to split on is definitely neither omitted nor a literal single space (the two cases where split skips leading whitespace). If it nevertheless must not generate one (my solution does, and so fails #2), then my bad. Still, this regexp is "working" and can be used to literally split on. :)
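For anyone who wants to poke at the final expression, here is a small self-contained harness. $quoted and $re are copied from the final update above. The helper name split_fields and the four inputs are my own picks for illustration; they are not LanX's test suite.

```perl
use strict;
use warnings;

my $quoted = qr/
(?:(?|
(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)
|
(?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
))
/x;
my $re = qr/
(?:
(?:\\.)+
|
$quoted
|
[^ \\"']+
)* \K
(?:
\ | $
)+
/x;

# Hypothetical helper: returns the fields as an array reference.
sub split_fields {
    my ($str) = @_;
    return [ split $re, $str ];
}

my @inputs = (
    q(This 'isn\'t nice.'),   # escaped quote inside a quoted string
    q(This 'isn\'t nice.),    # unterminated quote runs to end of string
    q(a  b),                  # two spaces in a row act as one separator
    q(a\ b c),                # an escaped space does not split
);

for my $in (@inputs) {
    print join( '', map { "[$_]" } @{ split_fields($in) } ), "\n";
}
```

As far as I can tell, neither a run of spaces nor an escaped space yields an extra field with this $re, which is one possible answer to the question raised earlier.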