I think this is a bug. But it is a documented bug.
From the documentation on perlunicode (emphasis added by me):
The following areas need further work. They are being rapidly addressed in the 5.7.x development branch.
...
Regular Expressions
The existing regular expression compiler does not produce polymorphic opcodes. This means that the determination on whether to match Unicode characters is made when the pattern is compiled, based on whether the pattern contains Unicode characters, and not when the matching happens at run time. This needs to be changed to adaptively match Unicode if the string to be matched is Unicode.
To see this, I put in a unicode character in a position guaranteed not to match:
#!/usr/bin/perl -l
$_ = chr(12345);
print "Length: ", length; # Length: 1
s/.|[^$_]//;
print "Length: ", length; # Length: 2
# prints:
# Length: 1
# Length: 0
Christian Lemburg
Brainbench MVP for Perl
http://www.brainbench.com
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|