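Most honourable brothers and sisters in Perl, at work we are having issues with
substr. It returns strange values when called directly on the special variable
$1, but the expected value when called on a lexical copy of the same capture.
The following code creates the test script and two UTF-8 input files, then runs the script on each file: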
use strict;
use warnings;
open my $PL, '>', 'utf2.pl' or die $!;
print {$PL} << '__PL__';
##########################################################
use strict;
use warnings;
binmode STDOUT, ':utf8';
binmode STDIN, ':utf8';
while (my $line = <>) {
    if (my ($word) = $line =~ /^(.+)$/) {
        my $one   = substr($1, 0, 1);     # doesn't work
        my $w_one = substr($word, 0, 1);  # works
        print "'$one' = '$w_one'\tat $line" unless $one eq $w_one;
    }
}
##########################################################
__PL__
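close $PL or die $!;
# utf1: the lines "aař", "č", "aař", written as raw UTF-8 bytes
open my $OUT1, '>', 'utf1' or die $!;
print {$OUT1} map chr hex, qw/61 61 c5 99 0a c4 8d 0a 61 61 c5 99 0a/;
close $OUT1;
# utf2: the lines "č", "aař", "č", written as raw UTF-8 bytes
open my $OUT2, '>', 'utf2' or die $!;
print {$OUT2} map chr hex, qw/c4 8d 0a 61 61 c5 99 0a c4 8d 0a/;
close $OUT2;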
system "$^X utf2.pl < utf1";
print "\n";
system "$^X utf2.pl < utf2";
The output is (tested in blead 5.17.3, on x86_64-linux-thread-multi):
'�' = 'č' at č
'aař' = 'a' at aař
'č�' = 'č' at č
Do you have any explanation? Should I submit a bug report?
Update: Thanks all, bug report sent.
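Until this is sorted out, here is a minimal sketch of a workaround, going only by the observation above that substr on the lexical copy ($word) gives the right answer. The helper name first_char is hypothetical and not part of the test script; it reads the capture through a lexical and never touches $1.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return the first
# character of the line's capture, going through a lexical copy of the
# capture instead of calling substr on $1 directly.
sub first_char {
    my ($line) = @_;
    if (my ($word) = $line =~ /^(.+)$/) {
        return substr($word, 0, 1);
    }
    return;    # no match: empty list / undef
}

binmode STDOUT, ':utf8';
# Sample inputs are written with escapes: U+010D is č, U+0159 is ř.
print first_char("\x{10d}\n"), "\n";
print first_char("aa\x{159}\n"), "\n";
```

This only sidesteps the symptom; it says nothing about why substr($1, ...) misbehaves.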