http://qs321.pair.com?node_id=11148386


in reply to Re^6: getting rid of UTF-8
in thread getting rid of UTF-8

I must be doing something stupid. Here's my little test program:
#!/usr/bin/perl
use v5.10 ;
use strict;
use warnings ;

my $BOM = "\xef\xbb\xbf" ;

die "no args\n" unless @ARGV == 2 ;
open (my $i, "<", $ARGV[0]) or die "Can't open $ARGV[0]\n" ;
open (my $o, ">", $ARGV[1]) or die "Can't write to $ARGV[1]\n" ;

say "marker is" ;
printhex ($BOM) ;
say "" ;
while (my $line = <$i>) {
    my $newline = $line ;
    printhex ($newline) ;
    $newline =~ s/$BOM//g;
    die "didn't change" if $newline eq $line ;
    print $o $newline ;
}
close $i ;
close $o ;
exit ;

sub printhex {
    my $str = $_[0] ;
    for my $chr (split(//,$str)) {
        printf("%x ", ord($chr)) ;
    }
}
and when I run it on one of the BOM'ed files I get:
marker is
ef bb bf
didn't change at D:\Desktop\striputf.pl line 19, <$i> line 3.
ef bb bf 49 6d 70 6f 72 74 61 6e 63 65 2c 46 69 72 73 74 20 4e 61 6d 65 2c 4d 69 64 64 6c 65 20 4e 61 6d 65 2c 4c 61 73 74 20 4e 61 6d 65 2c 46 75 6c 6c 20 4e 61 6d 65 2c 43 6f 6d 70 61 6e 79 2c 44 65 70 61 72 74 6d 65 6e 74 2c 4a 6f 62 20 54 69 74 6c 65 2c 53 74 72 65 65 74 20 28 62 2e 29 2c 43 69 74 79 20 28 62 2e 29 2c 53 74 61 74 65 20 28 62 2e 29 2c 5a 49 50 20 43 6f 64 65 20 28 62 2e 29 2c 43 6f 75 6e 74 72 79 2f 52 65 67 69 6f 6e 20 28 62 2e 29 2c 48 6f 6d 65 20 50 68 6f 6e 65 2c 42 75 73 69 6e 65 73 73 20 50 68 6f 6e 65 2c 4d 6f 62 69 6c 65 20 50 68 6f 6e 65 2c 42 75 73 69 6e 65 73 73 20 50 68 6f 6e 65 20 32 2c 42 75 73 69 6e 65 73 73 20 50 68 6f 6e 65 20 33 2c 42 75 73 69 6e 65 73 73 20 50 68 6f 6e 65 20 34 2c 42 75 73 69 6e 65 73 73 20 46 61 78 2c 42 75 73 69 6e 65 73 73 20 57 65 62 20 50 61 67 65 2c 53 74 72 65 65 74 20 28 68 2e 29 2c 43 69 74 79 20 28 68 2e 29 2c 53 74 61 74 65 20 28 68 2e 29 2c 5a 49 50 20 43 6f 64 65 20 28 68 2e 29 .....
What am I getting wrong/missing?