PerlMonks  

Re: utf8 "\xB7" does not map to Unicode at /usr/local/bin/бибс/об‰ line 112.

by graff (Chancellor)
on Nov 03, 2015 at 11:14 UTC


in reply to utf8 "\xB7" does not map to Unicode at /usr/local/bin/бибс/об‰ line 112.

You haven't given us enough information. Do you have files with non-ASCII file names (in /usr/local/bin/)? If so, are you sure about what character encoding is being used for those file names?

I'm guessing you do have non-ASCII file names, they are utf8-encoded, and you probably don't have these lines near the top of the script:

    binmode STDOUT, ":utf8";
    binmode STDERR, ":utf8";
and/or maybe you don't have this:
    use Encode;
which would let you do something like this:
    opendir( my $dir, "/usr/local/bin" ) or die "Can't read /usr/local/bin: $!\n";
    # test with defined() so that a file named "0" does not end the loop early
    while ( defined( my $raw = readdir( $dir ))) {
        my $fname = decode( "utf8", $raw );   # bytes from the file system -> Perl characters
        print $fname, "\n";
    }
That snippet, when used with the other lines above, will show you the file names found in your /usr/local/bin/. If you'd rather use the output of the "find" utility, it might go like this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;
    binmode STDOUT, ":utf8";
    binmode STDERR, ":utf8";
    # "-|" reads from the command's output ("|-" would open a pipe for writing to it)
    open( my $find, "-|:utf8", "find /usr/local/bin -type f" ) or die "Can't run find: $!\n";
    while ( <$find> ) {
        print;
    }
Note that the first example (using opendir/readdir) prints just the names of files in that one directory, and the second example (with "find") prints the absolute path names for all files in that directory and in all its subdirectories. (Update: and notice that "\n" has to be added in the first, but is already included in the file name string in the second.)

(Also, if all your file names are plain ASCII, the above scripts still work, because ASCII is a subset of utf8.)
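If the goal is to collect the names rather than print them, the following is a minimal sketch of reading a command's output into an array. The helper name read_command_lines is hypothetical, and the command used here (perl itself printing two lines) is only a stand-in so the script runs anywhere; in the setting above it would be something like ( "find", "/usr/local/bin", "-type", "f" ).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the post): run a command and return its
# output lines, decoded from UTF-8 and with trailing newlines removed.
sub read_command_lines {
    my @cmd = @_;
    # "-|" opens a pipe for reading the command's output; the list form
    # runs the command directly, without passing it through a shell.
    open( my $out, "-|", @cmd ) or die "Can't run $cmd[0]: $!\n";
    binmode $out, ":encoding(UTF-8)";
    my @lines = <$out>;
    close $out or warn "$cmd[0] exited with status $?\n";
    chomp @lines;
    return @lines;
}

# Stand-in command: the running perl binary ($^X) printing two lines.
my @lines = read_command_lines( $^X, "-e", 'print "one\ntwo\n"' );
binmode STDOUT, ":utf8";
print scalar(@lines), " lines, first is '$lines[0]'\n";
```

The mode string matters here: "-|" reads from the command, while "|-" opens a pipe for writing to it, in which case the command's own output goes straight to the terminal and the array stays empty.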

Now, if some of your file names have non-ASCII characters, and use some character encoding other than utf8 (e.g. koi8-r or iso-8859-5 or cp1251 or whatever), you have to figure out what that encoding is, and use it in place of "utf8" when you call decode() or open( ..., "-|...", "find ...").
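As a rough illustration of what "use it in place of utf8" means, here is a small sketch that decodes the same word from two single-byte Cyrillic encodings. The byte strings are made-up sample data (the word "тест" as it would be stored in koi8-r and in cp1251), not names taken from the original poster's system.

```perl
use strict;
use warnings;
use Encode;

# Sample data (an assumption for illustration): the word "тест" as raw
# bytes under two different single-byte Cyrillic encodings.
my %raw = (
    "koi8-r" => "\xD4\xC5\xD3\xD4",
    "cp1251" => "\xF2\xE5\xF1\xF2",
);

binmode STDOUT, ":utf8";
for my $enc ( sort keys %raw ) {
    # The first argument of decode() names the encoding the bytes are in.
    my $name = decode( $enc, $raw{$enc} );
    print "$enc: $name\n";
}
```

Both byte strings decode to the same four characters; decoding either one with the wrong encoding name would give a different, wrong string without any error, which is why the encoding has to be identified first.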

If some of your file names have been corrupted (e.g. they were utf8-encoded but somehow got "renamed" with a bad byte sequence), you'll need to fix that.

(Update: I believe it is possible that a single directory can contain some file names that use one encoding, and other file names that use a different encoding. You might want to look closely at the man page for Encode, especially the part about catching errors ("FB_CROAK"), and you may also want to look at Encode::Guess.)
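For the mixed-encoding case, one possible approach (a sketch under stated assumptions, not a tested recipe for this particular directory) is to try a strict UTF-8 decode first and fall back to a second encoding only when that fails. The helper name decode_name and the choice of cp1251 as the fallback are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: try strict UTF-8 first, then fall back to a guess.
# The default fallback encoding (cp1251) is an assumption for illustration.
sub decode_name {
    my ( $bytes, $fallback ) = @_;
    $fallback = "cp1251" unless defined $fallback;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops it
    # from modifying $bytes in place, so the fallback sees the original.
    my $name = eval {
        Encode::decode( "UTF-8", $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC() );
    };
    return $name if defined $name;
    return Encode::decode( $fallback, $bytes );
}

binmode STDOUT, ":utf8";
print decode_name( "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82" ), "\n";   # valid UTF-8 bytes
print decode_name( "\xF2\xE5\xF1\xF2" ), "\n";                   # not valid UTF-8, decoded as cp1251
```

A lone "\xB7" byte, as in the error message in the thread title, is not valid UTF-8, so it would take the fallback path here. The approach has a known weakness: a name in another encoding can happen to be valid UTF-8 by accident, in which case it is decoded as UTF-8 without complaint.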

Re^2: utf8 "\xB7" does not map to Unicode at /usr/local/bin/бибс/об‰ line 112.
by nikolay (Beadle) on Nov 04, 2015 at 07:55 UTC

    I will gladly do!

    Yes, the scripts live in a directory with a Russian-alphabet name under /usr/local/bin -- all in UTF8.

    I did place the binmode lines in the script (both the one that actually gives me the error and the one from which it is called), but I still get the error message as above, with unreadable characters in the script path.

    I have tried the open operator with the find command, but I cannot work out how to get the files it finds into an array. This:

    open(my $find, "|-:utf8", "/usr/bin/find -type f >/dev/null") or die "Can't run find: $!\n";
    while( <$find> ){
        push @array, $_;
    }

    does not put anything into the array.

    Thank you for your extended answer, graff.

      "find /usr/local/bin -type f >/dev/null"

      Of course this will produce no data, because you are redirecting the output to /dev/null. What happens if you remove the redirection and instead just use find /usr/local/bin -type f ?

        I will answer by example.

        open( $vrm, "|-:utf8", "/usr/bin/find $put -$tip $kriteriy" ) or die "Can't run find: $!\n";
        while( <$vrm> ){
            push @svitok, $_;
        }
        print "ZNAK\n";
        print "\n>$svitok[0]<\n";
        exit;

        Gives me this:

        ZNAK
        Use of uninitialized value $svitok[0] in concatenation (.) or string at /usr/local/src/исп/бибс/об‰ line 122.
        ./svitok1
        ./svitok2
        ./svitok3
        ><

        So:

        1. >/dev/null only keeps the output off the terminal -- the array stays the same either way -- and I do want nothing printed to the terminal.

        2. Array is not filled.

        3. Strangely, the ZNAK mark is printed *before* the output of find, although in the code it comes (and is executed) *after* find. Why?

        I am thinking of the qx operator -- is there a way to make Perl carry on executing it regardless of the encodings present in its output?

        Thank you for participation very much!

      If you look at the code snippet I posted, you'll see that I did not redirect the output of "find" to /dev/null -- as pointed out in the other replies above, that's why your array remained empty.

      One other point: now that you're pushing entries into an array, you'll probably want to use chomp -- e.g. like this:

      open( my $find, "-|:utf8", "find /usr/local/bin -type f" ) or die "find failed: $!\n";
      while ( <$find> ) {
          chomp;
          push @array, $_;
      }
      or leave out the while loop and chomp the whole array, like this:
      open( my $find, ... );   # (same as above, without redirecting to "/dev/null")
      my @array = <$find>;     # reads all lines into array
      chomp @array;            # strips linefeeds from all array elements
        Thank you, I will do so, but I have to fill the array first.
      >/dev/null discards any output. What do you expect to obtain by reading it?
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        I tried both, with and without the redirection -- the result for the array was the same, but with the redirection the terminal wasn't filled with (what is to me) garbage.
