
utf-8 keys in a tied hash cause warning

by saintmike (Vicar)
on Aug 03, 2006 at 21:10 UTC ( [id://565560]=perlquestion )

saintmike has asked for the wisdom of the Perl Monks concerning the following question:

I'm puzzled that hashes tied via dbmopen apparently don't like utf-8 encoded keys:
    my $utf8key = "\x{05D0}";
    dbmopen(my %hash, "/tmp/mydb", 0666) || die "d'oh!";
    $hash{$utf8key} = "bar";
    dbmclose(%hash);
prints
    Wide character in null operation at ./test.pl line 8.
As checked with Encode::is_utf8, the string in $utf8key has the utf8 flag on.

Is this a bug in the dbm implementation or am I just confused?

It happens with perl5.8.8 and perl5.9.3. Thanks for any help.

Replies are listed 'Best First'.
Re: utf-8 keys in a tied hash cause warning
by ikegami (Patriarch) on Aug 03, 2006 at 22:22 UTC

    It's not the fault of tied hashes.

        use Tie::Hash qw( );
        our @ISA = 'Tie::StdHash';

        sub STORE {
            my ($self, $key, $val) = @_;
            print($key eq "\x{05D0}" ? "utf" : "not utf", "\n");
            return $self->SUPER::STORE($key, $val);
        }

        my %h;
        tie %h, __PACKAGE__;
        $h{"\x{05D0}"} = 1;  # Prints 'utf' in 5.8.6

    dbm probably doesn't support Unicode keys. The workaround is to encode your strings of chars into strings of bytes before using them as keys. UTF-8 is probably the best-suited encoding.
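A minimal sketch of that workaround as a full round trip, assuming the core Encode module and a scratch directory from File::Temp (the path and the variable names are illustrative, not from the original post). Keys are encoded to UTF-8 bytes on the way in and decoded back to characters on the way out:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Scratch location for the dbm files (an assumption for this sketch).
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/mydb";

my $key = "\x{05D0}";    # one character, code point above 255

# Store: hand the dbm layer a byte string, never a wide-character string.
dbmopen(my %out, $path, 0666) or die "dbmopen failed: $!";
$out{ encode('UTF-8', $key) } = "bar";
dbmclose(%out);

# Fetch: keys come back as bytes, so decode them to get characters again.
dbmopen(my %in, $path, 0666) or die "dbmopen failed: $!";
my @keys  = map { decode('UTF-8', $_) } keys %in;
my $value = $in{ encode('UTF-8', $key) };
dbmclose(%in);

print "round trip ok\n" if @keys == 1 && $keys[0] eq $key && $value eq 'bar';
```

The decode step matters as much as the encode step: without it, the key read back is a two-byte string that does not compare equal to the original one-character string.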

Re: utf-8 keys in a tied hash cause warning
by kwaping (Priest) on Aug 03, 2006 at 22:04 UTC
    Here's what use diagnostics has to say about that:

    (W utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see open and perlfunc/binmode.

    Hope that helps!

    ---
    It's all fine and dandy until someone has to look at the code.
Re: utf-8 keys in a tied hash cause warning
by graff (Chancellor) on Aug 04, 2006 at 02:37 UTC
    Following up on the earlier replies, if I supplement the OP code like so:
        use Encode;

        my $utf8key = "\x{05D0}";
        my $usable_key = encode( 'utf8', $utf8key );
        dbmopen(my %hash, "/tmp/mydb", 0666) || die "d'oh!";
        $hash{$usable_key} = "bar";
        dbmclose(%hash);
    I don't get the warning message. I also noticed some differences in the content of the resulting dbm file -- the OP version had null bytes where the 'encoded' version had non-null bytes, suggesting that the warning issued by the OP version reflects an actual failure to store the data.

    Having to encode the hash keys like this is certainly a PITA (a minor one, but still). Perhaps the maintainer(s) of the various *DBM_File modules can be persuaded to update them so as to handle this properly -- easy enough to do, I'd expect.
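One way to avoid encoding every key by hand, sketched here as a suggestion rather than anything proposed in this thread, is the DBM filter hooks described in perldbmfilter. The example assumes SDBM_File (it ships with Perl) and a temporary directory; the filter subs receive the key or value in $_ and are expected to modify it in place:

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);  # scratch location, an assumption

my $db = tie(my %hash, 'SDBM_File', "$dir/mydb", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie: $!";

# Encode on the way into the database, decode on the way out.
$db->filter_store_key(  sub { $_ = encode('UTF-8', $_) });
$db->filter_fetch_key(  sub { $_ = decode('UTF-8', $_) });
$db->filter_store_value(sub { $_ = encode('UTF-8', $_) });
$db->filter_fetch_value(sub { $_ = decode('UTF-8', $_) });

my $key = "\x{05D0}";
$hash{$key} = "bar";              # the filter turns the key into bytes first

my ($stored_key) = keys %hash;    # comes back decoded to characters
my $roundtrip_ok = ($stored_key eq $key && $hash{$key} eq 'bar') ? 1 : 0;
print "round trip ok: $roundtrip_ok\n";

undef $db;                        # drop the object reference before untie
untie %hash;
```

With the filters installed, calling code can use character strings throughout, and the encoding decision lives in one place.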

      I don't get the warning message

      It's not a warning. It's a fatal error. It was added to newer versions of Perl.

      I also noticed some differences in the content of the resulting dbm file

      Not here. 5.8.0 with Encode, 5.8.0 without Encode and 5.8.8 with Encode output:

      0000: 02 00 FE 03 FB 03 00 00 00 00 00 00 00 00 00 00
      0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ...
      03D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      03E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      03F0: 00 00 00 00 00 00 00 00 00 00 00 62 61 72 D7 90

      Since strings of chars are stored internally as UTF-8, the resulting file is identical.
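The last five bytes of that dump (62 61 72 D7 90) are the value "bar" followed by the key. A short check, assuming only the core Encode module, shows that the UTF-8 encoding of U+05D0 is exactly the byte pair D7 90:

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

my $char_key = "\x{05D0}";                    # one character, utf8 flag on
my $byte_key = encode('UTF-8', $char_key);    # two bytes, utf8 flag off

# Render each byte of the encoded key as two hex digits.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $byte_key;

printf "chars: %d, bytes: %d, hex: %s\n",
    length($char_key), length($byte_key), $hex;
# prints "chars: 1, bytes: 2, hex: D7 90"
```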

      dbmopen is obsolete.
        It doesn't matter if you use dbmopen or tie in this scenario, the problem seems to lie in the storage engine(s) used by these functions.

Node Type: perlquestion [id://565560]
Approved by Corion