PerlMonks  

convert UTF-8 in nested data structures

by slayven (Pilgrim)
on Oct 31, 2005 at 14:45 UTC ( [id://504275]=perlquestion )

slayven has asked for the wisdom of the Perl Monks concerning the following question:

I was looking for a module to convert not only a single string to UTF-8 but a nested data structure. Unfortunately I wasn't able to find anything appropriate, so I had to write something of my own. The following is a works-for-me solution I came up with, but I'm sure you'll find enough bugs and suggestions for me to be ashamed of this procedure. So be gentle ;)
use Encode;

sub digConvert {
    my $ref = shift;
    if (ref $ref eq 'HASH') {
        foreach (keys %$ref) {
            $ref->{$_} = digConvert($ref->{$_});
        }
    }
    elsif (ref $ref eq 'ARRAY') {
        foreach my $i (0 .. $#{$ref}) {
            $ref->[$i] = digConvert($ref->[$i]);
        }
    }
    elsif (ref $ref eq '' && $ref) {
        # don't upset XML parser
        # probably more to come
        $ref =~ s/&/&amp;/g;
        $ref = Encode::encode_utf8($ref);
    }
    else {
        ### something I missed?
    }
    return $ref;
}


--
trust in bash
but tie your camel

Replies are listed 'Best First'.
Re: convert UTF-8 in nested data structures
by ikegami (Patriarch) on Oct 31, 2005 at 14:58 UTC

    elsif (ref $ref eq '' && $ref) {
    should probably be
    elsif (ref $ref eq '' && length $ref) {
    to cover '0' and 0.

    What about escaping < and >?

    I find it odd that you return the converted value, since you're converting in place.

    Also, your variable $ref is misnamed, since it's not necessarily a ref.

    I've also added support for scalar refs.

    use Encode;

    sub digConvert {
        our $val;
        local *val = \$_[0];
        if (ref $val eq 'HASH') {
            foreach (keys %$val) {
                digConvert($val->{$_});
            }
        }
        elsif (ref $val eq 'ARRAY') {
            foreach my $i (0 .. $#{$val}) {
                digConvert($val->[$i]);
            }
        }
        elsif (ref $val eq 'SCALAR') {
            digConvert($$val);
        }
        elsif (!ref $val && length $val) {
            # don't upset XML parser
            # probably more to come
            $val =~ s/&/&amp;/g;
            $val =~ s/</&lt;/g;
            $val =~ s/>/&gt;/g;
            $val = Encode::encode_utf8($val);
        }
        else {
            ### something I missed?
        }
    }
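For readers unfamiliar with the `local *val = \$_[0]` line: it relies on the fact that the elements of `@_` are aliases to the caller's arguments, and a glob assignment of a scalar reference makes `$val` another name for that same argument. The following is only a minimal sketch of that idiom on its own; the sub name `shout` and the sample variable are made up for illustration and do not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: upper-cases its argument in place.
sub shout {
    our $val;
    local *val = \$_[0];   # $val is now an alias for the caller's variable
    $val = uc $val;        # so this assignment changes the caller's data
    return;                # nothing useful to return; the work is done in place
}

my $word = 'camel';
shout($word);
print "$word\n";           # prints "CAMEL"
```

Because the sub writes through the alias, it has to be called with something modifiable (a variable, a hash value, an array element); passing a literal string would die with a "Modification of a read-only value" error.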
      Thanks for the length. I really should make a habit of using it.
      What about escaping < and >?
      They haven't appeared in my data yet, but you're right, they need to be taken care of.

      And finally, you're absolutely right about the scalar parameters. Thanks again.


      --
      trust in bash
      but tie your camel
      Just to complete the sub for real-world data:
      return unless defined $val;
      helps with undefined values.
      -- slayven

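Putting the pieces of this sub-thread together, the result might look roughly like the sketch below: ikegami's in-place version with the `defined` guard added at the top. Where exactly the guard goes is an assumption (the thread only gives the line itself), and the sample data in the usage part is invented.

```perl
use strict;
use warnings;
use Encode ();

# Sketch: in-place conversion as in ikegami's reply, plus the defined() guard.
sub digConvert {
    our $val;
    local *val = \$_[0];            # alias the caller's value so it can be edited in place

    return unless defined $val;     # assumed placement: skip undef before any other test

    if (ref $val eq 'HASH') {
        foreach (keys %$val) {
            digConvert($val->{$_});
        }
    }
    elsif (ref $val eq 'ARRAY') {
        foreach my $i (0 .. $#{$val}) {
            digConvert($val->[$i]);
        }
    }
    elsif (ref $val eq 'SCALAR') {
        digConvert($$val);
    }
    elsif (!ref $val && length $val) {
        # escape the characters an XML parser cares about, then encode
        $val =~ s/&/&amp;/g;
        $val =~ s/</&lt;/g;
        $val =~ s/>/&gt;/g;
        $val = Encode::encode_utf8($val);
    }
    return;
}

# Usage with made-up data: nothing is returned, %data itself is changed.
my %data = (
    title => "Caf\x{e9} & <Bar>",
    tags  => [ "na\x{ef}ve", undef, 0 ],
);
digConvert(\%data);
```

Two things this sketch does not do: hash keys are left as they are (only values are visited), and other reference types such as code refs or blessed objects fall through untouched.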
Re: convert UTF-8 in nested data structures
by mirod (Canon) on Oct 31, 2005 at 15:50 UTC

    If you are willing to venture outside of CPAN ;--) I have a module that lets you walk an unknown data structure: Data::Traverse. It does basically what your code does, but wraps it to give you an iterator.

      Sounds like a really useful module. I don't think I'll use it for my insignificant task here, but it sure will be bookmarked. Thanks for the hint and of course for coding it in the first place.

      --
      trust in bash
      but tie your camel
Re: convert UTF-8 in nested data structures
by valdez (Monsignor) on Nov 08, 2005 at 11:11 UTC

    You could use Data::Structure::Util:

    use Data::Structure::Util qw/ utf8_on /;

    $ref = { whatever => 'àèéìòù' };
    utf8_on( $ref );

    Ciao, Valerio

      The module's description says:
      use Data::Structure::Util qw/ utf8_on /;
      This routine performs a sv_utf8_upgrade on each scalar string in the passed data structure that does not have the utf8 flag turned on.
      This is not exactly what I'm looking for. I need to convert the string regardless of the flag it has internally.
      -- slayven

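The difference slayven is pointing at can be seen on a single string. The sketch below uses the core function `utf8::upgrade` as a stand-in for what the quoted description says `utf8_on` does (an assumption based only on that description, not on the module's source) and compares it with `Encode::encode_utf8`. The helper name `upgrade_vs_encode` is made up.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns the string after an upgrade and after encoding.
sub upgrade_vs_encode {
    my ($str) = @_;
    my $upgraded = $str;
    utf8::upgrade($upgraded);                  # internal storage changes, the characters do not
    my $encoded = Encode::encode_utf8($str);   # new string whose characters are the UTF-8 bytes
    return ($upgraded, $encoded);
}

my $text = "caf\x{e9}";                        # 4 characters, the last one is U+00E9
my ($upgraded, $encoded) = upgrade_vs_encode($text);

printf "upgraded: %d chars, same as original: %s\n",
    length $upgraded, ($upgraded eq $text ? 'yes' : 'no');
printf "encoded:  %d chars (bytes)\n", length $encoded;
```

The upgraded copy still compares equal to the original and has length 4, while the encoded copy has length 5 because U+00E9 becomes the two bytes 0xC3 0xA9. An upgrade is therefore not a substitute when the goal is to produce UTF-8 bytes for output.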
Node Type: perlquestion [id://504275]
Approved by Corion