PerlMonks |
First, thanks A LOT for the insightful answers!
Both of you are talking about filehandles; I wish I were using some... In fact I am wrangling with the output of various modules, in this case the LWP library.

"Decode it properly to get 'perl's internal format'" means I use the $mess->decoded_content() method of HTTP::Message, about which the doc says: "Returns the content with any Content-Encoding undone and strings mapped to perl's Unicode strings." For me this means "internal format", which is in fact a utf8 encoding dialect (but I should forget about that anyway...).

The utf8 flag of the module's output is ON, and this is where the confusion happened: I thought that the utf8 flag was set although the data was really unicode octets. But now I understand it as: decoded_content() returns utf8 encoded unicode (step 1), my perl script and its regexes should handle utf8 encoded unicode (step 2), so everything is fine. And the output should also be utf8 encoded unicode, which it already is, so I modified the pipeline to skip the wrong encode step (new step 3). Am I doing it right now?

For the interested reader: in fact I use Storable to serialize my resulting data structure as a whole, then I gzip the freeze'd data and write it to disk through a simple binmode (and thus not :utf8) filehandle. Any problems here? The utf8 data and the utf8 flag should stay intact over the pipeline.

In reply to Re: The unicode / utf8 struggle, part 2: regexes
by isync
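
The pipeline described in the post can be checked with a small round-trip test. This is only a minimal sketch under stated assumptions: Encode::decode() stands in for $mess->decoded_content() (so no network or LWP is needed), the sample string and the hash key "greeting" are made up for illustration, and a temporary file replaces the real output file. It uses core modules only.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Storable qw(freeze thaw);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use File::Temp qw(tempfile);

# Assumption: these octets stand in for a raw HTTP body, and decode()
# stands in for $mess->decoded_content().
my $octets = "Gr\xC3\xBC\xC3\x9Fe";       # "Gruesse" with u-umlaut and sharp s, as UTF-8 bytes
my $text   = decode('UTF-8', $octets);    # character string, utf8 flag on

# Step 2: regexes operate on characters, so \w matches the non-ASCII letters.
my ($word) = $text =~ /^(\w+)$/;

# Hypothetical result structure holding the decoded text.
my %data = (greeting => $text);

# Serialize, compress, and write through a plain binmode handle (no :utf8 layer).
my $frozen = freeze(\%data);
gzip(\$frozen => \my $zipped) or die "gzip failed: $GzipError";

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} $zipped;
close $fh or die "close failed: $!";

# Read everything back the same way and restore the structure.
open my $in, '<', $path or die "open failed: $!";
binmode $in;
my $raw = do { local $/; <$in> };
close $in;

gunzip(\$raw => \my $restored) or die "gunzip failed: $GunzipError";
my $copy = thaw($restored);

# The restored value should still be a flagged character string equal to the original.
die "utf8 flag lost"   unless utf8::is_utf8($copy->{greeting});
die "content changed"  unless $copy->{greeting} eq $text;
```

The frozen and gzipped data are plain bytes, which is why a binmode handle without an :utf8 layer seems the appropriate choice here; adding an encoding layer on top would re-encode the binary stream.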