Re: Module to read - modify - write text files in any unicode encoding

by ikegami (Patriarch)
on May 19, 2008 at 22:21 UTC

in reply to Module to read - modify - write text files in any unicode encoding

You use :raw:encoding($enc):crlf:utf8 for writing.
You use :encoding($enc) for reading.
This lack of symmetry in your IO layers accounts for the lack of symmetry in your CRLF handling.

Use :raw:encoding($enc):crlf:utf8 for reading too.

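use Data::Dumper qw( Dumper );

# Return the bytes of a string as space-separated uppercase hex pairs.
sub hexdump {
    (my $dump = uc unpack 'H*', $_[0]) =~ s/(..)/$1 /g;
    return $dump;
}

# Return a string as a double-quoted literal with escapes such as \n visible.
sub txtdump {
    local $Data::Dumper::Useqq  = 1;
    local $Data::Dumper::Terse  = 1;
    local $Data::Dumper::Indent = 0;
    return Dumper($_[0]);
}

# Write the text through the full layer stack.
{
    open(my $fh, '>:raw:encoding(UTF-16le):crlf:utf8', 'test')
        or die;
    my $data = "foo\nbar\n";
    print("Orig: ", txtdump($data), "\n");
    print $fh $data;
}

# Read the file back as raw bytes to see what is actually on disk.
{
    open(my $fh, '<:raw', 'test')
        or die;
    local $/;
    my $data = <$fh>;
    print("File: ", hexdump($data), "\n");
}

# Read the file back through the same layer stack used for writing.
{
    open(my $fh, '<:raw:encoding(UTF-16le):crlf:utf8', 'test')
        or die;
    local $/;
    my $data = <$fh>;
    print("Read: ", txtdump($data), "\n");
}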
Orig: "foo\nbar\n"
File: 66 00 6F 00 6F 00 0D 00 0A 00 62 00 61 00 72 00 0D 00 0A 00
Read: "foo\nbar\n"
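
For contrast, here is a minimal sketch of what the asymmetric case looks like. The helper name round_trip and the use of a File::Temp temporary file are assumptions made for illustration, not part of the original code. The read side uses :raw:encoding(UTF-16le) with no :crlf above the decoder, which stands in for a read that decodes but never translates line endings.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: write $text with one layer stack,
# read it back with another, and return what was read.
sub round_trip {
    my ($write_layers, $read_layers, $text) = @_;

    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close($tmp_fh);

    open(my $out, ">$write_layers", $path) or die "Can't write $path: $!";
    print {$out} $text;
    close($out) or die "Can't close $path: $!";

    open(my $in, "<$read_layers", $path) or die "Can't read $path: $!";
    local $/;    # slurp the whole file
    my $data = <$in>;
    close($in);
    return $data;
}

my $full = ':raw:encoding(UTF-16le):crlf:utf8';

# Same layers in both directions: the text survives unchanged.
my $symmetric = round_trip($full, $full, "foo\nbar\n");

# No :crlf above the decoder when reading: the CRs stay in the data.
my $asymmetric = round_trip($full, ':raw:encoding(UTF-16le)', "foo\nbar\n");

print "symmetric round trip ok\n"   if $symmetric  eq "foo\nbar\n";
print "asymmetric read keeps CRs\n" if $asymmetric eq "foo\r\nbar\r\n";
```

The point of the sketch is only that :crlf has to sit above the :encoding layer on both sides, so that the CR/LF translation happens on decoded characters rather than on the UTF-16 bytes.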

Update: Added code.

Replies are listed 'Best First'.
Re^2: Module to read - modify - write text files in any unicode encoding
by Rudif (Hermit) on May 20, 2008 at 22:22 UTC

    Your code just works, also when I apply it to UTF-8.

    Apart from the lack of symmetry in my IO layers that you pointed out, I found another source of confusion, which you probably noticed but did not comment on:

    The my_hexdump() I was using in my tests, which is based on Data::Hexdump, gives wrong results on Windows.
    Deep inside, Data::Hexdump reads the file without applying '<:raw' the way your code does. So, when it reads the UTF-8 or plain ASCII sequence "\r\n", it converts it to "\n".

    In addition, I was using another tool to dump my test files. It agreed with my_hexdump(), but both were wrong!

    Here is a correct file hexdump, based on your code :

    sub hexdump {
        my $file = shift;
        open(my $fh, '<:raw', $file)
            or die;
        local $/;
        my $data = <$fh>;
        (my $dump = uc unpack 'H*', $data) =~ s/(..)/$1 /g;
        return $dump;
    }
    Thank you for the insight.
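
The effect described above can be reproduced on any platform by asking for the :crlf layer explicitly, which is what Windows builds of Perl apply to text-mode handles by default. The following is only a sketch: hexdump_with_layers is a hypothetical name, and the temporary file from File::Temp stands in for a real test file.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: dump a file as hex after reading it
# through the given layer stack.
sub hexdump_with_layers {
    my ($layers, $path) = @_;
    open(my $fh, "<$layers", $path) or die "Can't read $path: $!";
    local $/;    # slurp the whole file
    my $data = <$fh>;
    return join ' ', map { sprintf '%02X', ord } split //, $data;
}

# Create a small file containing literal CRLF line endings.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);    # make sure nothing is translated on the way out
print {$tmp_fh} "foo\r\nbar\r\n";
close($tmp_fh) or die "Can't close $path: $!";

my $raw_dump  = hexdump_with_layers(':raw',  $path);
my $crlf_dump = hexdump_with_layers(':crlf', $path);

print "raw:  $raw_dump\n";
print "crlf: $crlf_dump\n";
```

With :raw the dump shows every 0D 0A pair that is on disk. With :crlf each pair is collapsed to a single 0A before the dump sees it, so a hexdump built on a text-mode read misreports the file.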

