http://qs321.pair.com?node_id=327372



Have you tried setting binmode on the file handle? What operating system are you working on?

90% of every Perl application is already written.
dragonchild

Re: Re: Re: Re: DBD::Mysql Messing with my data?
by opuz (Initiate) on Feb 07, 2004 at 21:30 UTC
    OLD SYSTEM:
    
    Red Hat Linux release 7.0 (Guinness)
    LANG=en_US
    LC_CTYPE="en_US"
    LC_NUMERIC="en_US"
    LC_TIME="en_US"
    LC_COLLATE="en_US"
    LC_MONETARY="en_US"
    LC_MESSAGES="en_US"
    LC_PAPER="en_US"
    LC_NAME="en_US"
    LC_ADDRESS="en_US"
    LC_TELEPHONE="en_US"
    LC_MEASUREMENT="en_US"
    LC_IDENTIFICATION="en_US"
    LC_ALL=
    
    
    NEW SYSTEM:
    
    Red Hat Linux release 9 (Shrike)
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    
    I'm not using binmode; I thought it was only needed on Windows.

      binmode is needed on any system where the I/O layers can mangle the data on its way to the file. That is increasingly an issue on *nix systems when different character encodings (such as UTF-8) are in use. Perl 5.8 has better support for locales and character encodings, so it is becoming important to call binmode on every platform whenever the data must be written out untouched.
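For what it's worth, here is a minimal sketch of what that looks like in practice. The payload and the temporary file are made up for illustration; the only point is that binmode is called on the handle before anything is written, and again before reading back.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical payload: raw bytes, some above 0x7F, plus CR and LF,
# i.e. the kind of data a text-mode or encoding layer might alter.
my $payload = join '', map { chr } 0x00, 0x0A, 0x0D, 0x80, 0xE9, 0xFF;

# Write the payload with the handle in binary mode.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out) or die "binmode failed: $!";
print {$out} $payload or die "write failed: $!";
close($out) or die "close failed: $!";

# Read it back, also in binary mode, and compare byte for byte.
open(my $in, '<', $path) or die "Cannot open $path: $!";
binmode($in) or die "binmode failed: $!";
my $read_back = do { local $/; <$in> };
close($in);

print $read_back eq $payload ? "round trip ok\n" : "data was altered\n";
```

Calling binmode costs nothing on systems where it turns out not to matter, so it is a reasonable habit for any handle that carries non-text data.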

      90% of every Perl application is already written.
      dragonchild
      "NEW SYSTEM: Red Hat Linux release 9 (Shrike) LANG=en_US.UTF-8"

      Bingo. And I'll bet that "perl -V" reports "revision 5.0 version 8 subversion 0". Given this combination, the default behavior for output to a file is to assume the data is supposed to be UTF-8 (to match the locale), and to alter it wherever that is presumed necessary. In the many cases where this is not what the coder intended, the output gets messed up (much like the familiar "text-mode" vs. "binary-mode" output on Microsoft systems). It would be instructive to see what is being added to the stream, to understand the process better.
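One way to see what gets added is to write the same string through a plain handle and through a handle with an explicit :utf8 layer, then hex-dump both results. This is only a sketch: the three-byte sample and the hex_dump helper are made up for illustration, in-memory files stand in for real ones, and the :utf8 layer is requested explicitly here rather than picked up from the locale.

```perl
use strict;
use warnings;

# Hypothetical sample: three characters, one of them above 0x7F.
my $data = "A\xE9B";

# Illustrative helper: show each byte of a string as two hex digits.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, $bytes);
}

# Write to an in-memory file in binary mode (data left as-is).
my $raw = '';
open(my $raw_fh, '>', \$raw) or die "Cannot open in-memory file: $!";
binmode($raw_fh);
print {$raw_fh} $data;
close($raw_fh);

# Write the same string through the :utf8 layer.
my $utf8 = '';
open(my $utf8_fh, '>:utf8', \$utf8) or die "Cannot open in-memory file: $!";
print {$utf8_fh} $data;
close($utf8_fh);

print "raw:   ", hex_dump($raw),  "\n";   # raw:   41 e9 42
print ":utf8: ", hex_dump($utf8), "\n";   # :utf8: 41 c3 a9 42
```

The single byte e9 comes out as the two-byte sequence c3 a9, so any data containing bytes above 0x7F grows and no longer matches what was put in.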

      Put the "use bytes;" pragma in the relevant block of code (or at the top of the script) until you have a chance to upgrade to 5.8.1 or later, where the default behavior is the more appropriate "leave data as-is during output, unless explicitly instructed otherwise."