PerlMonks  

Issue with reading a unicode file

by cstar (Initiate)
on Jan 07, 2013 at 05:04 UTC ( [id://1011948]=perlquestion )

cstar has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a Unicode file containing Chinese text: the titles of some windows that I have to search for. When I read a window title from the file, Perl seems to get Unicode data that differs from what is in the input file. The two snippets below illustrate my problem.

Script 1:

    use Encode;
    use Win32::GuiTest qw(FindWindowLike);

    open(MYFILE, '<:encoding(UTF-8)', "saml.txt") || die "cannot open: $!";
    open(OUTFILE, '>:encoding(UTF-8)', "out.txt") || die "cannot open: $!";

    $WindowTitle = <MYFILE>;  # reading the chinese window title from input file
    chomp($WindowTitle);
    binmode(STDOUT, ":utf8");
    print "$WindowTitle\n";   #===> Here perl prints out some chinese text to the command console, but different from what is given in input file

    my @hwnd = Win32::GuiTest::FindWindowLike(undef, $WindowTitle);
    if ($hwnd[0]) {
        print "window found\n";
    }
    else {
        print "window not found\n";
    }
    print OUTFILE $WindowTitle;  #==> Here perl prints out same chinese text as input to the outfile

Script 2:

    use Encode;
    use Win32::GuiTest qw(FindWindowLike);

    $WindowTitle = "V VM";    # Hardcoded the window title in chinese
    binmode(STDOUT, ":utf8");
    print "$WindowTitle\n";   #===> Here perl prints out some chinese text to the command console, but different from what is given in input file

    my @hwnd = Win32::GuiTest::FindWindowLike(undef, $WindowTitle);
    if ($hwnd[0]) {
        print "window found\n";
    }
    else {
        print "window not found\n";
    }

Script 1 reads the Chinese window title from the Unicode file and reports that the window is not present, although it actually is. Script 2 has the Chinese window title hardcoded, and it correctly reports that the window is present. What am I doing wrong when reading the Unicode file? Please help.

Replies are listed 'Best First'.
Re: Issue with reading a unicode file
by quester (Vicar) on Jan 07, 2013 at 06:36 UTC
    As a guess, since you seem to be on Windows, your input file is likely to begin with a Byte Order Mark (BOM), which Microsoft uses as a convention to distinguish the various flavors of UTF. A UTF-8 byte order mark is three bytes long: 0xEF, 0xBB, 0xBF. Once the file has been decoded, Perl sees it as the single code point "\N{U+FEFF}" at the start of the first line. You could try tr/\N{U+FEFF}//d to remove it.
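    A minimal sketch of that suggestion, assuming the BOM guess is right. It does not read saml.txt; instead it decodes a made-up byte string that mimics the first line of a UTF-8 file saved with a BOM. The helper name strip_bom and the sample title "VM" are illustrative only and do not come from the original post. The sketch uses an anchored substitution rather than tr///, so only a BOM at the very start of the string is removed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the thread): remove a single leading BOM
# from a string that has already been decoded into characters.
sub strip_bom {
    my ($text) = @_;
    $text =~ s/\A\x{FEFF}//;
    return $text;
}

# Simulated first line of a UTF-8 file with a BOM:
# bytes EF BB BF, then the placeholder title "VM", then a newline.
my $raw_bytes = "\xEF\xBB\xBFVM\n";
my $decoded   = decode('UTF-8', $raw_bytes);
chomp $decoded;

my $title = strip_bom($decoded);

# The decoded line carries one extra, invisible character in front.
printf "before: %d characters, after: %d characters\n",
    length($decoded), length($title);
```

    If this is the cause, the same substitution applied to $WindowTitle right after chomp in Script 1 should make the title compare equal to the hardcoded one in Script 2.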
Re: Issue with reading a unicode file
by choroba (Cardinal) on Jan 07, 2013 at 09:48 UTC
Re: Issue with reading a unicode file
by Anonymous Monk on Jan 07, 2013 at 07:33 UTC
    And the bytes of this mysterious file are?
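    One way to answer that is to dump the first few bytes of the file in hex. The sketch below is only an illustration: first_bytes_hex is a hypothetical helper, and the demo writes a throwaway temporary file that starts with a UTF-8 BOM, since the real saml.txt is not available here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first $count bytes of a file as a
# space-separated hex string, reading the file as raw bytes.
sub first_bytes_hex {
    my ($path, $count) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    my $read = read($fh, my $buffer, $count);
    die "cannot read $path: $!" unless defined $read;
    close($fh);
    return join ' ', map { sprintf '%02X', $_ } unpack('C*', $buffer);
}

# Demo input: a temporary file beginning with a UTF-8 BOM.
# For the real case, pass the path of the input file instead.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xEF\xBB\xBFVM\n";
close($tmp_fh);

my $hex = first_bytes_hex($tmp_path, 8);
print "$hex\n";
```

    A dump that starts with EF BB BF points to UTF-8 with a BOM. One that starts with FF FE would suggest UTF-16LE, which is what some Windows editors mean by "Unicode", and in that case the :encoding(UTF-8) layer in Script 1 would be the wrong one to use.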
