# Assumes the script has already loaded something like: use CGI qw(:standard); use Encode;
my @files = glob "$ENV{'DOCUMENT_ROOT'}/data/text/*.txt";
my @display_files = map m{([^/]+)\.txt}, @files;
Encode::from_to($_, 'ISO-8859-7', 'utf8') for @display_files;
# SO FAR, SO GOOD. The same @display_files array is used later to create
# the popup menu, which shows up correctly/as intended in the browser,
# so this use of Encode::from_to() is correct and necessary.
if ( param('select') ) {    # If User selected an item from the drop down menu
    my $selected_file = decode('utf8', param('select'));    ## ADD THIS LINE
    ### UPDATED 3 days after initial post: it was originally
    ### "encode" which, as Nik points out below, was wrong
    unless ( grep /^\Q$selected_file\E$/, @display_files )
    # If User Selection doesn't match one of the passages then it's a Fraud!
    {
        ## REPORT AN INVALID SUBMISSION (you don't need to worry about saying
        ## what properties it has that make it invalid -- it doesn't match any
        ## known file name, and that is all that matters).
        ## ... but before exiting, send some kind of error page back to the browser
        exit;
    }
    ## IF YOU GET HERE, YOU HAVE A VALID MATCH
    ## so you can see where your next coding mistake is...
}
The first time I suggested using the "decode()" function on the "select" param value, you said:
i beleive there is no need to explicitly tell perl to handle param('select') as utf8 it must do this by default i think.
I was able to prove (to my own satisfaction, at least) that your belief here was wrong. So try the suggestion and see what happens. I gather that you don't read documentation much at all, but if you could do that, and spend some time looking at the man page for Encode, you might be able to learn this important concept:
There is a difference between a "perl-internal utf8 string" and a "raw string containing utf8". The first is a string that perl has flagged in its internal storage as utf8, so each multi-byte sequence counts as one character; the second is a byte sequence that happens to come from some external source of utf8 data but has not been flagged, so perl treats every byte as a separate character. As explained in the Encode man page, a "perl-internal utf8 string" and a "raw string" holding the same non-ASCII text will not match, even if the actual byte sequences in the two strings are identical. The "utf8 flag" being different (set vs. not set) changes how perl interprets those bytes, and that is enough to make the comparison fail.
That is why the "decode('utf8', ...)" function is used on the parameter value -- if it really came from your cgi web form, then it really is a byte sequence for a valid utf8 string, but perl won't consider it to be the same as a "perl-internal utf8 string", even when the actual sequence of bytes is identical. For a match on non-ASCII text to succeed, the utf8 flag must be set on both strings, or not set on both strings (in addition to the bytes being the same), and setting the utf8 flag is one of the things that the "decode()" function does.
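Here is a minimal sketch of that point, in case it helps to see it run. The variable names are made up for illustration, and $raw_param is only a stand-in for what a form parameter would look like if it arrives as raw utf8 bytes; it is not taken from your script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Perl-internal string: two characters (Greek alpha, beta), utf8 flag set.
my $menu_item = "\x{03B1}\x{03B2}";

# Stand-in for the form parameter (an assumption): the same text as raw
# utf8 bytes, with the utf8 flag not set.
my $raw_param = encode('utf8', $menu_item);

printf "menu_item: length %d, utf8 flag %s\n",
    length($menu_item), (utf8::is_utf8($menu_item) ? 'on' : 'off');
printf "raw_param: length %d, utf8 flag %s\n",
    length($raw_param), (utf8::is_utf8($raw_param) ? 'on' : 'off');

# Comparing the raw bytes against the internal string fails ...
my $raw_matches = ($raw_param eq $menu_item) ? 1 : 0;

# ... but decoding the raw bytes first makes the comparison succeed.
my $decoded_param   = decode('utf8', $raw_param);
my $decoded_matches = ($decoded_param eq $menu_item) ? 1 : 0;

print "raw eq menu_item:     ", ($raw_matches     ? 'same' : 'different'), "\n";
print "decoded eq menu_item: ", ($decoded_matches ? 'same' : 'different'), "\n";
```

The raw string reports length 4 with the flag off, the internal one length 2 with the flag on; the first comparison prints "different" and the second prints "same".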
(Nit-picky details:) In complementary fashion, doing "encode('utf8', ...)" on a perl-internal utf8 string will produce a "raw" string (the utf8 flag is turned off). But the "difference" between "perl-internal utf8" and "raw" only matters when non-ASCII characters are involved -- i.e. those that lie outside the 7-bit ascii range -- as the following command lines show, each running a different version of a one-line script:
perl -MEncode -le '$a="\x{0341}"; $b=encode("utf8",$a); print "a:b ", (($a eq $b) ? "same":"different")'
# prints "different" -- $a is perl-internal utf8, $b is raw
perl -MEncode -le '$a="foo"; $b=encode("utf8",$a); print "a:b ", (($a eq $b) ? "same":"different")'
# prints "same"
perl -MEncode -le '$a="foo"; $b=decode("utf8",$a); print "a:b ", (($a eq $b) ? "same":"different")'
# also prints "same"
perl -MEncode -le '$a=decode("utf8","foo"); $b=encode("utf8","foo"); print "a:b ", (($a eq $b) ? "same":"different")'
# still prints "same"
Regarding the last example: note that running "decode('utf8',$a)" would be an error if $a were already flagged as a perl-internal utf8 value and contained wide characters. If all this confuses you, get over it. That's the reality.
(updated to fix a typo and add clarification in the last paragraph)
LAST UPDATE: Okay, I know that you have tried adding the "decode()" line before, and you reported the error message you got as a result, which was "Cannot decode string with wide characters at ... line 182". I didn't make the connection until after I updated that last paragraph above. The point is, at line 182 (wherever that was in your script -- you didn't make that clear) you are running "decode()" on a string that already has the utf8 flag set and contains a wide character.
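For what it's worth, that error is easy to reproduce on its own. The following is only a sketch with a made-up variable name, not a reconstruction of whatever is at your line 182: it calls decode() on a string that is already perl-internal and holds a wide character, and uses eval to capture the message instead of letting the script die.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Already a perl-internal string containing a wide character (U+03B1).
my $already_decoded = "\x{03B1}";

# decode() expects raw bytes, so this call dies; eval captures the message.
my $result = eval { decode('utf8', $already_decoded) };
my $error  = $@;

if (defined $result) {
    print "decoded ok\n";
}
else {
    print "decode failed: $error";
}
```

The captured message starts with "Cannot decode string with wide characters", followed by the file name and line number of the decode() call.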
If line 182 is the decode line that I told you to add, then I'm really puzzled, because it would mean that this cgi parameter string is already flagged as utf8 (though I can't imagine how), and if that's true, and the string came from the popup menu, then it should match. (In this case, try opening a separate text file for output -- make sure to set the mode to ">:utf8" -- and print the parameter and @display_files strings to that file, so you can inspect them manually, with a hex-dump tool if need be.)
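One way to do that inspection is sketched below. The helper describe_string, the output file name, and the sample values are all assumptions made for illustration; in the cgi script you would pass param('select') and the real @display_files entries instead of the stand-ins.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: one line per string showing the utf8 flag, the
# length in characters, and each character's code point in hex.
sub describe_string {
    my ($label, $string) = @_;
    my $flag = utf8::is_utf8($string) ? 'on' : 'off';
    my $hex  = join ' ', map { sprintf('%04X', ord($_)) } split //, $string;
    return sprintf '%s: flag=%s length=%d chars=[%s]',
        $label, $flag, length($string), $hex;
}

# Stand-ins for the real values (assumptions, not data from the script).
my $param_value   = "\xCE\xB1";      # raw utf8 bytes for Greek alpha
my @display_files = ("\x{03B1}");    # the same letter as a perl-internal string

# Hypothetical output location in the system temp directory.
my $debug_path = File::Spec->catfile(File::Spec->tmpdir, 'debug_strings.txt');
open my $fh, '>:utf8', $debug_path or die "cannot open $debug_path: $!";
print {$fh} describe_string('param', $param_value), "\n";
print {$fh} describe_string('file',  $_), "\n" for @display_files;
close $fh or die "cannot close $debug_path: $!";
print "wrote $debug_path\n";
```

With these stand-in values the file holds "param: flag=off length=2 chars=[00CE 00B1]" and "file: flag=on length=1 chars=[03B1]". Writing out code points rather than the strings themselves means the result does not depend on how your editor or terminal renders non-ASCII text.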
But if line 182 is somewhere else, it's probably just the next bone-headed programming error in your script, and you had not seen it before because the script had never gotten that far before. It's really frustrating when you leave out relevant details like this. Even after I told you days ago that you should have shown us that line, you didn't do it. It's tiresome.
Think harder before you post again -- read what you write before you hit the "create" button, and try to imagine that you are someone else, and think what questions this other person would ask about the information in the post. Then add the answers to those questions. Better yet, try to imagine what advice this other person would give you, and try it out before posting. Take your time, don't rush it. Only create the node when you have included a clear description of what you have tried (code, inputs and outputs).