use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper("ABC"));
to see what's happening.
We can't tell how your data-sources encoded your strings and you haven't even used code-tags to help us at least a bit.
Update
changed title to make a difference
| [reply] [d/l] [select] |
Anonymous Monk:
It's kind of hard to point to the difference from here: The strings you posted could have been encoded/decoded in various places between getting into your text editor and getting into the PerlMonks site. It could be that your local code page[1] supports the first string but not the second (forcing it to UTF-8).
As a native ASCIIan, I tend to find Unicode frustrating. Not because it's frustrating in and of itself, but because with enough systems to go through, there's frequently a monkey in the middle causing difficulty.
It's the same experience I had with XML--too many programmers[2] think you can create XML simply by doing a little string manipulation to wrap some stuff in tags and/or quotes. Then they insist that their "XML" file is valid, even when it violates *dozens* of rules laid out in the standard, and multiple XML validators insist that it isn't valid.
Notes:
[1] Assuming you're running Windows.
[2] At least some 10+ years ago when I had to deal extensively with XML. The situation may suck less nowadays.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
| [reply] |
All characters of string 2 are in Latin1, but some characters in string 1 are beyond codepoint 0xff.
I'm making a guess: your program does not handle decoding properly, so you end up with string 2 encoded as Latin1 and string 1 decoded into Perl's internal format, which (when passed to the world outside the program) accidentally does the correct thing.
This explanation fits the symptoms, but since you did not show any code, we can't be sure.
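To make that guess a bit more concrete, here is a minimal sketch of how the two kinds of string can be stored differently inside Perl. The sample strings and the internal_bytes helper are made up for illustration; they are not the OP's data or code.

```perl
use strict;
use warnings;

# Hypothetical sample strings (not the OP's actual filenames).
my $wide   = "\x{4e2d}\x{6587}";   # two CJK characters, both above 0xFF
my $narrow = "caf\x{e9}";          # every character fits in Latin-1

# Hypothetical helper: hex dump of the bytes Perl stores internally.
# Works on a copy, so the caller's string is left untouched.
sub internal_bytes {
    my ($str) = @_;
    my $copy = $str;
    # If the string is stored as UTF-8 internally, expose those bytes;
    # otherwise the characters already are the stored bytes.
    utf8::encode($copy) if utf8::is_utf8($copy);
    return join ' ', map { sprintf '%02x', ord($_) } split //, $copy;
}

print internal_bytes($wide),   "\n";   # e4 b8 ad e6 96 87 (valid UTF-8)
print internal_bytes($narrow), "\n";   # 63 61 66 e9 (Latin-1, not UTF-8)
```

The first string happens to be stored as valid UTF-8, the second as a single Latin-1 byte per character, which would match the symptoms described if the strings are handed to the outside world without explicit encoding.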
| [reply] |
Hi Daxim,
I think your hint did the trick.
The following code, suggested by another monk, seems to work fine for me.
I have checked the code below on filenames in different languages, such as Chinese, Japanese, Danish, Polish, Spanish and, of course, English.
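use Encode;

# Decode the raw bytes into characters, then encode them as UTF-8.
$filename = decode_it($filename);
$filename = encode('UTF-8', $filename);
#---------------------------------------
# Try strict UTF-8 first; fall back to Latin-1, which accepts any byte.
sub decode_it {
    my $s = shift;
    eval {
        # Encode::FB_CROAK (numeric value 1) makes decode() die on malformed input.
        $s = decode('UTF-8', $s, Encode::FB_CROAK);
        1;
    } or do {
        $s = decode('latin1', $s, Encode::FB_CROAK);
    };
    return $s;
}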
Thank you. Cheers.
| [reply] [d/l] |
So there's a few complicating factors here. First you need to understand how character encoding works. For that I would suggest a thorough reading of https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/.
With that in mind, your source code files are usually saved in UTF-8, and strings that come into your program through filehandles or command line arguments also arrive as UTF-8-encoded bytes, unless you set layers that decode them. The source code itself is automatically decoded if you "use utf8;". For input strings, you can use Encode to decode them if you haven't set the ":encoding(UTF-8)" layer on the handle. It's usually a good idea to work with decoded strings, because then your string logically contains the characters you think it does, instead of the bytes that represent them in UTF-8.
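A minimal sketch of the two decoding routes mentioned above, assuming the input really is UTF-8. The $raw sample and the in-memory filehandle are stand-ins for real input, chosen only so the snippet runs on its own.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the two UTF-8 bytes for U+00E9, followed by a newline.
my $raw = "\xc3\xa9\n";

# Route 1: decode explicitly with Encode.
my $decoded = decode('UTF-8', $raw);
chomp $decoded;

# Route 2: let an :encoding(UTF-8) layer decode while reading.
# An in-memory filehandle stands in for a real file here.
open my $fh, '<:encoding(UTF-8)', \$raw or die "open failed: $!";
my $line = <$fh>;
close $fh;
chomp $line;

# Both routes give one character, where the raw input had two bytes.
printf "decoded length: %d, layer length: %d\n", length($decoded), length($line);
```

Either way you end up with a one-character string, so length, regexes and comparisons work on characters and not on their UTF-8 bytes.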
But the problem is that you're dealing with filenames here, and filenames are all sorts of broken in Perl.* Perl essentially treats them as their *internal* bytes like a buggy XS module might, regardless of what the *logical* contents of the string are. So that's why there is a difference here when you didn't do anything wrong. The first string can't be represented in your native encoding, so the internal bytes are accidentally the correct UTF-8 encoding of your filename. The second string can, so the internal bytes are not the same as the UTF-8 encoding, unless you utf8::upgrade it to change the internal encoding, or use Encode to explicitly change its logical contents to the UTF-8 encoding of the string. My recommendation would be: use decoded strings in general, and encode them explicitly for use as a filename.
* https://rt.perl.org/Public/Bug/Display.html?id=130831
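As a rough sketch of that recommendation: keep the name as a decoded string and encode it only at the point where it becomes a filename. The fs_name helper and the sample name are hypothetical, and the sketch assumes a filesystem that expects UTF-8 names, which is not the case everywhere.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a decoded (character) string into the UTF-8
# bytes to hand to open(), opendir(), unlink(), and so on.
# Assumption: the filesystem expects UTF-8 encoded names.
sub fs_name {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

my $name     = "caf\x{e9}.txt";   # sample name; all characters fit in Latin-1
my $upgraded = $name;
utf8::upgrade($upgraded);         # same characters, different internal storage

# Both strings hold the same characters and encode to the same bytes, so the
# resulting filename no longer depends on Perl's internal representation.
print "same bytes\n" if fs_name($name) eq fs_name($upgraded);

# Typical use (not run here): open my $fh, '>', fs_name($name) or die $!;
```

The point of going through one helper is that every filesystem call sees explicitly encoded bytes, whatever the internal state of the string was.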
| [reply] |
Thanks, Monks, for all your suggestions and information.
| [reply] |