PerlMonks  

Re^2: Seeking help with Extracting files from zip

by aksjain (Acolyte)
on Jan 14, 2015 at 14:37 UTC ( [id://1113232] )


in reply to Re: Seeking help with Extracting files from zip
in thread Seeking help with Extracting files from zip

Thanks for your reply. By "worse" I mean that the filename characters get mangled. I tried extracting the same zip file with Windows tools such as WinRAR, and they extract the files with the proper names, as expected. I am using Windows 7 and have the Japanese and Chinese language packs installed on the machine. Below is a link to an image that shows the difference in the folder name.

http://s4.postimg.org/bnphbww59/Japanese.png

Re: Seeking help with Extracting files from zip
by jonadab (Parson) on Jan 14, 2015 at 14:54 UTC

    OK, so I assume the katakana filename there is what it's supposed to look like, and the gibberish filename with more than twice as many characters, most of which look like they came from the miscellaneous-symbols-and-accented-characters section of an eight-bit character set, is the result of running your code?

    This definitely looks like a charset translation issue. The Archive::Zip documentation indicates that setting UNICODE causes the filenames in the archive to be treated as UTF-8. Perhaps they're not UTF-8? Maybe they're UTF-16, UTF-32, or some other Unicode encoding (or, heaven help you, a pre-Unicode Asian encoding such as Shift-JIS)? If you can figure out what conversion is needed to preserve the encoding, you can pass the correct filename to extractMemberWithoutPaths, and that should probably work.

    Unfortunately, I don't know that much about the details of the character sets involved, but maybe someone else will come along now and be able to recognize what's going on. (Even just being able to recognize which encoding is being erroneously treated as though it were some other encoding would go a long way toward figuring out the problem.) That image you provided should help.
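    A minimal Perl sketch of the kind of mismatch described above, using the core Encode module. It rests on an assumption that this thread has not established: the archive stores the names either as UTF-8 or as cp932 (the Windows variant of Shift-JIS), and they are then misread as the Western cp1252 code page. The katakana name is a made-up example, not the one in the screenshot, and the code does not call Archive::Zip at all.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Print wide characters as UTF-8 instead of warning "Wide character in print".
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical katakana name (U+30AB U+30BF U+30AB U+30CA); NOT the name from the screenshot.
my $name = "\x{30AB}\x{30BF}\x{30AB}\x{30CA}";

# Two plausible (assumed) ways the name's bytes could be stored in the archive.
my $utf8_bytes  = encode('UTF-8', $name);  # 3 bytes per katakana character
my $cp932_bytes = encode('cp932', $name);  # 2 bytes per katakana character (Windows Shift-JIS)

# Misreading those bytes as a Western 8-bit code page produces gibberish that is
# 3x (UTF-8) or 2x (cp932) as long as the real name.
my $mangled_utf8  = decode('cp1252', $utf8_bytes);
my $mangled_cp932 = decode('cp1252', $cp932_bytes);

printf "%-16s: %s (%d chars)\n", 'original',        $name,          length $name;
printf "%-16s: %s (%d chars)\n", 'UTF-8 as cp1252', $mangled_utf8,  length $mangled_utf8;
printf "%-16s: %s (%d chars)\n", 'cp932 as cp1252', $mangled_cp932, length $mangled_cp932;

# Diagnosis: turn each garbled name back into its raw bytes, then try the
# candidate encodings and see which one yields readable text.
for my $garbled ($mangled_utf8, $mangled_cp932) {
    my $raw = encode('cp1252', $garbled);
    for my $enc (qw(UTF-8 cp932)) {
        printf "  %-5s => %s\n", $enc, decode($enc, $raw);
    }
}
```

    The length ratio is a quick clue: each UTF-8 katakana character turns into three cp1252 characters and each cp932 one into two, so comparing the garbled name's length with the real one, and trying the round trip on the actual bytes, can help narrow down which encoding is being misread.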
