
Re^2: Matching non-ASCII file contents with file name.

by mldvx4 (Friar)
on Dec 23, 2022 at 06:39 UTC ( [id://11149040] )

in reply to Re: Matching non-ASCII file contents with file name.
in thread Matching non-ASCII file contents with file name.

Thanks! The detailed explanation helped and is appreciated.

"Plus, AFAIK, file name encodings are a very complicated topic, and therefore I think you might do yourself a favor by not using "" in filenames and URLs."

Of course. However, there are several reasons for using such characters anyway:

Use of non-ASCII characters like , , , , 月, 日, or even ¿ or ¡ is to be expected these days, even in file names and thus URLs.

The rename utility listed above deals with the renaming, and seems to match what can be produced manually via a local terminal emulator, a local console, or a remote ssh+tmux connection. So it was my script which was the odd man out and therefore needed correction.

The file names, minus the inverted question mark, are the result of using wget to scrape the output of some legacy PHP scripts which are not, and cannot be, maintained any more. Aside from the very long file names, the method works reasonably well for converting the whole mess to a static HTML archive. Unfortunately, that leaves a question mark in the file name, which web servers do not tolerate: they treat it as the end of the file name and the start of the query string. So a replacement character is needed, and ¿ seems the least problematic semantically.
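
For anyone wanting to try the same replacement, here is a minimal sketch of the idea rather than the rename utility mentioned above. It assumes the file names on disk are valid UTF-8, and the sub names replace_question_mark and rename_in_dir are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                        # this source file contains a literal ¿
use Encode qw(decode encode);

# Hypothetical helper: takes a file name as UTF-8 bytes (as readdir
# returns it on most Unix systems) and returns the new name, also as
# UTF-8 bytes, with every "?" replaced by "¿" (U+00BF).
sub replace_question_mark {
    my ($bytes) = @_;
    my $name = decode('UTF-8', $bytes);   # bytes -> characters
    $name =~ tr/?/¿/;                     # replace at the character level
    return encode('UTF-8', $name);        # characters -> bytes
}

# Hypothetical helper: renames every entry in one directory whose name
# contains a "?". Existing targets are never overwritten.
sub rename_in_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { /\?/ } readdir($dh);
    closedir($dh);
    for my $old (@names) {
        my $new = replace_question_mark($old);
        if (-e "$dir/$new") {
            warn "Skipping $old: target already exists\n";
            next;
        }
        rename("$dir/$old", "$dir/$new")
            or warn "Cannot rename $old: $!\n";
    }
    return;
}

# Process the directories named on the command line, if any.
rename_in_dir($_) for @ARGV;
```

The decode and encode steps are there so that the substitution happens on characters rather than raw bytes, which is the same distinction that made my original script the odd man out. Links inside the scraped HTML would still need the matching change, and that is not covered here.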