$page is the raw HTML document
Does $page consist of bytes (e.g. "\x{e2}\x{98}\x{ba}", which can be decoded as U+263A White Smiling Face in UTF-8, or "\x{fe}\x{ff}\x{26}\x{3a}", which is the same U+263A, but in UTF-16), or of characters (e.g. "\x{263A}", which is a U+263A White Smiling Face character and should be encoded before writing it anywhere)?
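To make the distinction concrete, here is a minimal sketch using the core Encode module. The variable names are made up for illustration; the byte strings are the ones quoted above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The same smiley as UTF-8 bytes and as UTF-16 bytes with a big-endian BOM.
my $utf8_bytes  = "\x{e2}\x{98}\x{ba}";
my $utf16_bytes = "\x{fe}\x{ff}\x{26}\x{3a}";

# Decoding turns bytes into characters; both byte strings give one character.
my $from_utf8  = decode('UTF-8',  $utf8_bytes);
my $from_utf16 = decode('UTF-16', $utf16_bytes);   # 'UTF-16' reads the BOM

printf "bytes: %d and %d, characters: %d and %d\n",
    length $utf8_bytes, length $utf16_bytes,
    length $from_utf8,  length $from_utf16;
# prints "bytes: 3 and 4, characters: 1 and 1"

# Characters have to be encoded again before they are written anywhere.
my $out = encode('UTF-8', "\x{263A}");
```

Both decoded strings compare equal to "\x{263A}", and $out is the same three bytes as $utf8_bytes.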
HTML::TokeParser seems to ask for the latter: it wants the HTML to be decoded into characters from bytes, whatever encoding those bytes were in. See also: perlunitut.
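A minimal sketch of what that could look like, assuming $page holds raw UTF-8 bytes (the sample document here is invented, and in real code the encoding would have to be determined first):

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::TokeParser;

# Assumption: $page is a byte string, as if just read from a socket or file.
my $page = "<html><head><title>Smile \x{e2}\x{98}\x{ba}</title></head></html>";

# Decode bytes to characters first, then hand the parser a reference
# to the decoded string.
my $html   = decode('UTF-8', $page);
my $parser = HTML::TokeParser->new(\$html) or die "cannot create parser";

my $title = '';
if ($parser->get_tag('title')) {
    $title = $parser->get_trimmed_text;
}

# Encode on the way out, here via an output layer on STDOUT.
binmode STDOUT, ':encoding(UTF-8)';
print "$title\n";
```

With the decode step in place, $title contains a single U+263A character rather than three separate byte-valued characters.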
Of course, this brings us to another problem: correctly determining the encoding of a byte stream. Sometimes that is the HTML parser's job (when the charset is declared by a meta tag in HTML4/HTML5), sometimes the HTTP client's (when a proper Content-Type header is sent), and sometimes it just has to be guessed. And it's not impossible to misconfigure a web server to serve Content-Type: text/html; charset=utf-8 with <meta charset="koi8-r"> in the HTML while the real encoding is UTF-16LE with a BOM.
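As a rough illustration of how those hints might be combined, here is a sketch using only the core Encode module. guess_charset and decode_page are hypothetical helpers, not part of any library, and the priority order (BOM, then HTTP header, then meta tag, then a UTF-8 fallback) is an assumption loosely modelled on what browsers do. A real HTTP client or HTML parser will be more thorough.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);

# Hypothetical helper: pick a charset name for a raw byte string.
sub guess_charset {
    my ($bytes, $content_type) = @_;

    # 1. A byte order mark is the strongest hint.
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;

    # 2. The charset parameter of the HTTP Content-Type header, if any.
    if (defined $content_type
        && $content_type =~ /charset\s*=\s*"?([\w.:-]+)/i) {
        my $name = $1;
        return $name if find_encoding($name);
    }

    # 3. A meta declaration near the start of the document.
    my $head = substr $bytes, 0, 1024;
    if ($head =~ /<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i) {
        my $name = $1;
        return $name if find_encoding($name);
    }

    # 4. Nothing usable: fall back to a guess.
    return 'UTF-8';
}

# Hypothetical helper: decode bytes to characters using the guess above.
sub decode_page {
    my ($bytes, $content_type) = @_;
    my $text = decode(guess_charset($bytes, $content_type), $bytes);
    $text =~ s/\A\x{FEFF}//;    # drop the BOM character, if one was decoded
    return $text;
}

# The misconfigured case: the header says utf-8, the meta tag says koi8-r,
# and the bytes are really UTF-16LE with a BOM.
my $bytes = "\xFF\xFE" . encode('UTF-16LE', qq{<meta charset="koi8-r">\x{263A}});
my $text  = decode_page($bytes, 'text/html; charset=utf-8');
```

Here the BOM wins over both declarations, so $text comes back as the intended characters. Without a BOM, the sketch would trust the header and produce garbage, which is exactly the kind of case where guessing is all that is left.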