I am trying to use Perl to extract lines of Chinese poetry from web pages where they are embedded in lots of HTML. According to my copy of the "Programming Perl" book, any version from 5.6 on should deal with Unicode happily -- the Perl on my Mac is many versions later than that. But when I run the script I've written over one of these web pages, I see only question marks where the Chinese graphs ("characters") should be printed. Odder still, there seem to be exactly three question marks per Chinese graph, whereas so far as I know Unicode uses two bytes per character.
I'm not even sure whether this is a Perl question; I am wondering whether the Chinese has been encoded on the web page in some way other than via Unicode. But however it has been encoded, my web browser (Firefox) and my text editor (BBEdit) seem to recognise it fine. I am really at a loss as to how to approach this problem.
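To make the byte-count observation concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the page is encoded as UTF-8, which I have not confirmed; in that encoding each of these two sample characters takes three bytes. The sample string and the variable names are purely illustrative and are not taken from my actual script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the raw UTF-8 bytes for the two characters U+4E2D U+6587.
my $bytes = "\xE4\xB8\xAD\xE6\x96\x87";

# Decode the byte string into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# Compare the number of bytes with the number of characters.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# Declare the output encoding so the characters are written out as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "$chars\n";
```

If the assumption holds, the first line printed reports 6 bytes and 2 characters, which would match the three question marks I see per graph.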
I should probably add that my Perl level is "intermediate". I have used the language a fair amount, for real tasks rather than just playing, but have never needed to move beyond the core language -- I have never used "pragmas", for instance.
Any advice much appreciated!