PerlMonks  

Re: Ye mighty "em dash"

by dave0 (Friar)
on Jun 02, 2005 at 04:35 UTC ([id://462748])


in reply to Ye mighty "em dash"

When you say "perl spits that line back out as", do you mean "prints to your terminal as"? Perhaps Perl's printing the character, but your terminal cannot display it.

If I take this page and save it as emdash.html, and run:

open(my $fh, '<', 'emdash.html') or die $!;
while (<$fh>) {
    if (/Pierrefonds/) {
        print;                                 # the matching line itself
        print join ' ', map { ord } split //;  # ordinal value of each character
        print "\n";
    }
}
both lines containing Bernard Patry's riding name appear in my xterm as "PierrefondsDollard". However, looking at the character values printed below each line, I can see that the first one contains an extra unprinted character with decimal value 151. That's the em dash.

The fun part is that byte 151 (0x97) isn't actually an em dash in ISO-8859-1, where that position falls in the C1 control character range. It's an em dash in the Windows Latin 1 character set (Windows-1252), which isn't directly compatible with ISO-8859-1. This could explain why it doesn't display correctly in your (or at least, my) terminal. See http://www.cs.tut.fi/~jkorpela/www/windows-chars.html for more details.
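
To make the difference concrete, here is a minimal sketch that decodes the same byte under both encodings with the core Encode module and prints the resulting code point. The variable names are illustrative and not taken from the original script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The byte in question: decimal 151, hex 0x97.
my $byte = "\x97";

# Decode the same byte under both encodings and compare the code points.
for my $encoding ('iso-8859-1', 'cp1252') {
    my $char = decode($encoding, $byte);
    printf "%-10s -> U+%04X\n", $encoding, ord $char;
}
```

Under ISO-8859-1 the byte should come out as U+0097, a control character, while under cp1252 it should come out as U+2014, the em dash.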

Re^2: Ye mighty "em dash"
by cory2070 (Initiate) on Jun 02, 2005 at 05:31 UTC
    Thanks Dave, very helpful.

    You're right, my terminal can't display char 151. It looks like my best (err... easiest) workaround is to write a regex to replace all em dashes with & #8212; (the HTML numeric character reference for the em dash). Here's what I hacked together:

    $html=~s/\x97/\& #8212;/g;

    Note: there shouldn't be a space between & and #, but I added one so it would display correctly. Here are some more encodings if anyone is looking to translate any other odd characters.
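
If more than one of these characters turns up, the one-off substitution could be generalized. The following is only a sketch, assuming the input really is Windows-1252 bytes; `cp1252_to_entities` is a hypothetical helper name, not something from this thread. It decodes the bytes with the core Encode module and rewrites every non-ASCII character as a numeric character reference.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: treat the input as Windows-1252 bytes and replace
# every non-ASCII character with a numeric character reference.
sub cp1252_to_entities {
    my ($bytes) = @_;
    my $text = decode('cp1252', $bytes);
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

print cp1252_to_entities("Pierrefonds\x97Dollard"), "\n";
```

Plain ASCII passes through unchanged, so running this over a whole HTML page only touches the high-bit bytes. If the page is actually in some other encoding, the references it produces will be wrong.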

    Cheers!
