PerlMonks  

Re^3: Parsing/regex help required

by Marshall (Canon)
on Sep 28, 2021 at 02:07 UTC ( #11137077=note )


in reply to Re^2: Parsing/regex help required
in thread Parsing/regex help required

Perhaps you mean "em dash" instead of "en dash"?
An em dash is called "em" because it is about the width of an "M" in a variable-width font.
An en dash is shorter, about the width of the letter "n".

In any event, you will have to read the data using UTF-8 decoding. My dev environment for Perl can only do ASCII, so I cannot easily write code for this.

As far as regex goes:
You need to group an or'd expression, something like this: (-|em_dash)
To make it "non capturing": (?:-|em_dash)
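
A minimal sketch of that grouping, in case it helps. The sample lines and the variable names are made up for illustration (the real input format is whatever the original poster has), and the dashes are written as \x{2013} (en dash) and \x{2014} (em dash) escapes so the source file itself stays plain ASCII:

```perl
use strict;
use warnings;

# Hypothetical sample data; the dashes are built with escapes, so these are
# already character strings and need no decoding.
# \x{2013} is EN DASH, \x{2014} is EM DASH.
my @lines = (
    "alpha - beta",
    "gamma \x{2014} delta",
    "epsilon \x{2013} zeta",
);

# Non-capturing group: plain hyphen, or en dash, or em dash.
my $dash = qr/(?:-|\x{2013}|\x{2014})/;

my @pairs;
for my $line (@lines) {
    # Only the two words are captured; the dash group does not use up $1/$2.
    if ( $line =~ /^(\w+)\s*$dash\s*(\w+)$/ ) {
        push @pairs, [ $1, $2 ];
    }
}

# Plain ASCII output, so no encoding layer is needed on STDOUT.
print join( ",", @$_ ), "\n" for @pairs;
```

Since every alternative here is a single character, a character class such as [-\x{2013}\x{2014}] would do the same job; the group form is shown because it also works when an alternative is longer than one character.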

The question is what "em_dash" should be, and how that relates to the decoding that was used when the data was read.

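update: once the data has been decoded, an em dash is the single character \x{2014}.
As far as I know, "use utf8;" is only needed when a literal em dash is typed into the source code; the \x{2014} escape works without it, but I am not sure of all the details.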
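
As far as I can tell, what matters for \x{2014} is whether the input was decoded when it was read. The sketch below is only an illustration under that assumption: it uses an in-memory filehandle holding the three UTF-8 bytes of an em dash (E2 80 94) in place of a real file, and the variable names are made up:

```perl
use strict;
use warnings;

# Raw UTF-8 bytes for "a", EM DASH, "b", newline, as they would sit in a file.
my $bytes = "a\xE2\x80\x94b\n";

# Open an in-memory handle with a decoding layer; a real file would be opened
# the same way, with a path in place of the scalar reference.
open( my $fh, '<:encoding(UTF-8)', \$bytes ) or die "open failed: $!";
my $line = <$fh>;
close $fh;
chomp $line;

# After decoding, the dash is one character and \x{2014} matches it.
my $decoded_ok = $line =~ /^a\x{2014}b$/ ? 1 : 0;

# Without decoding, the same dash is three separate bytes and \x{2014}
# does not match.
my $raw = "a\xE2\x80\x94b";
my $raw_ok = $raw =~ /^a\x{2014}b$/ ? 1 : 0;

print "decoded: $decoded_ok, raw: $raw_ok\n";
```

Note that there is no "use utf8;" anywhere in this script, because the source contains only ASCII.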

Some Monks here are quite experienced with utf8 encoding.
Bring it on!
