PerlMonks  

Re^2: Regex with HTML::Entities

by Horst.Lohnstein (Initiate)
on Nov 23, 2021 at 08:57 UTC (id://11139047)


in reply to Re: Regex with HTML::Entities
in thread Regex with HTML::Entities

Hi Fletch, thank you for your advice! I tried it with \Q...\E and also with (?:...), which suppresses capturing of the parenthesized expression, but nothing seems to help. I wondered whether the tilde ~ was causing trouble, but once it's quoted that shouldn't be the case. Best, Horst
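Whether the tilde can cause trouble is easy to check in isolation. Below is a minimal sketch: the sample text is taken from this thread, while `$to_match`, `$quoted`, `$text` and `$hit` are illustrative names. The ✶ separators are deliberately left out so that quoting is the only thing under test. `\Q...\E` is implemented by `quotemeta`, which backslash-escapes every ASCII non-word character, so the space, the parentheses and the tilde all end up as literal characters in the pattern.

```perl
use strict;
use warnings;

my $to_match = 'Adjektive (Nominalflexion)~87';

# \Q...\E calls quotemeta; show what the pattern actually receives.
my $quoted = quotemeta $to_match;
print "$quoted\n";    # Adjektive\ \(Nominalflexion\)\~87

# The quoted text matches itself literally, tilde and parens included.
my $text = '{Adjektive (Nominalflexion)~87}';
my $hit  = $text =~ /\{(\Q$to_match\E)\}/ ? $1 : undef;
print defined $hit ? "matched: $hit\n" : "no match\n";
```

So the quoted text itself is not the problem; if the full pattern still fails, the separator characters are the next thing to check.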

Re^3: Regex with HTML::Entities
by Fletch (Bishop) on Nov 23, 2021 at 09:13 UTC

    Don't know what to tell you other than to try providing an SSCCE that can actually be run. The code below works as I expect, so either you're doing something strange or (entirely possible) your problem statement is being misread.

    (Also, the <code> formatting is doing weird things: I actually have a literal ✶ in my source and in the output, but it gets replaced with the entity &#10038; everywhere except the initialization of $wonky_char, where the entity is intentional. Not sure what the right way is to get literal UTF-8 characters into sample code that uses utf8.)

    #!/usr/bin/env perl
    use 5.034;
    use HTML::Entities qw( decode_entities );
    use utf8;

    # `use utf8` makes Perl read the literal ✶ below as one character.
    my $input = qq{{✶Adjektive (Nominalflexion)~87✶}};
    my $wonky_char = decode_entities( q{&#10038;} );

    binmode( STDOUT, q{:utf8} );

    say qq{\$input: $input};
    say qq{\$wonky_char: $wonky_char};

    my $to_match = "Adjektive (Nominalflexion)~87";

    # \Q...\E quotes the parens and tilde so they match literally.
    my $new_string = $input =~ s{\{$wonky_char(\Q$to_match\E)$wonky_char\}}{<div>I found '$1'</div>}r;
    say qq{\$new_string: $new_string};

    # Same substitution, spaced out with /x for readability.
    my $cleaner_regex_sample = $input =~ s{ \{ $wonky_char (\Q$to_match\E) $wonky_char \} }{<div>Also found '$1'</div>}rx;
    say qq{cleaner: $cleaner_regex_sample};

    exit 0;

    __END__
    $input: {✶Adjektive (Nominalflexion)~87✶}
    $wonky_char: ✶
    $new_string: <div>I found 'Adjektive (Nominalflexion)~87'</div>
    cleaner: <div>Also found 'Adjektive (Nominalflexion)~87'</div>

    The cake is a lie.

Re^3: Regex with HTML::Entities
by ikegami (Patriarch) on Nov 23, 2021 at 15:00 UTC

    Please show the output of

    printf "%vX\n", $text;

    I bet your text doesn't actually contain ✶. Did you decode your inputs? You probably have it in its encoded form (the UTF-8 bytes E2 9C B6).
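To make this diagnosis concrete, here is a minimal sketch that builds the same text twice: once decoded, and once as raw UTF-8 bytes standing in for input that was read without decoding. `$raw`, `$re`, `$text`, `$before` and `$after` are hypothetical names; `encode` and `decode` come from the core Encode module. The `%vX` output is what tells the two cases apart: `2736` for the decoded character, `E2.9C.B6` for its encoded bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $sep      = "\x{2736}";                     # U+2736, the separator
my $to_match = 'Adjektive (Nominalflexion)~87';

# Hypothetical undecoded input: the UTF-8 bytes of the text, as read
# from a file or socket that had no decoding layer.
my $raw = encode('UTF-8', "{${sep}${to_match}${sep}}");

printf "%vX\n", $sep;                     # 2736
printf "%vX\n", encode('UTF-8', $sep);    # E2.9C.B6

my $re = qr/\{$sep(\Q$to_match\E)$sep\}/;

# Fails: $raw holds the three characters E2 9C B6, not U+2736.
my $before = $raw =~ $re ? 1 : 0;

# Works once the bytes are decoded back into characters.
my $text  = decode('UTF-8', $raw);
my $after = $text =~ $re ? 1 : 0;

print "undecoded: $before, decoded: $after\n";   # undecoded: 0, decoded: 1
```

In a real script the decoding would normally happen where the data comes in, e.g. `open my $fh, '<:encoding(UTF-8)', $path` or `use open qw( :std :encoding(UTF-8) );`, rather than with explicit `decode` calls.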


    By the way,

    my $sep = decode_entities('&#10038;');

    is a complicated way of writing

    my $sep = "\N{U+2736}";

    or

    my $sep = "\x{2736}";

    or

    use utf8;
    my $sep = "✶";
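A quick sketch confirming that these spellings all produce the same one-character string. It assumes the source file is saved as UTF-8. `decode_entities` is left out so the sketch needs only core Perl, and `utf8::encode` on a copy shows what the literal would look like (three bytes) if `use utf8` were missing, which is one common way a literal ✶ in a regex silently fails to match.

```perl
use strict;
use warnings;
use utf8;    # source is UTF-8, so "✶" below is read as one character

my @spellings = ( "\N{U+2736}", "\x{2736}", "✶" );

# grep in scalar context counts how many spellings equal U+2736.
my $same = grep { $_ eq "\x{2736}" } @spellings;
print "$same of ", scalar @spellings, " spellings are U+2736\n";

my $char_len = length "✶";    # 1 character under use utf8

# Without use utf8, Perl would see the literal as its UTF-8 bytes;
# utf8::encode on a copy reproduces that byte view.
my $bytes = "✶";
utf8::encode($bytes);
my $bytes_len = length $bytes;    # 3 (E2 9C B6)

print "with use utf8: $char_len, as raw bytes: $bytes_len\n";
```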
    
