http://qs321.pair.com?node_id=775558

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
$line="&rdquo; 3&amp;rdquo;";
$line =~ s/[&|&amp\;]rdquo;/'/g;
print $line;

I have to substitute &rdquo; or &amp;rdquo; with single quotes. The output I get from the above code is ' 3&amp', whereas the expected output is ' 3'.

Replies are listed 'Best First'.
Re: quotes substitution
by citromatik (Curate) on Jun 29, 2009 at 06:23 UTC

    It is not clear to me what you want to accomplish, but in general, use HTML::Entities to process HTML entities:

    use strict;
    use warnings;
    use HTML::Entities;

    my $line="&rdquo; 3&amp;rdquo;";
    decode_entities ($line);

    If this module is of no help, please take a step back and explain the kind of conversion you want to do.

    Update: A closer look at your code reveals that you are not using alternation correctly: [] in a regexp is a character class, which matches a single character from the set, but you probably want () for grouping instead (see perlretut and perlre):

    $line="&rdquo; 3 &amp;rdquo;";
    $line =~ s/(?:&|&amp\;)rdquo;/'/g;
    print "$line\n";

    That prints:

    ' 3 '
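To make the difference concrete, here is a minimal sketch that runs both patterns against the same string. The input (one singly encoded and one doubly encoded quote) and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Assumed sample input: one singly and one doubly encoded right double quote.
my $input = '&rdquo; 3&amp;rdquo;';

# Character class: [&|&amp\;] matches exactly ONE character out of
# the set { & | a m p ; }, so only the ';' of '&amp;' gets consumed.
(my $with_class = $input) =~ s/[&|&amp\;]rdquo;/'/g;

# Alternation inside a non-capturing group matches either whole prefix.
(my $with_group = $input) =~ s/(?:&amp;|&)rdquo;/'/g;

print "class: $with_class\n";   # class: ' 3&amp'
print "group: $with_group\n";   # group: ' 3'
```

The character class version reproduces the stray "&amp" from the question, because the match starts at the ";" and leaves "&amp" behind.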

    citromatik

      Hey there,
      that regex does not work properly (don't be offended). I tried it, but the output is

      "'; 3'"

      so I think there should be one more useless line like

      $line=~s/;//g; # /g is not strictly needed here

        $ cat 775558.pl
        use strict;
        use warnings;
        my $line="&rdquo; 3&amp;rdquo;";
        $line =~ s/(?:&|&amp\;)rdquo;/'/g;
        print "$line\n";
        $ perl 775558.pl
        ' 3'

        citromatik

Re: quotes substitution
by wfsp (Abbot) on Jun 29, 2009 at 08:53 UTC
    One more take on this just for completeness. Normalise the string by decoding twice and then do the replacement.
    #!/usr/bin/perl
    use warnings;
    use strict;

    use HTML::Entities;

    my $line = qq{&rdquo; 3&amp;rdquo;};
    my $decode_1 = decode_entities $line;
    my $decode_2 = decode_entities $decode_1;

    print qq{$line\n};
    print qq{$decode_1\n};
    print qq{$decode_2\n};

    $decode_2 =~ s/\x{201D}/'/g; #'
    print qq{$decode_2\n};

    Wide character in print at C:\perm\dev\_new.pl line 12.
    Wide character in print at C:\perm\dev\_new.pl line 13.
    &rdquo; 3&amp;rdquo;
    ” 3&rdquo;
    ” 3”
    ' 3'
    The symbols here are the double quotes and the warnings are as expected.
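If the "Wide character in print" warnings are unwanted, one common approach is to declare an output encoding on STDOUT before printing. This is a minimal sketch, assuming a UTF-8 terminal; the variable names are illustrative and the string is written with \x{201D} escapes rather than decoded from entities.

```perl
use strict;
use warnings;

# Declare the output encoding so Perl encodes wide characters
# on the way out instead of warning about them.
binmode STDOUT, ':encoding(UTF-8)';

# Stand-in for the fully decoded string: U+201D is the right double quote.
my $decoded = "\x{201D} 3\x{201D}";
print "$decoded\n";

# Same replacement as above, applied to a copy.
(my $plain = $decoded) =~ s/\x{201D}/'/g;
print "$plain\n";
```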

    I have a general rule of thumb: decode often (it doesn't hurt), encode ONCE (or your life will be a misery and you'll have strings of junk like the one you had). :-)

      I have a general rule of thumb: decode often (it doesn't hurt), encode ONCE

      BAD idea. Decode and encode MUST match. Decoding too often (even once more than needed) DOES hurt. Imagine a piece of HTML source where someone explains how to encode the ampersand in HTML:

          Just write &amp;amp;.

      Decode once (like a browser does):

          Just write &amp;.

      This is what you see in the browser, it is the correct solution.

      Decode for the second time, because, well "it doesn't hurt", as you said:

          Just write &.

      This is just wrong. Decoding too often damages the content.

      This is not HTML-specific: you get the same problem with C-style backslash escapes and with URL encoding. And I'm very sure there are lots of other encodings that will damage the content when the decoding routine is applied more than once.
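The ampersand example can be reproduced with a short sketch. Note that decode_once is a hypothetical, deliberately tiny decoder written for this illustration (it knows only three entities); it stands in for a real HTML entity decoder and is not part of any module.

```perl
use strict;
use warnings;

# Hypothetical toy decoder: handles only &amp; &lt; &gt;.
my %ENTITY = (amp => '&', lt => '<', gt => '>');

sub decode_once {
    my ($text) = @_;
    # Single left-to-right pass; replaced text is not rescanned.
    $text =~ s/&(amp|lt|gt);/$ENTITY{$1}/g;
    return $text;
}

my $source = 'Just write &amp;amp;.';    # what the HTML source contains
my $once   = decode_once($source);       # what the browser shows
my $twice  = decode_once($once);         # one decode too many

print "$once\n";     # Just write &amp;.
print "$twice\n";    # Just write &.
```

The second pass silently turns the instruction into a different one, which is the damage described above.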

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: quotes substitution
by hangon (Deacon) on Jun 29, 2009 at 07:16 UTC

    Brackets are for character classes, use parens for grouping. Try this:

    $line =~ s/(&|&amp;)rdquo;/'/g;

    # or, for a non-capturing group:
    $line =~ s/(?:&|&amp;)rdquo;/'/g;