http://qs321.pair.com?node_id=1198765

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I wish to encode regular expressions so that I can convert them to JSON and store them in Redis. Reading perldoc JSON::XS has given me hope!

my %hash = ( Command => "show debug", Filter => [ qr(^-{1,}\|-{1,}$), qr(^$) ] );

Encoding with JSON::XS requires a callback, which I'm sure someone has written before. Can someone give me a pointer to where I might find such a function?

Many thanks in anticipation

Re: Freezing a regular expression qr()
by choroba (Cardinal) on Sep 06, 2017 at 12:10 UTC
    You can use the convert_blessed option. To add a TO_JSON method to regexes, just define it in the ref qr() namespace, i.e. Regexp:

    #! /usr/bin/perl
    use warnings;
    use strict;

    use JSON::XS;

    my $json = JSON::XS->new->convert_blessed;

    sub Regexp::TO_JSON { "" . shift }

    my $re = qr/^-{1,}\z/;
    print $json->encode([$re]), "\n";

    Update:

    You can even localize the change, so that regexes behave normally in other parts of the code:

    {
        no strict 'refs';
        local *{'Regexp::TO_JSON'} = sub { "" . shift };
        print $json->encode($re), "\n";
    }

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Freezing a regular expression qr()
by Corion (Patriarch) on Sep 06, 2017 at 12:09 UTC

    I can't really tell if what you want is sane, but I found the documentation of JSON::XS to be fairly explicit. If the convert_blessed option is set and an object has a TO_JSON method, that method will be called to convert the object to a string (or whatever).

    So, "just" add a TO_JSON method to the Regexp namespace and you can serialize your regular expressions as strings.

    #!perl -w
    use strict;
    use JSON::XS 'encode_json';

    print encode_json({foo=>'bar'});

    my $j = JSON::XS->new;
    $j->convert_blessed(1);
    print $j->encode({foo=>qr/bar/});

    # Called for each regex object once convert_blessed is enabled
    sub Regexp::TO_JSON { qq($_[0]) }

    Update:

    After discussion of the topic with choroba, I have to add that it is likely a far better approach to explicitly convert your regular expressions to strings manually. As you are generating the data structure, you should also know at which locations there are regular expressions. Convert these to strings.
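    The explicit conversion described above could look roughly like the following sketch. It uses the hash from the question and assumes the regexes live only under the Filter key. The helper names freeze_command and thaw_command are made up for illustration, and JSON::PP (the core module with the same interface) stands in for JSON::XS so the example runs without extra installs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # core module; assumed stand-in for JSON::XS in this sketch

# Data structure from the question; the regexes live under the Filter key.
my %hash = (
    Command => "show debug",
    Filter  => [ qr(^-{1,}\|-{1,}$), qr(^$) ],
);

# freeze_command() is a hypothetical helper: copy the hash and stringify
# each regex explicitly before encoding, so no TO_JSON hook is needed.
sub freeze_command {
    my ($cmd) = @_;
    my @patterns = map { "$_" } @{ $cmd->{Filter} };
    my %plain    = ( %$cmd, Filter => \@patterns );
    return JSON::PP->new->canonical->encode( \%plain );
}

# thaw_command() is the hypothetical inverse: decode the JSON, then
# recompile each stored pattern string with qr//.
sub thaw_command {
    my ($json) = @_;
    my $cmd = JSON::PP->new->decode($json);
    $cmd->{Filter} = [ map { qr/$_/ } @{ $cmd->{Filter} } ];
    return $cmd;
}

my $frozen = freeze_command( \%hash );
print "$frozen\n";

my $thawed = thaw_command($frozen);
print "separator line matches\n" if "---|---" =~ $thawed->{Filter}[0];
```

    The exact text of a stringified regex (the wrapper around the pattern) depends on the Perl version, so the stored strings are best treated as opaque and only ever fed back through qr//.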

    Also, I want to question the choice of JSON as a serialization format. If you want speed, there are better approaches that allow serializing regular expressions, like Sereal. If you want something human-editable, YAML at least allows comments, which JSON doesn't allow.

      "... comments, which JSON doesn't allow."

      The inevitable addendum: It seems like Cpanel::JSON::XS does:

      "As a nonstandard extension to the JSON syntax that is enabled by the relaxed setting, shell-style comments are allowed. They can start anywhere outside strings and go till the end of the line."

      Best regards, Karl
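      A small sketch of the relaxed setting quoted above. It assumes JSON::PP, the core module, whose documentation describes the same relaxed option; with Cpanel::JSON::XS the call should look the same, but that is not tested here. The JSON text is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # core module; assumed to behave like the quoted "relaxed" setting

# JSON text with shell-style comments, which a strict parser rejects.
my $text = <<'END_JSON';
{
    # patterns are stored as plain strings
    "Command": "show debug",
    "Filter": [ "^$" ]   # matches empty lines
}
END_JSON

# Without relaxed, decode() is expected to die on the first comment.
my $strict_ok = eval { JSON::PP->new->decode($text); 1 };
print "strict parser: ", ( $strict_ok ? "accepted" : "rejected" ), "\n";

# With relaxed enabled, comments are skipped like whitespace.
my $data = JSON::PP->new->relaxed->decode($text);
print "Command: $data->{Command}\n";
```

      Note that the comments only survive on the way in; anything encoded back out is plain JSON without them.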

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

        As a nonstandard extension to the JSON syntax that is enabled by the relaxed setting, shell-style comments are allowed.

        I don't like that. It feels like an "impedance mismatch". Why on earth would one want to use comment syntax from (unix) shells in something derived from JavaScript? JSON was designed so that you could simply pass it to JavaScript's eval function (yes, that would be insecure, but it is possible). Using shell comments breaks that feature. The most natural comment extension for JSON would be JavaScript comments, i.e. from /* to the next */ and from // to the next end of line.

        The //-style comments have the slight disadvantage of giving some whitespace characters (\r and \n) a second meaning, whereas pure JSON treats all whitespace characters outside strings equally and allows them to be freely replaced with other whitespace characters.

        Apart from shell comments, I can only imagine two more mismatching comment variants:

        • The REM keyword, stolen from BASIC and DOS batches.
        • A C in the first column of a line, stolen from classic FORTRAN.

        But while surfing, I found an even more mismatching comment syntax at http://www.gavilan.edu/csis/languages/comments.html:

        • A * in column 7 of a line, stolen from COBOL.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)