
Re^2: utf8::upgrade and $1

by creamygoodness (Curate)
on Aug 30, 2009 at 12:45 UTC (#792174)

in reply to Re: utf8::upgrade and $1
in thread utf8::upgrade and $1

I used utf8::upgrade() as a pure Perl example, so that I wouldn't have to resort to Inline::C or XS and more people would be able to run the sample code.

Perl_sv_utf8_upgrade_flags_grow is one of those root functions that's invoked via many wrappers, though, like Perl_do_openn or Perl_sv_setsv_flags. There are many ways to get at it.

As noted, I discovered the issue via the SvPVutf8 XS macro. The devel branch of one of my CPAN distros, KinoSearch, is a mostly-C library which uses UTF-8 strings exclusively internally. Therefore, I use SvPVutf8 rather than SvPV for accessing string pointers from arguments.

If anybody ever uses $1 as an argument to any XS library function which uses SvPVutf8, it will get upgraded, triggering the bug:

$category =~ /(\w+)/;
my $term_query = KinoSearch::Search::TermQuery->new(
    field => 'category',
    term  => $1,
);

Other libraries which use SvPVutf8 include Mail::SpamAssassin, Glib, Tk, etc. However, I suspect that the problem isn't limited to us. It's more that using m//g is a little esoteric, and many functions reset $1 by turning off the SVf_POK flag -- e.g. length($1) will do it. So the problem tends not to persist for very long -- but while it does, you can get some maddeningly subtle bugs!

Replies are listed 'Best First'.
Re^3: utf8::upgrade and $1
by Your Mother (Archbishop) on Aug 30, 2009 at 17:15 UTC

    I think that using $1, $_, $@, and friends as arguments to external methods/subs is always a bad idea. You just don't know what's going to be done in between regardless of issues like the one you found. So I'd say documenting the issue is all that's necessary.

    By the way, I think KinoSearch is fantastic. I can't thank you enough for doing it.

      Thanks for the kind words. :)

      As a practical matter, I think your recommendation that end users avoid passing special variables as arguments is sound advice. The same holds true for variables which are overloaded, tied, and so on. Partly this is because XS modules have options with regard to how they treat arguments, and it's hard to get everything right.

      In this case, however, I believe that the problem is both contained and solvable. If I'm right, $1 should never have its SVf_POK flag set -- it will always have SVp_POK set, indicating that it has a valid "private pointer", but never the SVf_POK flag. From sv.h:

      #define SVf_IOK 0x00000100 /* has valid public integer value */
      #define SVf_NOK 0x00000200 /* has valid public numeric value */
      #define SVf_POK 0x00000400 /* has valid public pointer value */
      #define SVf_ROK 0x00000800 /* has a valid reference pointer */
      #define SVp_IOK 0x00001000 /* has valid non-public integer value */
      #define SVp_NOK 0x00002000 /* has valid non-public numeric value */
      #define SVp_POK 0x00004000 /* has valid non-public pointer value */
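To make the public/private distinction concrete, here is a minimal sketch in plain C. It is a toy model that tests a bare flags word, not a real SV; the two constants are copied from the sv.h excerpt above, and the `toy_*` helper names are made up for illustration and do not exist in Perl's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the sv.h excerpt quoted in the post. */
#define SVf_POK 0x00000400 /* has valid public pointer value */
#define SVp_POK 0x00004000 /* has valid non-public pointer value */

/* Hypothetical helpers: they inspect a plain flags word, not an SV. */
static bool toy_pok_public(uint32_t flags)
{
    return (flags & SVf_POK) != 0;
}

static bool toy_pok_private(uint32_t flags)
{
    return (flags & SVp_POK) != 0;
}

/* The state the post argues $1 should always be in:
 * private string flag set, public string flag clear. */
static bool toy_private_only(uint32_t flags)
{
    return toy_pok_private(flags) && !toy_pok_public(flags);
}
```

Under this model, a flags word of `SVp_POK` alone passes `toy_private_only`, while `SVp_POK | SVf_POK` (the state the bug apparently leaves behind) does not.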

      The task is thus to identify any such variables within Perl_sv_utf8_upgrade_flags_grow and ensure that the SVf_POK flag is off when the function returns. That can be achieved either by never turning it on in the first place, by turning it off at some point, or by throwing an exception.
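Of the three options, "turning it off at some point" could be sketched roughly as follows. This is a toy model on a bare flags word, not a patch to Perl_sv_utf8_upgrade_flags_grow: `toy_upgrade`, `toy_upgrade_guarded`, and `TOY_UTF8` are hypothetical names, and the assumption that the upgrade step switches the public flag on is taken from the behaviour described in this thread rather than from the Perl source.

```c
#include <stdint.h>

/* Values copied from the sv.h excerpt quoted in the post. */
#define SVf_POK 0x00000400 /* has valid public pointer value */
#define SVp_POK 0x00004000 /* has valid non-public pointer value */

/* Hypothetical marker bit meaning "buffer is now UTF-8" in this toy model. */
#define TOY_UTF8 0x20000000

/* Hypothetical stand-in for the upgrade step. ASSUMPTION: as a side
 * effect it leaves both the public and private string flags set. */
static uint32_t toy_upgrade(uint32_t flags)
{
    return flags | SVf_POK | SVp_POK | TOY_UTF8;
}

/* Guarded variant: remember whether the public flag was set on entry,
 * and clear it again on exit if it was not. */
static uint32_t toy_upgrade_guarded(uint32_t flags)
{
    const uint32_t had_public = flags & SVf_POK;

    flags = toy_upgrade(flags);

    if (!had_public)
        flags &= ~(uint32_t)SVf_POK;

    return flags;
}
```

The guard leaves ordinary strings (public flag already set) untouched and only restores the private-only state for values that arrived that way. Whether restoring the flag is sufficient for the real function, or whether it must never be turned on at all, is exactly the open question raised below.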

      It may actually be important to ensure that the flag never gets turned on. It's not clear to me that it's valid to call SvPV_force on $1. Should the attempt trigger a "modification of readonly value" exception?

      The questions I would like answers to are:

      • How are scalars with PERL_MAGIC_sv magic different from ordinary scalars?
      • Are there any scalars other than the capture values which are assigned PERL_MAGIC_sv?
      • Does every scalar with PERL_MAGIC_sv magic have the SVp_POK flag set?
      • Can we use this "private" buffer in place of the standard string buffer for the purposes of Perl_sv_utf8_upgrade_flags_grow, and if so, are there any actions we need to take to ensure its safety?

      Based on the answers to those questions, we should be able to come up with the proper incantation -- either at the beginning or near the end of the function -- to ensure that $1 leaves with its SVf_POK flag unset.
