PerlMonks  

Explanation requested regarding negation of characters classes in perlretut

by Anneq (Vicar)
on May 29, 2004 at 00:13 UTC ( [id://357389]=perlquestion )

Anneq has asked for the wisdom of the Perl Monks concerning the following question:

I came across the following paragraph in perlretut regarding negation of character classes:
"Because a period is a metacharacter, it needs to be escaped to match as an ordinary period. Because, for example, \d and \w are sets of characters, it is incorrect to think of [^\d\w] as [\D\W]; in fact [^\d\w] is the same as [^\w], which is the same as [\W]. Think DeMorgan's laws."

My question is, why is it incorrect to think of [^\d\w] as [\D\W]?

UPDATE

While gjb and scooterm explained it clearly (as I realized after the fact), I still didn't get it, duh! Zaxo's explanation turned on the light bulb. Thanks to all. Next stop, Google for DeMorgan's laws. Thanks to BrowserUk for giving me something to play with.

Anne

Replies are listed 'Best First'.
Re: Explanation requested regarding negation of characters classes in perlretut
by Zaxo (Archbishop) on May 29, 2004 at 01:11 UTC

    The "Think DeMorgan . . ." statement says it. A []-defined character class is structured like logical "or", so a character matching [\D\W] is either in \W or in \D (That's actually a degenerate example since \w contains \d). So [^\d\w] is logically "not (d or w)". DeMorgan distributes "not" over "or" as "(not d) and (not w)", while [\D\W] translates as "(not d) or (not w)".
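    The two readings can be checked mechanically. The following is a minimal sketch, not part of the original reply: it tests every printable ASCII character and compares each bracketed class with the "and"/"or" combination described above. The variable names ($not_d, $not_w, $negated_ok, $union_ok) are made up for illustration.

```perl
use strict;
use warnings;

# Check both readings against every printable ASCII character.
my ( $negated_ok, $union_ok ) = ( 1, 1 );
for my $c ( map { chr } 32 .. 126 ) {
    my $not_d = $c =~ /\D/ ? 1 : 0;    # "not d"
    my $not_w = $c =~ /\W/ ? 1 : 0;    # "not w"

    # [^\d\w] should behave like "(not d) and (not w)"
    my $in_negated = $c =~ /[^\d\w]/ ? 1 : 0;
    $negated_ok = 0 if $in_negated != ( $not_d && $not_w ? 1 : 0 );

    # [\D\W] should behave like "(not d) or (not w)"
    my $in_union = $c =~ /[\D\W]/ ? 1 : 0;
    $union_ok = 0 if $in_union != ( $not_d || $not_w ? 1 : 0 );
}

print "[^\\d\\w] is (not d) and (not w): ", ( $negated_ok ? "yes" : "no" ), "\n";
print "[\\D\\W]  is (not d) or  (not w): ", ( $union_ok   ? "yes" : "no" ), "\n";
```

    Both lines should report "yes", which is the difference in a nutshell: negating the whole class gives "and", while listing the negated shorthands gives "or".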

    After Compline,
    Zaxo

Re: Explanation requested regarding negation of characters classes in perlretut
by gjb (Vicar) on May 29, 2004 at 00:29 UTC

    The point is that \D is the set of all characters that do not belong to \d, so 'a', 'C', and '&' all belong to \D. Similarly, \W is the set of all characters not in \w, so it contains '&', ' ', etc.

    So far so good, but [\D\W] is the union of the sets \D and \W, hence it will contain '&' and ' ', but also 'a' and 'C' since the latter are members of \D. That's definitely not equal to the set [^\d\w] that contains none of the characters in \d or \w. The set [^\d\w] definitely doesn't contain 'a' or 'C' that are members of [\D\W].
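    To see the membership difference for the characters mentioned above, here is a small sketch (an illustration added for this thread, with a digit thrown in as an extra sample; the %result hash is just a hypothetical container for the answers):

```perl
use strict;
use warnings;

# Sample characters taken from the explanation above, plus one digit.
my @samples = ( 'a', 'C', '&', ' ', '7' );

my %result;
for my $c (@samples) {
    $result{$c} = {
        union   => ( $c =~ /[\D\W]/   ? 1 : 0 ),    # in \D or in \W
        negated => ( $c =~ /[^\d\w]/ ? 1 : 0 ),    # in neither \d nor \w
    };
    printf "'%s'  [\\D\\W]: %d  [^\\d\\w]: %d\n",
        $c, $result{$c}{union}, $result{$c}{negated};
}
```

    'a' and 'C' come out as 1 for [\D\W] and 0 for [^\d\w], while '&' and ' ' are 1 for both, so the two classes cannot be the same set.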

    Hope this helps, -gjb-

Re: Explanation requested regarding negation of characters classes in perlretut
by dimar (Curate) on May 29, 2004 at 01:10 UTC

    Think DeMorgan's laws:

    ### ( (! $a) && (! $b) )
    ... is the same thing as ...
    ### (! ( $a || $b) )
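    The equivalence above can be confirmed by brute force over all four truth-value combinations. This is only a sketch; it uses $x and $y rather than $a and $b, since the latter are special to Perl's sort, and $holds is a name invented here.

```perl
use strict;
use warnings;

# Try all four combinations of two boolean values.
my $holds = 1;
for my $x ( 0, 1 ) {
    for my $y ( 0, 1 ) {
        my $lhs = ( ( !$x ) && ( !$y ) ) ? 1 : 0;    # (not x) and (not y)
        my $rhs = ( !( $x || $y ) )      ? 1 : 0;    # not (x or y)
        printf "x=%d y=%d  (!x && !y)=%d  !(x || y)=%d\n", $x, $y, $lhs, $rhs;
        $holds = 0 if $lhs != $rhs;
    }
}
print "De Morgan holds for all cases\n" if $holds;
```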

Re: Explanation requested regarding negation of characters classes in perlretut
by BrowserUk (Patriarch) on May 29, 2004 at 01:16 UTC

    Maybe a picture will help. Try running this on a screen that is at least 108 chars wide.

    #! perl -slw
    use strict;

    for my $re ( qw[ .* \d \D [^\d] [^\D] \w \W [^\w] [^\W] [\d\w] [^\d\w] [\D\W] [^\D\W] ] ) {
        printf "%8s:'%s'\n\n", $re, join '', map{ /$re/ ? $_ : ' ' } map{ chr } 32 .. 126;
    }

    Normally I'd post the output, but the OSU PM wrap feature doesn't like it.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
