PerlMonks  

Re: [[:punct:]] vs. {IsPunct} in 5.8

by particle (Vicar)
on Nov 02, 2003 at 14:26 UTC ( [id://303910]=note )


in reply to [[:punct:]] vs. {IsPunct} in 5.8

for some background, perlre (5.008) states:

    The following equivalences to Unicode \p{} constructs and equivalent
    backslash character classes (if available), will hold:

        [:...:]     \p{...}         backslash

        alpha       IsAlpha
        alnum       IsAlnum
        ascii       IsASCII
        blank       IsSpace
        cntrl       IsCntrl
        digit       IsDigit         \d
        graph       IsGraph
        lower       IsLower
        print       IsPrint
        punct       IsPunct
        space       IsSpace
                    IsSpacePerl     \s
        upper       IsUpper
        word        IsWord
        xdigit      IsXDigit

    For example "[:lower:]" and "\p{IsLower}" are equivalent.
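To make the quoted claim concrete, here is a minimal sketch that runs a few characters through both spellings of the lower-case class. The helper name `lower_matches` is made up for illustration and is not from perlre; the sketch only assumes that both `[[:lower:]]` and `\p{IsLower}` are accepted by your perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not from perlre): report whether $char matches
# the POSIX form and the Unicode property form of "lower".
sub lower_matches {
    my ($char) = @_;
    my $posix   = $char =~ /[[:lower:]]/ ? 1 : 0;
    my $unicode = $char =~ /\p{IsLower}/ ? 1 : 0;
    return ( $posix, $unicode );
}

for my $char (qw( a Z 7 )) {
    my ( $posix, $unicode ) = lower_matches($char);
    print "$char: posix=$posix unicode=$unicode\n";
}
```

If the documented equivalence holds, the two flags agree for every character tried.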

if your results match mine,

    #!/usr/bin/perl
    use strict;
    use warnings;
    $|++;

    # map each POSIX class name to the \p{} property it is compared against
    my %classes= qw/
        alpha   IsAlpha
        alnum   IsAlnum
        ascii   IsASCII
        blank   IsBlank
        cntrl   IsCntrl
        digit   IsDigit
        graph   IsGraph
        lower   IsLower
        print   IsPrint
        punct   IsPunct
        space   IsSpace
        upper   IsUpper
        word    IsWord
        xdigit  IsXDigit
    /;

    for( keys %classes )
    {
        my( $r_posix, $r_unicode )=
            ( qr/[[:$_:]]/, qr/\p{$classes{$_}}/ );
        print "testing $r_posix and $r_unicode$/";

        # report every code point on which exactly one of the two matches
        for my $x (0x00..0x7e)
        {
            local $_= chr $x;
            printf "0x%x (%3d.) differ$/" => $x, $x
                if /$r_posix/ xor /$r_unicode/;
        }
    }

    __END__
    testing (?-xism:[[:digit:]]) and (?-xism:\p{IsDigit})
    testing (?-xism:[[:upper:]]) and (?-xism:\p{IsUpper})
    testing (?-xism:[[:xdigit:]]) and (?-xism:\p{IsXDigit})
    testing (?-xism:[[:cntrl:]]) and (?-xism:\p{IsCntrl})
    testing (?-xism:[[:alnum:]]) and (?-xism:\p{IsAlnum})
    testing (?-xism:[[:space:]]) and (?-xism:\p{IsSpace})
    testing (?-xism:[[:print:]]) and (?-xism:\p{IsPrint})
    testing (?-xism:[[:ascii:]]) and (?-xism:\p{IsASCII})
    testing (?-xism:[[:word:]]) and (?-xism:\p{IsWord})
    testing (?-xism:[[:alpha:]]) and (?-xism:\p{IsAlpha})
    testing (?-xism:[[:punct:]]) and (?-xism:\p{IsPunct})
    0x24 ( 36.) differ
    0x2b ( 43.) differ
    0x3c ( 60.) differ
    0x3d ( 61.) differ
    0x3e ( 62.) differ
    0x5e ( 94.) differ
    0x60 ( 96.) differ
    0x7c (124.) differ
    0x7e (126.) differ
    testing (?-xism:[[:lower:]]) and (?-xism:\p{IsLower})
    testing (?-xism:[[:blank:]]) and (?-xism:\p{IsBlank})
    testing (?-xism:[[:graph:]]) and (?-xism:\p{IsGraph})

then i'd list this as a bug, and contact p5p. it seems only [[:punct:]] and \p{IsPunct} differ. this is not expected behavior.

~Particle *accelerates*

Re: Re: [[:punct:]] vs. {IsPunct} in 5.8
by dakkar (Hermit) on Nov 02, 2003 at 20:45 UTC

    It's a bug alright. A documentation bug...

    I checked the Unicode properties, and these are the results:

    Codepoint  Char  Class
    0024       $     Currency Symbol
    002B       +     Math Symbol
    003C       <     Math Symbol
    003D       =     Math Symbol
    003E       >     Math Symbol
    005E       ^     Modifier Symbol
    0060       `     Modifier Symbol
    007C       |     Math Symbol
    007E       ~     Math Symbol

    So those are not "punctuation" according to the Unicode standard... Time for a PunctPerl class, to keep SpacePerl company?
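One way to check these categories yourself, as a minimal sketch: look up each differing code point with `charinfo` from the core Unicode::UCD module, which reports the two-letter general category (Sc, Sm, Sk for the symbol classes listed). The names `@differing`, `category_of` and `$punct_perl` are made up for illustration. In particular, `$punct_perl` is only a guess at what a PunctPerl class might contain (Unicode punctuation plus Unicode symbols); no such class is defined by perl itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# The nine code points the test run reported as differing.
my @differing = (0x24, 0x2b, 0x3c, 0x3d, 0x3e, 0x5e, 0x60, 0x7c, 0x7e);

# Hypothetical helper: two-letter Unicode general category of a code point.
sub category_of {
    my ($cp) = @_;
    return charinfo($cp)->{category};
}

# Hypothetical stand-in for a "PunctPerl" class (an assumption, not a
# real perl property): anything Unicode calls punctuation or a symbol.
my $punct_perl = qr/[\p{P}\p{S}]/;

for my $cp (@differing) {
    my $ch = chr $cp;
    printf "%04X %s %s punct_perl=%s\n",
        $cp, $ch, category_of($cp),
        ( $ch =~ $punct_perl ? 'yes' : 'no' );
}
```

Over the ASCII range tested in the parent post, a class built this way should agree with [[:punct:]]; beyond ASCII it would pull in every Unicode symbol, which may or may not be what a PunctPerl class should do.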

    -- 
            dakkar - Mobilis in mobile
    

    Most of my code is tested...

    Perl is strongly typed, it just has very few types (Dan)

Re: Re: [[:punct:]] vs. {IsPunct} in 5.8
by graff (Chancellor) on Nov 02, 2003 at 17:03 UTC
    Thanks for such a nicely crafted verification. (I wanted to check the other POSIX vs. Unicode classes as well, so you saved me some trouble -- and showed a neat approach!)

    I have posted the observation to both the perl5-porters and perl-unicode mailing lists.
