Actually, I'm glad you brought this up. Perl 5.8.4 has improved support (thanks to me) for creating your own Unicode classes, and even for building cascading ones, where one class is defined in terms of others. The documentation is in perlunicode, and here's an example (you need at least Perl 5.8.4 for this to work):
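package MyUnicode;

# A-Z and a-z, given as two ranges of hex code points.
sub InLetters {
    return <<'END';
0041 005a
0061 007a
END
}

# AEIOU and aeiou, one hex code point per line.
sub InVowels {
    return <<'END';
0041
0045
0049
004f
0055
0061
0065
0069
006f
0075
END
}

# A cascading class: start with every letter, then take out the vowels.
sub InConsonants {
    return <<'END';
+MyUnicode::InLetters
-MyUnicode::InVowels
END
}

package main;

my $string = "Chicken Stromboli";
# Grab each run of one or more consonants.
while ($string =~ /(\p{MyUnicode::InConsonants}+)/g) {
    print "consonant cluster: '$1'\n";
}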
__END__
consonant cluster: 'Ch'
consonant cluster: 'ck'
consonant cluster: 'n'
consonant cluster: 'Str'
consonant cluster: 'mb'
consonant cluster: 'l'
I could write about that, and explain the new '&' class operand, which lets you take the intersection of two or more Unicode classes.
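To give a rough idea of what '&' looks like in practice, here's a minimal sketch. The InUpper and InUpperVowels classes are names I made up for illustration, not anything built in; the only things taken from perlunicode are the '+' and '&' line prefixes. The first line of the definition has to add a class with '+', since starting with '&' would intersect with nothing and leave the class empty.

```perl
use strict;
use warnings;

package MyUnicode;

# AEIOU and aeiou, one hex code point per line.
sub InVowels {
    return <<'END';
0041
0045
0049
004f
0055
0061
0065
0069
006f
0075
END
}

# Hypothetical helper class: A-Z as a single range.
sub InUpper {
    return <<'END';
0041 005a
END
}

# Hypothetical class built with '&': start from the vowels,
# then keep only the ones that are also in InUpper.
sub InUpperVowels {
    return <<'END';
+MyUnicode::InVowels
&MyUnicode::InUpper
END
}

package main;

my $string = "An Eagle Over Utah";
# In list context, /g returns every captured match.
my @found = $string =~ /(\p{MyUnicode::InUpperVowels})/g;
print "uppercase vowels: @found\n";
```

That should print "uppercase vowels: A E O U", since the lowercase vowels are in InVowels but not in InUpper.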
I like this idea. Maybe I can do this and one other topic -- I don't want the article to be too widely scoped.
_____________________________________________________
Jeff [japhy] Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;