hello monks.
Documents for Unicode
perlunitut | 6 pages | Very short overview of Unicode in Perl, plus FAQ. |
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) | 8 pages | About character sets, code pages, and Unicode itself. Short history of Unicode. |
perluniintro | 12 pages | This is the first thing to read (I think). |
Character Encodings in Perl | 7 pages | All-in-one document on encodings. Written by a German author. |
perlunicode | 20 pages | The main document on Perl's Unicode support. Thorough and precise, but perhaps too much for a beginner. |
Perl Programming/Unicode UTF-8 | 15 pages | Explains Perl's internal encoding (N8CS, UTF-8) and describes other problems. If you have stumbled over the 0x80-0xFF problem, this document explains the cause. |
ikegami explains use feature 'unicode_strings' | 1 page | It is there to fix a bug. |
Unicode::UCD | 19 pages | Unicode Character Database: script, block, and properties of Unicode characters. |
perluniprops | 49 pages | Reference for character properties that can be used in regexes, such as \p{Greek}. |
\p{Print} to code points ... | not yet read | |
Unicode support in perlguts | not yet read | |
regex memo
Replace the nth occurence
\K is similar to a zero-width lookbehind: everything matched to the left of \K is kept in the string and excluded from $&.
my $nth = 4 - 1;    # replace the 4th "," with "|"
my $str = 'a,bb,ccc,dddd,eeeee,ffffff';
$str =~ s{ (?: , [^,]* ){$nth} \K , }{|}xms;
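To see what \K changes, here is a small sketch that runs the same substitution with and without it. The variable names ($with_k, $without_k, $matched) are made up for this illustration and are not from the linked thread.

```perl
use strict;
use warnings;

my $str = 'a,bb,ccc,dddd,eeeee,ffffff';

# With \K: the three ",field" groups are matched but kept out of $&,
# so only the 4th comma is replaced.
(my $with_k = $str) =~ s{ (?: , [^,]* ){3} \K , }{|}xms;
my $matched = $&;    # what the substitution actually replaced

# Without \K: everything from the 1st comma through the 4th is replaced.
(my $without_k = $str) =~ s{ (?: , [^,]* ){3} , }{|}xms;

print "$with_k\n$matched\n$without_k\n";
```

With \K only the final comma is in $&, so the fields in front of it survive the substitution.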
Error in my Regular expression pattern
pos() advances when a //g match succeeds; to reset it, use pos($_) = undef;
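A minimal sketch of that behaviour, using a made-up string and scalar-context //g matches (none of this is taken from the linked thread):

```perl
use strict;
use warnings;

my $text = 'aXbXc';

my $found = $text =~ /X/g;        # scalar-context //g match sets pos()
my $after_first = pos($text);     # offset just past the first X

$found = $text =~ /X/g;           # continues from the previous pos()
my $after_second = pos($text);    # offset just past the second X

pos($text) = undef;               # reset: the next //g match starts at 0 again
$found = $text =~ /X/g;
my $after_reset = pos($text);

print "$after_first $after_second $after_reset\n";
```

After the reset the third match finds the first X again, so $after_reset equals $after_first.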
RegEx related line split
A good example of zero-width lookahead: it acts like a placeholder.
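As an illustration of the "placeholder" idea (a hypothetical input, not the one from the linked thread), a lookahead in split marks where to cut without consuming any characters:

```perl
use strict;
use warnings;

my $line = 'HelloWorldFoo';

# (?=[A-Z]) matches the empty position in front of each capital letter,
# so nothing is removed from the string when it is split.
my @parts = split /(?=[A-Z])/, $line;

print join(' ', @parts), "\n";
```

Because the pattern is zero-width, every character of the input ends up in one of the pieces.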
regex: negative lookahead
Negative lookahead
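A small sketch of negative lookahead on invented sample words (an assumption for illustration, not the case discussed in the linked node): keep the strings where "foo" is not immediately followed by "bar".

```perl
use strict;
use warnings;

my @words = qw(foobar foobaz food);

# (?!bar) succeeds only where "bar" does NOT follow; it consumes nothing.
my @hits = grep { /foo(?!bar)/ } @words;

print "@hits\n";
```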
Perl Regex Repeating Patterns
Regex Repeating Patterns, \G anchor
http://perlmonks.org/index.pl?node_id=935995
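A minimal sketch of the \G anchor with a repeating pattern, using a made-up input string: \G forces each match to start exactly where the previous one ended, so matching stops at the first gap.

```perl
use strict;
use warnings;

my $data = '001122xx33';

# With \G the digit pairs must be contiguous from the start of the string.
my @contiguous = $data =~ /\G(\d\d)/g;

# Without \G the engine may skip over "xx" and pick up the last pair too.
my @anywhere = $data =~ /(\d\d)/g;

print "@contiguous\n@anywhere\n";
```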
regexp: removing extra whitespace
http://perlmonks.org/index.pl?node_id=929160
Why do these regex variants behave as they do?
Re^3: Retain first 4 characters of a string: various ways of turning "Apple iPhone 4 Black Cover" into "Appl-iPho-4-Blac-Cove" (space-separated words cut to 4 letters and joined with hyphens)
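Two possible ways to do this conversion, sketched here as an illustration. They are not necessarily the variants shown in the linked reply, and the variable names are invented.

```perl
use strict;
use warnings;

my $title = 'Apple iPhone 4 Black Cover';

# Variant 1: split on whitespace, truncate each word, join with hyphens.
my $via_substr = join '-', map { substr($_, 0, 4) } split ' ', $title;

# Variant 2: capture up to 4 word characters at the start of each word.
my $via_regex = join '-', $title =~ /\b(\w{1,4})/g;

print "$via_substr\n$via_regex\n";
```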
Limiting number of regex matches three dogs of marshall