Re: japhy's regex article for the TPJ
by tachyon (Chancellor) on May 19, 2004 at 00:20 UTC
|
Some thoughts are:
- A theme-based article, built around solving common real-world problems with REs, perhaps looking at Regexp::Common for theme material. Select the themes to highlight whatever you want.
- Common gotchas and pitfalls with Perl REs
- Regex optimisation, perhaps looking at parsing records and including use of $/ and split to simplify the task (i.e. REs plus picking the right tool for the job, and using all the tools you need).
- C/GNU REs vs Perl REs - know your engine.
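To illustrate the $/ and split suggestion in the list above, here is a minimal sketch. The sample data, the "key: value" record layout and the variable names are assumptions made up for the illustration; the point is only that paragraph mode plus split can replace one large record-matching regex.

```perl
use strict;
use warnings;

# Hypothetical sample data: blank-line-separated records of "key: value" lines.
my $data = <<'END';
name: Alice
lang: Perl

name: Bob
lang: C
END

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @records;
{
    local $/ = "";    # paragraph mode: each read returns one whole record
    while (my $record = <$fh>) {
        chomp $record;    # in paragraph mode, chomp strips all trailing newlines
        my %field;
        for my $line (split /\n/, $record) {
            # Limit of 2 keeps any later colons inside the value.
            my ($key, $value) = split /:\s*/, $line, 2;
            $field{$key} = $value;
        }
        push @records, \%field;
    }
}
close $fh;

print "$_->{name} writes $_->{lang}\n" for @records;
```

The only regexes left are the two trivial split patterns; $/ does the record splitting that would otherwise need a multi-line match.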
| [reply] |
|
You bastard! :-P Great minds think alike I guess.
--
I'm not belgian but I play one on TV.
| [reply] |
Re: japhy's regex article for the TPJ
by graff (Chancellor) on May 19, 2004 at 02:12 UTC
|
Maybe this would be too esoteric or somewhat "ahead of its time", but a little more exposure for the Unicode tricks that are now possible with Perl REs could yield some useful surprises for the average reader.
For example, making up expressions and character classes with things like \p{Punctuation} or \p{CurrencySymbol} (or their short forms \p{P}, \p{Sc}) -- and having these work regardless of what language the text is in -- has a certain attraction to it. (Or maybe I just don't realize what a nerd I am to think so.) | [reply] |
|
Actually, I'm glad you brought this up. In 5.8.4, there's improved ability (thanks to me) to create your own Unicode classes, and even build cascading ones. The documentation is in perlunicode, and here's an example (you must have Perl 5.8.4 for this to work):
package MyUnicode;
sub InLetters {
return << 'END';
0041 005a
0061 007a
END
}
sub InVowels {
return << 'END';
0041
0045
0049
004f
0055
0061
0065
0069
006f
0075
END
}
sub InConsonants {
return << 'END';
+MyUnicode::InLetters
-MyUnicode::InVowels
END
}
package main;
my $string = "Chicken Stromboli";
while ($string =~ /(\p{MyUnicode::InConsonants}+)/g) {
print "consonant cluster: '$1'\n";
}
__END__
consonant cluster: 'Ch'
consonant cluster: 'ck'
consonant cluster: 'n'
consonant cluster: 'Str'
consonant cluster: 'mb'
consonant cluster: 'l'
I could write about that, and explain the new '&' class operand, which allows you to do the intersection of two or more Unicode classes.
I like this idea. Maybe I can do this and one other topic -- I don't want the article to be too widely scoped.
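As a rough sketch of the '&' intersection mentioned above (assuming the syntax described in perlunicode, where a line starting with '&' intersects with the set built so far), the following defines a hypothetical InUpper class and intersects it with the vowels. The names InUpper and InUpperVowels and the test string are made up for this illustration.

```perl
use strict;
use warnings;

package MyUnicode;

# Code points A-Z only (hypothetical helper class for this sketch).
sub InUpper {
    return <<'END';
0041 005a
END
}

# The same ten vowel code points as in the example above.
sub InVowels {
    return <<'END';
0041
0045
0049
004f
0055
0061
0065
0069
006f
0075
END
}

# Start with the vowels, then keep only those that are also in InUpper.
# The '&' line must not come first: intersecting with nothing gives an empty set.
sub InUpperVowels {
    return <<'END';
+MyUnicode::InVowels
&MyUnicode::InUpper
END
}

package main;

my @found = "Chicken StrOmbolI And Eggs" =~ /(\p{MyUnicode::InUpperVowels})/g;
print "upper-case vowels: @found\n";
```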
_____________________________________________________
Jeff [japhy]Pinyan:
Perl,
regex,
and perl
hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
| [reply] [d/l] |
Re: japhy's regex article for the TPJ
by Zaxo (Archbishop) on May 19, 2004 at 00:26 UTC
|
I'd like to see your take on code evaluation and delayed execution blocks. There seems to be some deep voodoo about them. $_ has some mysterious behavior in them, for instance. They strike me as admitting some very neat tricks, but I've never succeeded in devising any.
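For readers who have not met these constructs, here is a small sketch of both, under the behaviour documented in perlre: inside a (?{ ... }) block, $_ is the string being matched and pos() is the current match position, while (??{ ... }) evaluates its code at match time and uses the result as a sub-pattern. The variable names and test strings are invented for the illustration, and package variables are used because older perls handled lexicals in code blocks poorly.

```perl
use strict;
use warnings;

our ($seen, @positions, $balanced);

# (?{ ... }): run Perl code in the middle of a match.
# Each time the engine gets past a "b", record the target string and position.
my $text = "xbcxbc";
() = $text =~ /b(?{ $seen = $_; push @positions, pos() })c/g;

print "target seen inside the block: $seen\n";
print "positions after each 'b': @positions\n";

# (??{ ... }): delayed execution. The block's value is used as a pattern
# at match time, which lets a regex refer to itself recursively.
$balanced = qr/
    \(
    (?:
        (?> [^()]+ )        # a run of non-parentheses, no backtracking into it
      |
        (??{ $balanced })   # or a nested balanced group
    )*
    \)
/x;

my ($group) = "f(a(b)c)(d" =~ /($balanced)/;
print "first balanced group: $group\n";
```

Side effects inside (?{ ... }) are not undone when the engine backtracks, which is one reason these blocks behave in surprising ways in less tidy patterns than this one.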
| [reply] |
Re: japhy's regex article for the TPJ
by belg4mit (Prior) on May 19, 2004 at 00:21 UTC
|
| [reply] |
Re: japhy's regex article for the TPJ
by davido (Cardinal) on May 19, 2004 at 04:43 UTC
|
I'd love to see a discussion of section 5.4.3.4 of Programming Perl, 3rd Edition (the Camel Book). The section is called "Defining your own character properties", and the text makes the following assertion:
Perl itself uses exactly the same tricks to define the meanings of its "classic" character classes (like \w) when you include them in your own custom character classes (like [-.\w\s]).
I'd love to learn a new trick, and can't really make heads or tails of what that section is discussing. ;)
| [reply] [d/l] |
Re: japhy's regex article for the TPJ
by McMahon (Chaplain) on May 19, 2004 at 18:17 UTC
|
I know you'd "rather not write an introductory article", but consider something like the Scientific American model, where you ramp up fast to the real meat of the article while still providing good information to those who might not have the experience (or interest?) to follow you all the way to the end of your arguments.
As a newbie with some experience and aspirations, I find that my favorite articles are the ones that I can follow partway. | [reply] |
|
Well then, I point you to Hitting the Motherlode, the article I wrote for Linux Magazine two years ago.
| [reply] |
Re: japhy's regex article for the TPJ
by bl0rf (Pilgrim) on May 19, 2004 at 22:06 UTC
|
Hello Japhy,
I think you'd do everyone a big favour by explaining, with examples, how to use Perl's extended regexes. Even though they have been around for a while, I doubt that many people use them (I know I don't). They have some useful applications, like the (?: ) grouping, which many people would love to know about.
I also support the subject of applying Unicode, with a case study perhaps.
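A minimal sketch of the (?: ) grouping mentioned above, combined with the /x modifier so the pattern can carry whitespace and comments. The pattern, variable names and sample string are assumptions chosen for the illustration.

```perl
use strict;
use warnings;

# (?: ... ) groups for alternation or quantifiers without creating a capture,
# so the capture numbering only counts the parts we actually want back.
my $duration_re = qr{
    (\d+)                   # capture 1: number of hours
    \s*
    (?: hours? | hrs? )     # unit, grouped but not captured
    \s+
    (\d+)                   # capture 2 (not 3): number of minutes
    \s*
    (?: minutes? | mins? )  # unit, grouped but not captured
}x;

my ($hours, $minutes) = "took 2 hrs 45 minutes" =~ $duration_re;
print "hours=$hours minutes=$minutes\n";
```

One thing to watch with /x: variables are interpolated before comments are stripped, so writing something like a dollar sign followed by a digit inside a pattern comment can trigger warnings.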
| [reply] |
Re: japhy's regex article for the TPJ
by gmpassos (Priest) on May 19, 2004 at 23:58 UTC
|
I don't know at what level you want (or are able to, given the TPJ) to talk about regexes, but it would be very interesting to explain and show an XML parser written purely with regexes.
You can see one at XML::Parser::Lite. I use it for XML::Smart as XML::Smart::Parser, but with some updates and fixes.
Graciliano M. P.
"Creativity is the expression of the liberty".
| [reply] |
Re: japhy's regex article for the TPJ
by bsb (Priest) on May 24, 2004 at 02:26 UTC
|
I think the code blocks are the most interesting of the
above options.
I'm also interested in the cases where a regex is awkward
or not powerful enough.
Those situations where it seems like there should be
a clean simple solution but there isn't (or the solution
requires an insight such as sexegers) | [reply] |
Re: japhy's regex article for the TPJ
by aquarium (Curate) on May 20, 2004 at 05:29 UTC
|
In my opinion, what's really needed for regex is a stable interface for presenting regexes to humans. Most people I know would like easy access to the power of regexes, but without the arcane symbols. There are some programs around that do this, translating between various spoken languages and regex. Alas, their GUI (or other) interface is hardcoded. If we had a module instead....
Please, please, pretty please provide a way to do regex with sexy things like image/sound/video files. Thanks heaps. | [reply] |
|
use Regexp::English;
my $re = Regexp::English
-> start_of_line
-> literal('Flippers')
-> literal(':')
-> optional
-> whitespace_char
-> end
-> remember
-> multiple
-> digit;
while (<INPUT>) {
if (my $match = $re->match($_)) {
print "$match\n";
}
}
| [reply] [d/l] |
Re: japhy's regex article for the TPJ
by japhy (Canon) on Jun 30, 2004 at 20:34 UTC
|
I have a completed draft of the article available for viewing at my web site. You can email, /msg, or reply here with comments.
| [reply] |
Re: japhy's regex article for the TPJ
by Gunth (Scribe) on May 20, 2004 at 01:49 UTC
|
I like all the ideas already mentioned here. Another suggestion is to delve a little into the future of Perl REs, i.e. Perl 6.
| [reply] |