Yea or Nay this idea, please. I've had it for a month or so.

Around the time of my "regex reversal" craze, I also had the idea of providing, as a learning tool (or more?), a means of translating from a simple English specification to a Perl regex. The idea is that this would help people understand what a regex does by building one from what they say.

An example might be (if it were function-oriented):
# /^[+-]?\d+$/
$match_integers = form_REx(
  start,         # ^
  class("+-"),   # [+-]
  optional,      # ?
  digit,         # \d
  one_or_more,   # +
  end            # $
);
This is pretty easy to read, no? And it helps make a connection between the "line noise" and the instruction associated with it. Here's another example, if it were to parse a string holding the specification:
# /^[+-]?\d+$/
$match_integers = form_REx(<< 'END');  # single-quoted!
  start          # ^
  class "+-"     # [+-]
  optional       # ?
  digit          # \d
  one_or_more    # +
  end            # $
END
It looks very similar (and it should). It's just a matter of "do I want to make lots of functions" or "do I want to do a lot of parsing".
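For the string-parsing flavour, a minimal sketch might look like the following. Everything in it is an assumption made for illustration: the `%TOKEN` vocabulary covers only the words used in the example, `char_class` is a hypothetical helper, and `form_REx` is just one guess at how such a function could behave. Note that the helper escapes the hyphen, so the generated class is `[+\-]` rather than `[+-]`; the two are equivalent.

```perl
use strict;
use warnings;

# Hypothetical vocabulary: each spec word maps to one regex fragment.
my %TOKEN = (
    start       => '^',
    end         => '$',
    optional    => '?',
    digit       => '\d',
    one_or_more => '+',
);

# Turn a string of literal characters into a character class,
# escaping the characters that are special inside [...].
sub char_class {
    my ($chars) = @_;
    $chars =~ s/([\\\]\^\-])/\\$1/g;
    return '[' . $chars . ']';
}

# Parse one spec word per line; anything after the word is ignored,
# which lets trailing "# ..." comments through.
sub form_REx {
    my ($spec) = @_;
    my $pattern = '';
    for my $line (split /\n/, $spec) {
        next if $line =~ /^\s*(?:#.*)?$/;          # blank or comment-only line
        if ($line =~ /^\s*class\s+"([^"]*)"/) {    # e.g. class "+-"
            $pattern .= char_class($1);
            next;
        }
        my ($word) = $line =~ /^\s*(\w+)/;
        die "Cannot parse line: $line\n" unless defined $word;
        die "Unknown word '$word'\n"     unless exists $TOKEN{$word};
        $pattern .= $TOKEN{$word};
    }
    return qr/$pattern/;
}

my $match_integers = form_REx(<<'END');
    start          # ^
    class "+-"     # [+-]
    optional       # ?
    digit          # \d
    one_or_more    # +
    end            # $
END

for my $input ('42', '-7', '4.2') {
    printf "%-4s %s\n", $input,
        $input =~ $match_integers ? 'matches' : 'does not match';
}
```

The function-oriented flavour could reuse the same table by generating one sub per word, at the cost of exporting names like `start` and `end` into the caller's namespace.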

But before I do this... would anyone find it useful? I'm not about to waste time on something no one needs.

japhy -- Perl and Regex Hacker

Re: Regex Learning Tool
by quidity (Pilgrim) on Nov 23, 2000 at 00:14 UTC

    Learning tools like this don't really help you learn much, as you end up having to learn twice as much in the end. It is better to teach by presenting people with /x regexes, along with examples of what they match and why.

    That said, a regex language would allow for creation of regular expressions in multiple syntaxes. There already is a 'standard' for these though, so I don't know if it would be worth the extra effort to implement this.
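As an illustration of the /x approach mentioned above, here is the integer regex from the original post written out with comments, followed by a few inputs and the reason each does or does not match. The sample inputs and their explanations are made up for this sketch.

```perl
use strict;
use warnings;

# The same pattern as /^[+-]?\d+$/, spread out under the /x modifier.
my $integer = qr/
    ^         # start of the string
    [+-]?     # an optional sign
    \d+       # one or more digits
    $         # end of the string
/x;

# Hypothetical teaching examples: input, then why it matches or fails.
my @examples = (
    [ '42',  'digits only, the sign is optional' ],
    [ '-7',  'a sign followed by a digit'        ],
    [ '4.2', 'the dot is not a digit'            ],
    [ '+',   'a sign with no digits after it'    ],
);

for my $example (@examples) {
    my ($input, $why) = @$example;
    my $verdict = $input =~ $integer ? 'match' : 'no match';
    printf "%-4s %-9s (%s)\n", $input, $verdict, $why;
}
```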

Re: Regex Learning Tool
by KM (Priest) on Nov 23, 2000 at 00:27 UTC
    I would have to agree with quidity on this. First, you would need to teach someone what a regex is. Then, how they would represent it with your defined subset of English, then what the final product means. Seems like an unneeded step. Basically, it is like having to learn OmniRegex in order to learn Regex, when no one else uses OmniRegex. :)

    I learned the mastery of regexes by reading MRE, and literally sitting around figuring out how to get/match pieces of strings. I would propose this as a better tool to teach REs...

    Give a string(s), then give problems, such as:
    * Use an RE to find how many occurrences of 'X' are in the string
    * Use s!!! to substitute the first occurrence of 'X' with 'Y'
    * Use s!!! to substitute the second occurrence (only) of 'X' with 'Y'
    * Use an RE to see if the string 'X' can be matched in 'Y', disregarding quotes..
    * etc...

    Basically, that's what I did (as well as reading MJD's article on how the RE engine works, and other topical articles). And, I think that it worked quite well.. I haven't struggled with REs ever since.

    I would be happy to help anyone come up with such exercises. Just ask if anyone wants to do it.
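For reference, the first three exercises in the list above could be answered along these lines. This is only one possible set of solutions; the sample string and the helper names `count_occurrences` and `replace_nth` are invented for the sketch.

```perl
use strict;
use warnings;

# Count matches by forcing list context on a /g match.
sub count_occurrences {
    my ($string, $needle) = @_;
    my $count = () = $string =~ /\Q$needle\E/g;
    return $count;
}

# Replace only the $n-th occurrence: every match is visited with /g,
# but all except the $n-th are put back unchanged.
sub replace_nth {
    my ($string, $from, $to, $n) = @_;
    my $seen = 0;
    $string =~ s!(\Q$from\E)! ++$seen == $n ? $to : $1 !ge;
    return $string;
}

my $string = 'aXbXcXd';

print count_occurrences($string, 'X'), "\n";    # 3

(my $first = $string) =~ s!X!Y!;                # without /g, only the first match
print "$first\n";                               # aYbXcXd

print replace_nth($string, 'X', 'Y', 2), "\n";  # aXbYcXd
```

The fourth exercise is left out because "disregarding quotes" can be read several ways.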


(Ovid) Re: Regex Learning Tool
by Ovid (Cardinal) on Nov 23, 2000 at 02:23 UTC
    japhy, I like what this does, but it strikes me as similar to the English module: a nice idea that few, if any, use. However, it might be justifiable the way CGI.pm's HTML methods are: as a way of validating the syntax of the output.

    For example, many people object to the following in a CGI program:

    print $q->Tr(
        { -bgcolor => $bgcolor[ $toggle->() ] },
        $q->td( $key . "&nbsp;&nbsp;" ),
        $q->td( $q->pre( $data ) )
    );
    Why don't I just use HTML with a here document or a template? In this case, the value of the HTML functions is apparent:
    • I can't mis-nest tags.
    • I can't forget to close a tag.
    • I get a handy "syntax check" of HTML code that ordinarily wouldn't be checked.
    The last item isn't as much of an issue here, but the first two could be. I think that if it's constructed properly, invalid constructs may be more glaring and/or harder to construct. For example, how many times have we seen regexes like the following?
    $somevar =~ /^[\d|\w|\*]+$/; # Bad regex, no biscuit.
    With your syntax, a person could more naturally write the following:
    $match_stuff = form_REx(
        start,             # ^
        class('\d\w*'),    # [\d|\w|\*] <- still bad!
        one_or_more,       # +
        end                # $
    );
    And you could output /^[\w*]+$/ for the regex (assuming that you check for existence of overlapping metacharacters). This would help with regex syntax problems and could cut down on some of the more insidious logic problems that newbies (and Ovid) are prone to.
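The overlap check described above could be sketched roughly as follows. `tidy_class` is a hypothetical helper, not part of any proposed API, and it knows only one rule: `\d` is redundant once `\w` is present, because every digit is also a word character. It also drops exact duplicates. It does not handle `]`, a leading `^`, or ranges.

```perl
use strict;
use warnings;

# Build a character class from a spec such as '\d\w*', removing
# duplicate atoms and the one overlap this sketch knows about.
sub tidy_class {
    my ($spec) = @_;

    # Split into atoms: a backslash escape (\d, \w, \*) or a single character.
    my @atoms = $spec =~ /(\\.|.)/gs;

    my (%seen, @kept);
    for my $atom (@atoms) {
        next if $seen{$atom}++;    # skip an atom we have already kept
        push @kept, $atom;
    }

    # \d is a subset of \w, so it adds nothing when \w is in the class.
    @kept = grep { $_ ne '\d' } @kept if $seen{'\w'};

    return '[' . join('', @kept) . ']';
}

my $class = tidy_class('\d\w*');
print "$class\n";                  # [\w*]

my $regex = qr/^$class+$/;
print "ok\n" if 'abc*123' =~ $regex;
```

A fuller version would need a table of such subset relationships and some way to ask whether a `|` inside a class was really meant as a literal pipe.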



Re (tilly) 1: Regex Learning Tool
by tilly (Archbishop) on Nov 23, 2000 at 01:29 UTC
    I think that going the other way would be easier and more useful. Take an uncommented RE and produce a "pretty-printed" version with standard comments inserted explaining what each piece does.

    That way people can take existing solutions and try to figure out how they work without having to learn two parallel languages...
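A very small sketch of that reverse direction, assuming a deliberately tiny subset of regex syntax: anchors, simple quantifiers, a few backslash escapes, and character classes. The `%EXPLAIN` table and the `explain_regex` name are invented for illustration, and groups, alternation, and `{n,m}` quantifiers are not handled. The output is meant for reading; a literal space or `#` in the pattern would need escaping before the result could be pasted into a real /x regex.

```perl
use strict;
use warnings;

# Hypothetical explanations for the few tokens this sketch understands.
my %EXPLAIN = (
    '^'  => 'start of string',
    '$'  => 'end of string',
    '?'  => 'optional (zero or one of the previous item)',
    '+'  => 'one or more of the previous item',
    '*'  => 'zero or more of the previous item',
    '.'  => 'any character except newline',
    '\d' => 'a digit',
    '\w' => 'a word character',
    '\s' => 'a whitespace character',
);

# Walk the pattern one token at a time and emit "token  # comment" lines.
sub explain_regex {
    my ($pattern) = @_;
    my @lines;
    # A token is a whole [...] class, a backslash escape, or one character.
    while ($pattern =~ /\G(\[(?:\\.|[^\]\\])*\]|\\.|.)/gs) {
        my $token = $1;
        my $note  = $token =~ /^\[/
            ? 'any one character from this set'
            : $EXPLAIN{$token} // "literal '$token'";
        push @lines, sprintf '%-8s # %s', $token, $note;
    }
    return join "\n", @lines;
}

print explain_regex('^[+-]?\d+$'), "\n";
```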