RE: Why I like functional programming

by dchetlin (Friar)
on Oct 01, 2000 at 05:20 UTC (#34793)

in reply to Why I like functional programming

Hello, I am the HTML::Parser nazi. I go around commenting on other people's attempts to parse HTML, and I always yell at them for trying to do something that's very hard themselves, and tell them to use HTML::Parser.

I intended to do that here.

I tried to break tilly's code.

I failed.


So while I still don't necessarily understand why one wouldn't use HTML::Parser, as far as I can see this is clean. Excellent stuff. I'm very impressed.


RE (tilly) 2 (not html): Why I like functional programming
by tilly (Archbishop) on Oct 01, 2000 at 08:04 UTC
    HTML::Parser would actually be an awful fit for this problem. If you don't believe it, try to duplicate the functionality the code already has.

    The problem is that the incoming document is not HTML. It is a document in some markup language, some of whose tags look like html, but which isn't really. I don't want to spend time worrying about "broken html" that I am going to just escape. I don't want to worry about valid html that I want to deny. I want to report custom errors. (Hey, why not instead of just denying pre-monks image tags, also give an error with a link to the FAQ?) And I want to include markup tags you won't find in HTML.

    I did a literal escape above using [code]. I submit that HTML::Parser would not help with that. OK, so that should be <code> for this site, but this site would want to implement a couple of escapes I didn't. For instance, the following handler would be defined for this site for [ (assuming that $site_base was and hoping that I don't make any typos):

        use URI::Escape qw(uri_escape);
        use HTML::Entities qw(encode_entities);

        sub {
            my $t_ref = shift;
            # Match "name]" or "name|title]" starting at the current position.
            if ($$t_ref =~ /\G([^\|\]]+)(?:\|([^\|\]]+))?\]/g) {
                my $node_ref  = "$site_base?node=" . uri_escape($1);
                my $node_name = encode_entities($2 || $1);
                return qq(<a href="$node_ref">$node_name</a>);
            }
            else {
                return show_err("Incomplete node link?");
            }
        }

    And, of course, given $node_id there is probably a function get_node_name available. And we have that lastnode_id the site keeps track of. So we also need a handler for [:// to link by ID, and that would be generated by something like this:

        sub ret_link_by_id {
            my $tracking = shift;    # eg "&lastnode_id=23453"
            sub {
                my $t_ref = shift;
                if ($$t_ref =~ /\G([1-9]\d*)(?:\|([^\|\]]+))?\]/g) {
                    my $node_id   = $1;
                    my $name      = $2 || get_node_name($node_id);
                    my $node_name = encode_entities($name);
                    my $url       = "$site_base?node_id=$node_id$tracking";
                    return qq(<a href="$url">$node_name</a>);
                }
                else {
                    return show_err("Incomplete node_id link?");
                }
            }
        }

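As a rough illustration of how a closure generator like this could be exercised, here is a self-contained sketch. Everything around ret_link_by_id is a stand-in invented for the example: the $site_base value, the get_node_name and show_err stubs, and escape_html (a tiny substitute for encode_entities so the sketch needs no CPAN modules). It also uses /gc rather than /g so that a failed match leaves pos() alone, which is an assumption about what the calling loop wants.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; none of these values come from the original post.
my $site_base = 'http://example.invalid/index.pl';

# Minimal substitute for HTML::Entities::encode_entities (assumption).
sub escape_html {
    my $text   = shift;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

sub get_node_name { my $id  = shift; return "Node $id" }
sub show_err      { my $msg = shift; return '<b>' . escape_html($msg) . '</b>' }

# Same shape as the generator above: the returned handler closes over $tracking.
sub ret_link_by_id {
    my $tracking = shift;
    return sub {
        my $t_ref = shift;
        if ($$t_ref =~ /\G([1-9]\d*)(?:\|([^\|\]]+))?\]/gc) {
            my $node_id   = $1;
            my $name      = $2 || get_node_name($node_id);
            my $node_name = escape_html($name);
            my $url       = "$site_base?node_id=$node_id$tracking";
            return qq(<a href="$url">$node_name</a>);
        }
        return show_err("Incomplete node_id link?");
    };
}

# Two handlers built from one generator, differing only in captured state.
my $plain   = ret_link_by_id('');
my $tracked = ret_link_by_id('&lastnode_id=23453');

# The text as a handler would see it, just after its trigger was consumed.
my $input = '34793|this note] and the rest';
pos($input) = 0;
print $tracked->(\$input), "\n";
```

The point of the sketch is only that each call to the generator yields an independent handler carrying its own tracking string, so the main loop never needs to know about it.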
    If this still looks to your eyes like a slightly hacked up html spec, let me show you a feature that I dearly wish that this site had. Stop and think about what the following handler for \ does:

        sub {
            my $t_ref = shift;
            if ($$t_ref =~ /\G([&\[\]<>\\])/g) {
                return encode_entities($1);
            }
        }

    Do you see it? Consider what would happen to the following string:

        You can link by URL like this:
        <pre>
        \<a href=""\><a href=http://www.perlmonks.org/>Perl Monks</a>\</a\>
        </pre>

    Got it yet?

    No more looking up those pesky escape codes! :-)
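For anyone who wants to watch the backslash handler at work, here is a small self-contained sketch. The handler follows the one shown above; the expand_escapes driver and the escape_html helper (a stand-in for encode_entities, to avoid a CPAN dependency) are hypothetical and exist only for this demonstration. Unlike the real main loop, this driver passes ordinary text through untouched, so the only thing being shown is the escape itself.

```perl
use strict;
use warnings;

# Minimal substitute for HTML::Entities::encode_entities (assumption).
sub escape_html {
    my $text   = shift;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# The handler for "\": called just after a backslash has been consumed.
# /gc (rather than /g) keeps pos() in place when the match fails.
my $backslash = sub {
    my $t_ref = shift;
    if ($$t_ref =~ /\G([&\[\]<>\\])/gc) {
        return escape_html($1);
    }
    return;    # not a recognised escape; let the caller decide
};

# Hypothetical driver: copy ordinary text as-is, hand each backslash
# to the handler, and emit a literal backslash if the handler declines.
sub expand_escapes {
    my $text = shift;
    my $out  = '';
    pos($text) = 0;
    while (pos($text) < length($text)) {
        if ($text =~ /\G([^\\]+)/gc) {
            $out .= $1;
        }
        elsif ($text =~ /\G\\/gc) {
            my $html = $backslash->(\$text);
            $out .= defined $html ? $html : '\\';
        }
    }
    return $out;
}

print expand_escapes('\<b\> makes text <b>bold</b>'), "\n";
```

The escaped pair comes out as entities while the unescaped tags are left alone, which is the whole trick: the writer types a backslash and never has to remember an entity name.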

    My apologies for using you as a foil, but you just let me illustrate Tom's point perfectly. All of the stuff I am saying is obvious to anyone who has played with functional techniques, but since you haven't you are simply unable to see the amazing potential inherent in this method of code organization. And I happen to know that you are not a bad programmer, but this was a blind spot for you.

    Time to put down the pot, we aren't boiling now. This is a frying pan and I feel like an omelette. :-)

      No, you're absolutely right that HTML::Parser in and of itself wouldn't do a good job for this specific problem. My point in posting was really twofold:
      • Writing a parser like this is very very difficult to get right, and usually it's better to find an existing tool that's already been stress-tested. You got it right because you know what you're doing, but I doubt many others would be able to execute like that.
      • Ovid's initial problem, which apparently was the seed for your post, was tailor-made for HTML::Parser.
      As far as functional programming goes, I'm not a stranger (I just recently replaced Perl code to walk two trees and find differences with compiled ML because it was faster and more conceptually simple), and I certainly support seeing more functional Perl. I'm not necessarily convinced that functional techniques helped in this particular program that much; my claim is that it worked so well because of the strength of the programmer. However, I do appreciate the elegance of the solution. But I continue to submit that your average Perl programmer would botch this problem subtly, and it would make more sense for them to use some pre-rolled solution.


        My apologies for misunderstanding what problem you thought HTML::Parser would be a good fit for.

        While I appreciate the compliment, I do think that it was functional programming which made this work.

        If I am a strong programmer, my strength was shown here in picking the right style for the job, not in executing that style in an amazing way. I agree that your average Perl programmer would botch this job, probably horribly. I also submit that your average competent Perl programmer would botch it too, possibly subtly but probably not so subtly. I know that I personally wouldn't know how to tackle this in an OO style, and could not come up with a solution I would like in a procedural style.

        However, I think that most decent Perl programmers with exposure to functional techniques, when given the core function, would have little difficulty in adding a series of handlers and getting it right from there on in. And that core function is easy to get right because it is doing something conceptually simple. (Scanning for handled stuff, and escaping anything that doesn't wind up being handled.) The barrier here is conceptual, not technical.
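To make "scanning for handled stuff, and escaping anything that doesn't wind up being handled" concrete, here is a minimal sketch of what such a core loop might look like. It is not the code from the root post: the %handler table, the render function, and the escape_html helper (a stand-in for encode_entities) are all names made up for this illustration, and the only handlers registered are four trivial tag pass-throughs.

```perl
use strict;
use warnings;

# Minimal substitute for HTML::Entities::encode_entities (assumption).
sub escape_html {
    my $text   = shift;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Hypothetical handler table: trigger string => code ref. A handler gets a
# reference to the text (pos() sits just after the trigger) and returns
# replacement HTML, or undef to decline.
my %handler = map { my $tag = $_; ($tag => sub { $tag }) } qw(<b> </b> <i> </i>);

# One alternation of all triggers, longest first.
my $trigger_re = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %handler;

sub render {
    my $text = shift;
    my $out  = '';
    pos($text) = 0;
    while (pos($text) < length($text)) {
        my $start = pos($text);
        if ($text =~ /\G($trigger_re)/gc) {
            my $html = $handler{$1}->(\$text);
            if (defined $html) {
                $out .= $html;
                next;
            }
            pos($text) = $start;    # handler declined; rewind
        }
        # Nothing handled this position: escape one character and move on.
        $out .= escape_html(substr($text, $start, 1));
        pos($text) = $start + 1;
    }
    return $out;
}

print render('<b>hi</b> <img src=x> & co'), "\n";
```

Adding a feature under this arrangement means adding one entry to the table; the loop itself does not change, and anything without a handler (here the img tag) simply comes out escaped.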

        As for whether HTML::Parser is a good fit, that depends on how you read Ovid's problem. It would be a good fit if you wanted to just strip out disallowed tags. It would be a bad fit if you wanted to escape them again, leaving text untouched and add error messages as I did. It would be a really bad fit if this parsing piece was going to be extended (as the above was) to allow a number of custom markup symbols to be used.

        Personally I always get irritated at seeing my mistakes be silent. So while HTML::Parser might solve some spec, it would not give a solution that could be extended nicely. And I am not sure it would solve Ovid's problem to the satisfaction of future requests that might come in. But this does.

        More amusing handlers. The right one for :// can autodetect URLs. One for @ can do the same for email addresses. The main loop needs to be changed to a different class, but "\n" gives you the ability to have a newbie mode. (Note, look for leading spaces on the next line and turn them into &nbsp; please.)

        So you see, a ton of different requests can be satisfied, and when the rules conflict (eg don't look for URLs in formatted code) there is already a good resolution.
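One plausible way such a conflict could resolve, sketched under assumptions: the handler for a code section consumes everything up to its closing tag and only escapes it, so the URL handler never gets a look at that text. The trigger strings, the rough URL character class, the render loop, and escape_html (a stand-in for encode_entities) are all invented for this example and are not taken from the root post.

```perl
use strict;
use warnings;

# Minimal substitute for HTML::Entities::encode_entities (assumption).
sub escape_html {
    my $text   = shift;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

my %handler = (
    # Autolink a URL. The character class is a rough guess, not a URL parser.
    'http://' => sub {
        my $t_ref = shift;
        return unless $$t_ref =~ /\G([\w.\/~%-]+)/gc;
        my $url = escape_html("http://$1");
        return qq(<a href="$url">$url</a>);
    },
    # Swallow a whole code section and only escape it, so no other
    # handler ever runs on its contents.
    '<code>' => sub {
        my $t_ref = shift;
        return unless $$t_ref =~ /\G(.*?)<\/code>/gcs;
        return '<code>' . escape_html($1) . '</code>';
    },
);

my $trigger_re = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %handler;

sub render {
    my $text = shift;
    my $out  = '';
    pos($text) = 0;
    while (pos($text) < length($text)) {
        my $start = pos($text);
        if ($text =~ /\G($trigger_re)/gc) {
            my $html = $handler{$1}->(\$text);
            if (defined $html) {
                $out .= $html;
                next;
            }
            pos($text) = $start;    # handler declined; rewind
        }
        $out .= escape_html(substr($text, $start, 1));
        pos($text) = $start + 1;
    }
    return $out;
}

print render('see http://example.com/x and <code>http://a.b <i></code>'), "\n";
```

The URL outside the code section becomes a link, while the one inside is left as escaped text, because whichever handler claims a stretch of input decides what happens to all of it.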

        Looking at this, I don't think I could do all of that (particularly the switching) using any other technique I know. Without having seen functional, I would be lost.
