http://qs321.pair.com?node_id=889826


in reply to Re^2: regex in form !regex->regex<-!regex
in thread regex in form !regex->regex<-!regex

perhaps write a last set of substitutions that just reinstates \n and \t for all cases enclosed in <pre>...

You could "reinstate" those tabs, but how would you know which whitespace was meant to be a tab (whose width depends on settings) and which was meant to be a hard-coded, specific amount of space? The "reinstate" solution loses information. If that information matters, it isn't going to be a satisfactory solution.
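To make the information loss concrete, here is a minimal sketch. It assumes the earlier substitutions collapse runs of whitespace to a single space; I don't know exactly what your regexes do, so collapse() is a hypothetical stand-in for them. Two inputs that differ only in tab versus spaces come out identical, so no later substitution can tell them apart.

```perl
use strict;
use warnings;

# Two inputs that differ only in how the indent was typed:
# one uses a tab, the other four literal spaces.
my $with_tab    = "if (x)\n\treturn;";
my $with_spaces = "if (x)\n    return;";

# Hypothetical stand-in for the earlier substitutions:
# collapse every run of whitespace to a single space.
sub collapse {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    return $text;
}

my $from_tab    = collapse($with_tab);
my $from_spaces = collapse($with_spaces);

# Once both inputs have been reduced to the same string, nothing
# downstream can know which one originally held a tab.
print "indistinguishable after collapsing\n" if $from_tab eq $from_spaces;
```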

Even though that seems wasteful and stupid, is it worse than invoking a module to do something simple?

Modules are cheap. Your time isn't.

What you want to do is not as simple as it first seems. This is only the first of many complications you are likely to run into. As someone who has studied MediaWiki's markup parsing, I can almost promise you that you will end up with a lot of ugliness if you try to do everything with regexes.

It doesn't cost Perl much to load a module. It is designed for that sort of thing. It may not even take up extra space on your server: HTML::Parser is such a standard module that some distros and hosting companies make it available as a matter of course. But even if you have to install it, if learning and using a module will help you do the job better and save you time over the full course of your project, you should leap at it.

For what you want to do, getting hands-on experience with HTML::Parser will open a lot of doors for you. For one, it will give you options about how much HTML you want to integrate into your markup. Using a module to do something simple in a way that gives you expansion room is a very smart move.
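As a rough sketch of what that might look like for the <pre> problem: the code below uses HTML::Parser's version 3 handler API to collapse whitespace everywhere except inside <pre>, so nothing has to be reinstated afterwards. The function name squeeze_outside_pre and the collapsing rule are my own assumptions for illustration, not something taken from your code.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: collapse whitespace in text outside <pre>,
# and pass everything inside <pre> through untouched.
sub squeeze_outside_pre {
    my ($html) = @_;
    my $out       = '';
    my $pre_depth = 0;    # greater than zero while inside <pre>

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $text) = @_;
            $pre_depth++ if $tag eq 'pre';
            $out .= $text;
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $pre_depth-- if $tag eq 'pre' && $pre_depth > 0;
            $out .= $text;
        }, 'tagname, text' ],
        text_h => [ sub {
            my ($text) = @_;
            $text =~ s/\s+/ /g unless $pre_depth;
            $out .= $text;
        }, 'text' ],
        # Comments, declarations and the like are copied through as-is.
        default_h => [ sub {
            my ($text) = @_;
            $out .= $text if defined $text;
        }, 'text' ],
    );

    # Deliver each text segment in one piece, so a run of whitespace
    # is never split across two handler calls.
    $parser->unbroken_text(1);

    $parser->parse($html);
    $parser->eof;
    return $out;
}

print squeeze_outside_pre("<p>a   b\n c</p><pre>x\n\ty</pre>"), "\n";
```

The same skeleton leaves room to grow: if you later decide to allow or strip particular tags, that logic goes into the start and end handlers instead of another layer of regexes.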

I'm not a fan of using modules for every five-line snippet I can write and test just as easily myself. However, a module like HTML::Parser represents a lot of work already done for you: testing and debugging a lot of corner cases and gotchas. I'd also explore CPAN to see if there are already parsing modules for the kind of blog markup you want to do. Why invent your own markup from the get-go (unless this is a learning exercise) if it turns out that you can adapt the work of someone else who is 80% there?