Re: Why a regex *really* isn't good enough for HTML, even for "simple" tasks

in reply to Why a regex *really* isn't good enough for HTML and XML, even for "simple" tasks

Your argument is utterly unconvincing. People use regex to extract from HTML documents because it works. They wouldn't use a regex to extract the URLs from the document you provided because it wouldn't work.

The real reason not to create a half-assed parser (using regex or otherwise) is this phrase we've all heard: "But it worked yesterday." This is what you'll get with a hacked up solution because it's going to be far less resilient to change and a lot more expensive to maintain than one using a proper parser.

Also, there's a good chance you'll spend far more time developing the hacked up solution as you keep finding corner cases.
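To make the "But it worked yesterday" point concrete, here is a minimal sketch in Python (chosen only because its standard library ships an HTML parser; the same idea applies to a proper parser module in Perl). The markup, the URL, and the names `YESTERDAY`, `TODAY`, `links_by_regex`, and `links_by_parser` are all made up for illustration. It contrasts a typical quick regex with a parser-based extractor when the page changes only cosmetically.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the same link as served "yesterday" and "today".
# Today's version adds a class, wraps the tag, and uses single quotes.
YESTERDAY = '<a href="https://example.com/a">A</a>'
TODAY = "<a class='nav'\n   href='https://example.com/a'>A</a>"

# A typical quick regex: assumes href is the first attribute and double-quoted.
HREF_RE = re.compile(r'<a href="([^"]*)"')


def links_by_regex(html):
    """Return href values matched by the quick regex."""
    return HREF_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            self.links.extend(
                value for name, value in attrs
                if name == "href" and value is not None
            )


def links_by_parser(html):
    """Return href values found by the standard-library HTML parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


for label, html in (("yesterday", YESTERDAY), ("today", TODAY)):
    print(label, links_by_regex(html), links_by_parser(html))
```

The regex finds the link in yesterday's markup and silently returns nothing for today's, while the parser version returns the same link for both. The regex could of course be patched for this particular change, which is exactly the corner-case chasing described above.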

Update: Replaced the claim that the presented task isn't a simple task with an explanation of why it isn't one. Sorry, this was done within seconds of posting.


Re^2: Why a regex really isn't good enough for HTML, even for "simple" tasks by haukex (Archbishop) on May 09, 2020 at 08:59 UTC
> Your argument is utterly unconvincing. No one would claim that parsing that HTML is a simple task.

Except that's not what I said, and people do try to use regexes to extract stuff from HTML all the time.

> The real reason not to create a half-assed parser (using regex or otherwise) is the following: "But it worked yesterday." A hacked up solution is going to be far less resilient to change and a lot more expensive to maintain than one using a proper parser.

Which is exactly the argument I made in Parsing HTML/XML with Regular Expressions.

Update: PerlMonks has a preview function; I won't be responding to your ninja edits. The above quotes represent the entirety of your post at my time of posting.
Re^3: Why a regex really isn't good enough for HTML, even for "simple" tasks by marto (Cardinal) on May 09, 2020 at 09:05 UTC
Also '...Mark the changed/new content with the word "Update..."' from How do I post a question effectively?
Re^3: Why a regex really isn't good enough for HTML, even for "simple" tasks by ikegami (Patriarch) on May 09, 2020 at 09:20 UTC
> people do try to use regexes to extract stuff from HTML all the time.

I know. And like I said, your argument isn't going to convince a single one of them to stop. They will see their tasks as simple tasks and yours as complex, and you completely failed to show why regex shouldn't be used for simple tasks despite your claims. Perhaps you should add an explanation as to why they shouldn't be used for simple tasks?
Re^4: Why a regex really isn't good enough for HTML, even for "simple" tasks by haukex (Archbishop) on May 09, 2020 at 09:37 UTC
> I know. And like I said, your argument isn't going to convince a single one of them to stop. They will see their tasks as simple tasks and yours as complex, and you completely failed to show why regex shouldn't be used for simple tasks despite your claims.

I see your point now, and I guess that means your initial post could have been something along the lines of "I think your argument might be less effective because people will see their tasks as simple tasks and yours as complex, so how about adding an explanation why regexes still shouldn't be used?". Instead, you chose to be rude.

Update: Once again, the above quote represents the entirety of your node at the time I saw it and started composing my reply.
Re^4: Why a regex really isn't good enough for HTML, even for "simple" tasks by ikegami (Patriarch) on May 09, 2020 at 09:31 UTC
Downvoting and ignoring constructive criticism isn't going to convince the people you are supposedly trying to help. When I say it won't convince them, I mean it has always failed to convince them before. I've seen people make the same argument countless times to no avail. The best results I've seen have been from showing them it's actually easier to do it right. That even appears to be the message you are trying to send with the examples, so it's really just a question of how you frame the problem!
Re^5: Why a regex really isn't good enough for HTML, even for "simple" tasks (updated) by haukex (Archbishop) on May 09, 2020 at 09:40 UTC
Re^3: Why a regex really isn't good enough for HTML, even for "simple" tasks by ikegami (Patriarch) on May 09, 2020 at 09:03 UTC
> Except that's not what I said

You said: "Why a regex really isn't good enough for HTML, even for "simple" tasks". So yeah, you did.

> Which is exactly the argument I made in Parsing HTML/XML with Regular Expressions.

ok, but it's what you said here I'm commenting on.
Re^4: Why a regex really isn't good enough for HTML, even for "simple" tasks by haukex (Archbishop) on May 09, 2020 at 09:36 UTC
> No one would claim that parsing that HTML is a simple task.

Since you're active on both PerlMonks and StackOverflow, you must be aware of the fact that scores of people try to pull stuff from HTML using regexes. My node title is what it is as a response to that.

> You said: "Why a regex really isn't good enough for HTML, even for "simple" tasks". So yeah, you did.

Read what I wrote again, keeping in mind what I said above, and maybe you'll see that your interpretation of what I said is not what I meant. Unfortunately, it seems that once again your drive to maintain that you are correct is stronger than your drive to be reasonable~~, so I'm out~~.
