in reply to Why a regex *really* isn't good enough for HTML and XML, even for "simple" tasks
Your argument is utterly unconvincing. People use regex to extract from HTML documents because it works. They wouldn't use a regex to extract the urls from the document you provided because it wouldn't work.
The real reason not to create a half-assed parser (using regex or otherwise) is this phrase we've all heard: "But it worked yesterday." This is what you'll get with a hacked up solution because it's going to be far less resilient to change and a lot more expensive to maintain than one using a proper parser.
Also, there's a good chance you'll spend far more time developing the hacked up solution as you keep finding corner cases.
Update: Replaced claim the presented task isn't a simple task with an explanation of why isn't one. Sorry, this was done within seconds of posting.
|
---|