PerlMonks  

Re^2: Why a regex *really* isn't good enough for HTML, even for "simple" tasks

by haukex (Bishop)
on May 08, 2020 at 18:09 UTC (#11116590)


in reply to Re: Why a regex *really* isn't good enough for HTML, even for "simple" tasks
in thread Why a regex *really* isn't good enough for HTML and XML, even for "simple" tasks

Actually, as the page itself contains "confusing" (to Chrome) information, this is somewhat explainable. The HTML is XML, but it later declares a Content-Type of text/html. Changing that to Content-Type text/xhtml makes (WWW::Mechanize::)Chrome report the correct links.

Interesting, thanks! According to several sources on the W3C website, the correct MIME type for XHTML is application/xhtml+xml, so I've changed it to that.
