PerlMonks  

Re: Re: Reverse engineering HTML

by THRAK (Monk)
on Jun 14, 2001 at 21:06 UTC [id://88506]


in reply to Re: Reverse engineering HTML
in thread Reverse engineering HTML

I have to give a big ++ to Corion for this advice. If you have malformed HTML, running it through Tidy will definitely make it far more usable. Although there is currently no Perl implementation of it (WHAH!), it is very easy to incorporate via a Perl system call. If you have a lot of pages to process, you can build a Perl loop and process them one after another. If this is part of an inline process, you can run each file through Tidy before you parse it or do whatever else with it. I'm currently integrating such an inline Tidy and Perl HTML::Parser step into an existing PHP process. If you have any questions, feel free to contact me.
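For anyone who wants a starting point, here is a minimal sketch of the loop-plus-system-call idea. It assumes a `tidy` binary is on your PATH and that your build accepts `-q` (quiet) and `-o` (output file); the sub names `tidy_command`, `tidied_name` and `tidy_file` are made up for this example, and the `.tidy.html` naming is just one possible convention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for one Tidy run.
# Assumption: `tidy` is on PATH and understands -q and -o.
sub tidy_command {
    my ($in, $out) = @_;
    return ('tidy', '-q', '-o', $out, $in);
}

# Derive an output name: page.html -> page.tidy.html (hypothetical convention).
sub tidied_name {
    my ($in) = @_;
    (my $base = $in) =~ s/\.html?$//i;
    return "$base.tidy.html";
}

# Run Tidy on one file using the list form of system(), which bypasses
# the shell, so odd characters in file names are not interpreted.
# Returns Tidy's exit status (0 = clean, 1 = warnings, 2 = errors),
# or -1 if the program could not be started.
sub tidy_file {
    my ($in, $out) = @_;
    system(tidy_command($in, $out));
    return -1 if $? == -1;
    return $? >> 8;
}

# Process every file named on the command line, one after another.
for my $in (@ARGV) {
    my $out    = tidied_name($in);
    my $status = tidy_file($in, $out);
    if ($status < 0 or $status >= 2) {
        warn "tidy failed on $in (status $status)\n";
        next;
    }
    print "$in -> $out\n";
    # The cleaned file is now ready to hand to HTML::Parser or similar.
}
```

Run it as `perl tidy_all.pl *.html` (the script name is arbitrary). A status of 1 only means Tidy issued warnings, which is the normal case for malformed input, so the sketch treats it as success.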

-THRAK
www.polarlava.com
