PerlMonks  

how to get HTML file after parsing with HTML::Normalize

by Anonymous Monk
on Mar 13, 2008 at 09:06 UTC ( [id://673927]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How do I get the parsed HTML from the HTML::Normalize module?

My code is as follows:

use HTML::Normalize;
my $norm = HTML::Normalize->new ();
open(IN,"index.html");
$dirty = <IN>;
close(IN);
my $cleanHtml = $norm->cleanup (-html => $dirtyHtml);
open(OUT,">output.html") or die "Cannot open file\n";
print OUT $cleanHtml;
close(OUT);

I'm opening an HTML file and cleaning up its HTML tags, and I need the cleaned HTML written back out to a file after parsing.

Re: how to get HTML file after parsing with HTML::Normalize
by GrandFather (Saint) on Mar 13, 2008 at 09:58 UTC

    I suspect your problem is the $dirty = <IN>; line which only reads a single line from the input file. There are many ways to fix that, but the technique I tend to use is:

    my $dirty = do {local $/; <IN>};

    which locally undefines the input record separator ($/) so that the diamond operator (<>) reads the whole file in one hit instead of stopping at the first newline.
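For anyone who wants to see the difference for themselves, here is a minimal sketch that reads the same file both ways. The scratch file, its contents, and the variable names ($tmp_name, $first_line, $whole_file) are made up for illustration and are not part of the original post; only core modules are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small three-line scratch file to read back (hypothetical sample data).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "<html>\n<body>hello</body>\n</html>\n";
close $tmp_fh or die "Could not close '$tmp_name': $!";

# With the default $/ ("\n"), a scalar-context read returns only the first line.
open(my $in, '<', $tmp_name) or die "Could not read '$tmp_name': $!";
my $first_line = <$in>;
close $in or die "Could not close: $!";

# Inside the do block, "local $/" leaves $/ undefined, so the read returns
# the whole file; the old value of $/ is restored when the block exits.
open($in, '<', $tmp_name) or die "Could not read '$tmp_name': $!";
my $whole_file = do { local $/; <$in> };
close $in or die "Could not close: $!";

printf "single read: %d bytes, slurp: %d bytes\n",
    length $first_line, length $whole_file;
```

Because the change to $/ is confined to the do block, any later line-by-line reads in the same program are unaffected.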


    Perl is environmentally friendly - it saves trees
Re: how to get HTML file after parsing with HTML::Normalize
by haoess (Curate) on Mar 13, 2008 at 10:06 UTC

    Um, do you want the contents of the $cleanHtml variable? Or do you want to know how to read the contents of a file?!

    Your code revised:

    use warnings;
    use strict;
    use HTML::Normalize;

    my $norm = HTML::Normalize->new;

    open(my $in, '<', 'index.html') or die "Could not read 'index.html': $!";
    my $dirty = do { local $/; <$in> };
    close $in or die "Could not close: $!";

    my $cleanHtml = $norm->cleanup(-html => $dirty);

    open(my $out, '>', 'output.html') or die "Could not open 'output.html': $!";
    print $out $cleanHtml;
    close $out or die "Could not close: $!";

    -- Frank

      Thanks Frank
