
Re: HTML::HTML5::Parser weirdness

by Corion (Pope)
on Feb 23, 2020 at 16:57 UTC (#11113348)


in reply to HTML::HTML5::Parser weirdness

You don't show us how you populate $filename.

Maybe the file does not exist, or $filename has whitespace at the end. When I try a mock program without an existing file, I get a very similar output:

#!perl
use strict;
use warnings;
use XML::LibXML;
use HTML::HTML5::Parser;

my $p = HTML::HTML5::Parser->new();
my $doc = $p->parse_file('file:///this/doesnotexist.html', {
    ignore_http_response_code => 1,
});
print $doc->toString;

Output:

Use of uninitialized value $c_type in pattern match (m//) at /home/corion/perl5/lib/perl5/HTML/HTML5/Parser.pm line 59.
<?xml version="1.0" encoding="windows-1252"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>
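
The two guesses above (a missing file, or stray whitespace in $filename) can be ruled out before the parser is ever called. This is only a minimal sketch using core Perl file tests; check_filename is a hypothetical helper name, not part of HTML::HTML5::Parser, and the filenames are assumed to be passed on the command line:

```perl
use strict;
use warnings;

# Hypothetical helper: return a short description of the first problem
# found with $filename, or undef if nothing suspicious was found.
sub check_filename {
    my ($filename) = @_;
    return 'undefined filename'             unless defined $filename;
    return 'leading or trailing whitespace' if $filename =~ /^\s|\s\z/;
    return 'file does not exist'            unless -e $filename;
    return 'not a plain file'               unless -f $filename;
    return 'file is empty'                  if -z $filename;
    return;
}

# Assumption: the names to check are given as command-line arguments.
for my $filename (@ARGV) {
    my $problem = check_filename($filename);
    # Quote the name so that invisible whitespace shows up in the output.
    printf "'%s': %s\n", $filename, defined $problem ? $problem : 'looks fine';
}
```

Running this over the same list of names that is fed to parse_file would show whether the one failing file differs from the others in any of these respects.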

If you can show us a short, self-contained, correct example (SSCCE), that would remove a lot of guesswork and might help us find the root of the problem.

Re^2: HTML::HTML5::Parser weirdness
by djh (Novice) on Feb 23, 2020 at 20:34 UTC

    Thanks for reading my post and taking the trouble to reply.

    I did show you exactly how I populated $filename of course:

    for my $filename (@files)

    but I expect you meant how I populated @files, which was using readdir

    opendir my $dir, $BASE_DIR or die "Cannot open $BASE_DIR directory: $!";
    my @all_files = readdir $dir;
    closedir $dir;
    my @files = sort grep { $_ =~ /\-00:/ } @all_files;

    I didn't think that was important, since I specifically said I'd checked the file existed and I even posted some of its contents. And the fact that the loop works for all the other files in that directory is a strong hint that there are no whitespace problems or the like.
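
    One thing the readdir listing above leaves open is that readdir returns bare names, without the directory part, so whether a name resolves depends on the current working directory. The following is only a sketch of how the same listing could be turned into full paths and checked; matching_paths is a hypothetical helper name, and taking the directory from the command line is an assumption, not what the original script does:

    ```perl
    use strict;
    use warnings;
    use File::Spec;

    # Hypothetical helper: return the sorted full paths of all entries in
    # $base_dir whose names match $pattern. readdir itself yields only
    # bare names, so the directory is prepended here.
    sub matching_paths {
        my ($base_dir, $pattern) = @_;
        opendir my $dir, $base_dir
            or die "Cannot open $base_dir directory: $!";
        my @names = sort grep { /$pattern/ } readdir $dir;
        closedir $dir;
        return map { File::Spec->catfile($base_dir, $_) } @names;
    }

    # Assumption: the directory is given as the first argument, else '.'.
    my $BASE_DIR = @ARGV ? $ARGV[0] : '.';

    for my $path (matching_paths($BASE_DIR, qr/-00:/)) {
        my $size = -s $path;    # undef if missing, false if empty
        # Quote the path so that stray whitespace becomes visible.
        printf "'%s' plain_file=%s size=%s\n",
            $path, (-f $path ? 'yes' : 'no'), ($size ? $size : 0);
    }
    ```

    If every path prints plain_file=yes with a non-zero size, the file list itself is unlikely to be the cause.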

    But the fact that you got a similar error with a non-existent file suggests to me that the problem is in the module, which was why I looked for <head/> in /usr/lib/perl5. So I'll go and look further to see if I can isolate where that string is coming from.

    While I appreciate the benefits of SSCCE, I think the effort I would need to construct one in this case outweighs the benefits. But I may do so if I'm still stuck after a while.

    PS Why does perlmonks format code at column 70? I could vaguely understand column 72 if I was a punched-card FORTRAN programmer but I use an 80-column terminal and try to stay within that!

      > While I appreciate the benefits of SSCCE, I think the effort I would need to construct one in this case outweighs the benefits.

      Because very often the process of creating an SSCCE will find the problem for you. In the process of creating an SSCCE you are forced to examine your assumptions. Often you will find they aren't correct. You also make it easy for us to reproduce your problem - why should we do all the work? See "I know what I mean. Why don't you?".

      Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
      > Why does perlmonks format code at column 70?

      You can configure that in the Display Settings.

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        Ah, thanks! :) 70 still seems a strange default, but there's plenty of bigger things to worry about :)
