Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I've been using XML::Fast to process XML files for some time and successfully. However, the code was moved to a newer machine and has stopped working in some circumstances. A difference between the machines is XML::Fast version, 0.11 on original machine (working) and 0.17 on new machine (not working). When no other changes are made but to upgrade to 0.17 on the old machine it also stops working.

The error I'm getting is:

Failed to encode 2017-9-21T08-49-17.XML to JSON for indexing - malform +ed or illegal unicode character in string [�ndby IF], cannot c +onvert to JSON at xx.pm line 1827.

The XML file comes from a 3rd party and is ISO-8859-1 encoded. The bit it is complaining about is <Value>Br<F8>ndby IF</Value>. A cut down version of the XML which fails is:

<?xml version="1.0" encoding="ISO-8859-1"?> <xx feedtype="delta"><Timestamp CreatedTime="2017-09-21T06:49:17" Time +Zone="GMT"/><Value>Brøndby IF</Value></xx>

The code which is now failing is:

use Cpanel::JSON::XS; use XML::Fast; sub esIndexFile2 { my ($self, $file) = @_; my $xml = do { local $/ = undef; open (my $fh, "<:encoding(ISO-8859-1)", $file) or die "Failed +to open $file - $!"; <$fh>; }; $xml =~ s/^(?:.*\n)//; # remove first line - the encoding lin +e my $hash; eval { $hash = xml2hash $xml; }; if (my $ev = $@) { warn("Failed to parse file $file for indexing - $@ - SKIPPING" +); return; } my $json = eval { encode_json($hash); # <------------ fails here }; if (my $ev = $@) { $self->logwarn("Failed to encode $file to JSON for indexing - +$@ - SKIPPING"); return; } return 1; }

The changes file for XML::Fast is not too helpful. I have discovered adding utf8decode => 1 to the xml2hash makes it work now but I don't really understand why. I am doing anything wrong here? What might have changed in XML::Fast to cause this to happen?


In reply to Problem upgrading XML::Fast from 0.11 to 0.17 by mje

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (1)
As of 2024-04-18 23:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found