Yes indeed it should be UTF-8 but we've already established that the file has "strange" (being polite) encoding...
$| = 1;
$/ = undef;
print "Reading JSON file";
open my $fh, '<:encoding(UTF-8)', '../data/publicextract.charity.json'
    or die "Unable to read Charity JSON File";
my $data = <$fh>;
close $fh;
print "...done\n";
$data =~ s/^\x{feff}//; # Strip off BOM
print "Decoding JSON file";
my $js = decode_json $data; # line 24
print "...done\n";
This takes about 10 minutes to read the 462 MB JSON file, then fails with:
Decoding JSON fileWide character in subroutine entry at import.pl line 24
Given the time taken to open the file as UTF-8, and the error, I suspect there is some nasty encoding hidden somewhere in this file.
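For what it's worth, that particular error may have less to do with the file than with what is handed to decode_json: as documented, decode_json expects raw UTF-8 octets, whereas a handle opened with :encoding(UTF-8) returns already-decoded characters. Here is a minimal sketch of the difference. It assumes JSON::PP (a core module) in place of whichever JSON module import.pl actually loads, and uses a made-up sample string in place of the real file contents.

```perl
use strict;
use warnings;
use JSON::PP ();
use Encode qw(encode);

# Hypothetical sample data standing in for the contents of the real file.
# $chars is what a handle opened with :encoding(UTF-8) would return.
my $chars = qq({"name":"Caf\x{e9} \x{263a}"});

# $bytes is what a handle opened with plain '<' would return.
my $bytes = encode('UTF-8', $chars);

# decode_json wants UTF-8 octets, so the raw bytes are fine.
my $from_bytes = JSON::PP::decode_json($bytes);

# A decoder object without ->utf8 wants decoded characters instead.
my $from_chars = JSON::PP->new->decode($chars);

# Handing decoded characters to decode_json is the mismatch; it dies.
my $ok  = eval { JSON::PP::decode_json($chars); 1 };
my $err = $ok ? '' : $@;

print "bytes and chars decode to the same name\n"
    if $from_bytes->{name} eq $from_chars->{name};
print "decode_json on decoded characters failed: $err" unless $ok;
```

So either the file is read with a plain '<' and passed to decode_json, or it is read with :encoding(UTF-8) and passed to a decoder object's decode method, but not a mix of the two.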
UPDATE
Changing the encoding like so:
print "Reading JSON file";
open my $fh, '<', '../data/publicextract.charity.json'
    or die "Unable to read Charity JSON File";
my $data = <$fh>;
close $fh;
print "...done\n";
$data =~ s/^\357\273\277//; # Strip off BOM
takes about 5 minutes to open the file but gives the strange error:
Decoding JSON fileKilled
"Strange" because the error doesn't include "at import.pl line 24"!
Another UPDATE
It seems I might be running out of memory...380400 records in the JSON file seems to be too much...