Yes, indeed it should be UTF-8, but we've already established that the file has a "strange" (being polite) encoding...
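use strict;
use warnings;
use JSON::PP;   # assumption: JSON, JSON::XS and JSON::PP all export decode_json
$| = 1;         # unbuffer STDOUT so the progress messages show up immediately
$/ = undef;     # slurp mode: the next <$fh> returns the whole file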
print "Reading JSON file";
open my $fh, '<:encoding(UTF-8)', '../data/publicextract.charity.json'
    or die "Unable to read Charity JSON File: $!";
my $data = <$fh>;
close $fh;
print "...done\n";
$data =~ s/^\x{feff}//; # Strip off BOM
print "Decoding JSON file";
my $js = decode_json $data; # line 24
print "...done\n";
This takes about 10 minutes to read the 462 MB JSON file and then fails with:
Decoding JSON fileWide character in subroutine entry at import.pl line 24
Given how long it takes to read the file as UTF-8, and given the error, I suspect there is some nasty encoding hidden somewhere in this file.
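One thing worth ruling out before blaming the data: `decode_json` expects UTF-8 encoded bytes, but the `:encoding(UTF-8)` layer has already decoded the file into characters, and any character above U+00FF in that string makes `decode_json` croak with "Wide character in subroutine entry". The sketch below uses a tiny in-memory sample (not the real file) and assumes the core JSON::PP module; JSON::XS behaves the same way for these calls. It shows the two consistent combinations: raw bytes with `decode_json`, or characters with a decoder whose `utf8` option is off.

```perl
use strict;
use warnings;
use JSON::PP;   # core module; JSON::XS exports decode_json with the same semantics

# Raw UTF-8 bytes as a file opened without an :encoding layer would deliver
# them: a BOM followed by {"name":"<EURO SIGN>"} (U+20AC, bytes E2 82 AC).
my $bytes = "\xEF\xBB\xBF{\"name\":\"\xE2\x82\xAC\"}";

# Option 1: keep the bytes, strip the byte-level BOM, then use decode_json,
# which expects UTF-8 encoded octets.
(my $raw = $bytes) =~ s/^\xEF\xBB\xBF//;
my $from_bytes = decode_json($raw);

# Option 2: decode to characters first (what '<:encoding(UTF-8)' does),
# strip the character-level BOM, and use a decoder with utf8 turned off.
my $chars = $bytes;
utf8::decode($chars) or die "sample is not valid UTF-8";
$chars =~ s/^\x{FEFF}//;
my $from_chars = JSON::PP->new->decode($chars);

# Mixing the two (characters into decode_json) is what croaks with
# "Wide character in subroutine entry".
my $ok = eval { decode_json($chars); 1 };
print $ok ? "decode_json accepted characters\n"
          : "decode_json on characters failed: $@";
```

With the real file, that would mean either dropping the `:encoding(UTF-8)` layer (as in the update below) or keeping it and calling `JSON::PP->new->decode($data)` instead of `decode_json $data`.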
UPDATE
Changing the encoding (opening the file without the :encoding(UTF-8) layer) like so:
print "Reading JSON file";
open my $fh, '<', '../data/publicextract.charity.json'
    or die "Unable to read Charity JSON File: $!";
my $data = <$fh>;
close $fh;
print "...done\n";
$data =~ s/^\357\273\277//; # Strip off BOM (raw UTF-8 bytes EF BB BF)
This takes about 5 minutes to read the file, but gives the strange error:
Decoding JSON fileKilled
"Strange" because the error doesn't include "at import.pl line 24"!
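The missing line number is consistent with the process being killed from outside: on Linux the out-of-memory killer sends SIGKILL, which perl cannot catch, so perl never prints anything and it is the shell that reports "Killed". A small Unix-only sketch that reproduces the situation with a forked child that simply sends SIGKILL to itself:

```perl
use strict;
use warnings;

# Fork a child and terminate it with SIGKILL, the signal the kernel's
# out-of-memory killer sends. SIGKILL cannot be caught or handled, so the
# child's perl never gets to print a message (hence no "at ... line N").
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    kill 'KILL', $$;   # the child kills itself
    sleep 5;           # never reached
    exit 0;
}

waitpid($pid, 0);
my $signal = $? & 127;   # the low 7 bits of $? hold the terminating signal
print "child terminated by signal $signal\n";
```

If that is what happened to import.pl, the kernel log (for example the output of `dmesg`) usually contains an "Out of memory" entry naming the perl process.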
Another UPDATE
It seems I might be running out of memory... 380400 records in the JSON file seem to be too much...
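If memory is the limit, one option is not to build all 380400 records at once. JSON::XS and the core JSON::PP both provide an incremental parser (`incr_parse` / `incr_text`), and the sketch below is modelled on the incremental-parsing examples in the JSON::XS documentation. It assumes the file is a single, non-empty top-level array of objects; `for_each_array_element` is a hypothetical helper name, not part of either module.

```perl
use strict;
use warnings;
use JSON::PP;   # core module; JSON::XS offers the same incr_* methods and is much faster

# Hypothetical helper: calls $callback->($record) once per element of a
# top-level JSON array read from $fh, so only one record is held in memory.
# Assumes a non-empty array whose elements are objects (a list of records).
sub for_each_array_element {
    my ($fh, $callback, $chunk_size) = @_;
    $chunk_size ||= 65536;
    my $json  = JSON::PP->new->utf8;   # the input is raw UTF-8 bytes
    my $count = 0;

    # Append the next chunk to the parser's buffer without parsing it.
    my $more = sub {
        my $got = read($fh, my $buf, $chunk_size);
        die "read error: $!" unless defined $got;
        die "unexpected end of JSON input" unless $got;
        $json->incr_parse($buf);   # void context: buffer only
    };

    # Drop an optional UTF-8 BOM and the opening "[".
    $more->() until $json->incr_text =~ s/^(?:\xEF\xBB\xBF)?\s*\[//;

    while (1) {
        # Keep reading until one complete element can be decoded.
        my $record;
        $more->() until $record = $json->incr_parse;
        $callback->($record);
        $count++;

        # Consume the separator: "," means another element, "]" ends the array.
        while (1) {
            $json->incr_text =~ s/^\s+//;
            return $count if $json->incr_text =~ s/^\]//;
            last if $json->incr_text =~ s/^,//;
            die "parse error near: " . substr($json->incr_text, 0, 40)
                if length $json->incr_text;
            $more->();
        }
    }
}

# Demo on a small in-memory sample with a tiny chunk size, so elements are
# split across reads. For the real file, something like:
#   open my $fh, '<:raw', '../data/publicextract.charity.json' or die $!;
my $text = "\xEF\xBB\xBF[\n  {\"id\":1,\"name\":\"A\"},\n  {\"id\":2,\"name\":\"B \xE2\x82\xAC\"}\n]\n";
open my $in, '<', \$text or die "cannot open in-memory handle: $!";
my @ids;
my $n = for_each_array_element($in, sub { push @ids, $_[0]{id} }, 7);
print "$n records: @ids\n";
```

This only saves memory if each record is handled and then dropped (for example, inserted into a database); pushing every record onto an array rebuilds the same 380400-element structure. For a file of this size JSON::XS should be considerably faster than JSON::PP, and the same calls work there.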