UPDATE:
For completeness, and for anyone who finds this later: this issue has been fixed in JSON::XS version 3.03. From the changelog:
- fix a bug introduced by a perl bug workaround that would cause
incremental parsing to fail with a sv_chop panic.
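To check whether an installed JSON::XS already contains this fix, something like the following sketch may help. The helper name has_sv_chop_fix is made up for illustration and is not part of JSON::XS; it only compares version numbers with the core version module, on the assumption that every release from 3.03 onward carries the fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use version;

# Hypothetical helper (not part of JSON::XS): returns 1 if the given
# version string is at least 3.03, the release whose changelog
# mentions the sv_chop fix, and 0 otherwise.
sub has_sv_chop_fix {
    my ($installed) = @_;
    return version->parse($installed) >= version->parse('3.03') ? 1 : 0;
}

# Only report when JSON::XS is actually installed; otherwise stay silent.
if ( eval { require JSON::XS; 1 } ) {
    my $installed = JSON::XS->VERSION;
    print "JSON::XS $installed ",
        ( has_sv_chop_fix($installed) ? "includes" : "predates" ),
        " the 3.03 fix\n";
}
```

Alternatively, writing "use JSON::XS 3.03;" at the top of the script makes Perl refuse to load an older version at compile time.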
UPDATE:
I tried replacing JSON::XS with JSON::PP and got the following error message instead:
unexpected end of string while parsing JSON string, at character offset 5 (before "(end of string)") at parse_json.pl line 40.
Good morning!
I am working on a small script that incrementally parses a multi-gigabyte JSON file using the JSON::XS module. I based it on the incremental parsing example at the bottom of the examples section of the JSON::XS documentation on CPAN, and I am running into errors when I try to parse the JSON data.
In the example test case below, Perl is aborting with a panic: sv_chop error after the first document. What is causing this particular error? Am I doing something wrong?
EDIT: add actual error message
End of document 1
panic: sv_chop ptr=4488b1, start=422038, end=4221f8 at parse_json.pl line 40, <DATA> chunk 886.
I am using Strawberry Perl 5.024 x64 on Windows 8.1 with JSON::XS version 3.02.
Example program:
#!/usr/bin/perl
use 5.024;
use JSON::XS;
use strict;
use warnings;
#################
# CHUNK_LENGTH is intentionally tiny to highlight the error
my $CHUNK_LENGTH = \1;
#################
###################
## MAIN
{
my $json = JSON::XS->new;
my $buffer;
my $statement_count = 0;
# Incrementally parse an array of JSON objects
local $/ = $CHUNK_LENGTH;
# first parse the initial "["
INITIAL_PARSE_LOOP: while ( 1 ) {
$buffer = <DATA>;
$json->incr_parse($buffer); # void context, so no parsing
# Exit the loop once we found the initial "[".
# In essence, we are (ab-)using the $json object as a simple scalar
# we append data to.
last INITIAL_PARSE_LOOP if $json->incr_text =~ s/^.*?\[//msx;
}
# now we have skipped the initial "[", so continue
# parsing all the elements.
PARSE_LOOP: while ( 1 ) {
# clean up whitespace and padding
$json->incr_text =~ s/^\s*(.+)\n/$1/gms;
# in this loop we read data until we got a single JSON object
STATEMENT_PARSE_LOOP: while ( 1 ) {
if ( my $obj = $json->incr_parse ) {
say "End of document ".++$statement_count;
last STATEMENT_PARSE_LOOP;
}
# add more data
$buffer = <DATA>;
$json->incr_parse($buffer); # void context, so no parsing
}
# in this loop we read data until we either found and parsed the
# separating "," between elements, or the final "]"
CLEAN_UP_LOOP: while ( 1 ) {
# first skip whitespace
$json->incr_text =~ s/^\s*//;
# if we find "]", we are done with the file
last PARSE_LOOP if $json->incr_text =~ s/^\]//;
# if we find ",", we can continue with the next element
last CLEAN_UP_LOOP if $json->incr_text =~ s/^,//;
# if we find anything else, we have a parse error!
if ( length $json->incr_text ) {
die "parse error near ".$json->incr_text;
}
# else add more data
$buffer = <DATA>;
$json->incr_parse($buffer); # void context, so no parsing
}
}
}
__DATA__
[
{
"name": "xxxxxxxxxxxxxxx",
"id": "xxxxxxxxxxxxxx",
"previousBalance": "xxxxx",
"currentMonth": "xxxxxxx",
"total": "xxxxxxxx",
"billingDate": "xxxxxxxxxxxxxx",
"billingPeriod": "xxxxxxxxxxxxxxxxxxxxxxx",
"invoiceDate": "xxxxxxxxxxx",
"dueDate": "xxxxxxxxxxx",
"autopay": false,
"address": {
"line1": "xxxxxxxxxxxxx",
"line2": null,
"city": "xxxxxxxxxxxx",
"state": "xx",
"zip": "xxxxxxxxxxx"
}
},
{
"name": "xxxxxxxxxxxxxxx",
"id": "xxxxxxxxxxxxxx",
"previousBalance": "xxxxx",
"currentMonth": "xxxxxxx",
"total": "xxxxxxxx",
"billingDate": "xxxxxxxxxxxxxx",
"billingPeriod": "xxxxxxxxxxxxxxxxxxxxxxx",
"invoiceDate": "xxxxxxxxxxx",
"dueDate": "xxxxxxxxxxx",
"autopay": false,
"address": {
"line1": "xxxxxxxxxxxxx",
"line2": null,
"city": "xxxxxxxxxxxx",
"state": "xx",
"zip": "xxxxxxxxxxx"
}
}
]
Thank you for your time.