I have two PHP scripts that I am using for Hadoop streaming to crunch some JSON data. For various reasons, I am trying to translate these scripts to Perl, but I'm having a tough time. I actually managed to get the mapping script translated (I kinda cheated and used a regex to get around processing the JSON data properly), but the reduce script is more complex. For starters, here's the sorted output from the mapping script (each line is a key, followed by a tab, followed by a value, with a newline ending each line):
36dc0d7d0ac25ce60898c36ca135fbbd [[12051,840,501,33],{"23602":22}]
4c38528ffe96a15c90e8cfcaaad048e3 [[13308,124,-1,62],{"8002":12}]
5557a6bed3793133754d288e2b58763a [[2197,840,751,6],{"16501":1}]
5a9c1f69434c1a8b1d7880ef03ae4264 [[7525,616,-1,14347],{"24902":37}]
87f63173118df680a4c1d63b7953faf3 [[2765,458,-1,11937],{"3102":15}]
901d1a5dbd4ed87fd68db2513fb29762 [[1828,124,-1,63],{"8002":379}]
c23a2b2c10af8af96b1b24ddd4cc53d4 [[62,840,820,38],{"16801":303}]
d7af9cd8573ecbec6d42e453439e3e0f [[4680,124,-1,63],{"1012":1896}]
d93adab6b345608d38ea84811012dce8 [[114,840,819,48],{"22502":322,"8002":3}]
ffd50dd8b4986f40634d6b5925dc04c6 [[6089,840,803,5],{"1252":1}]
And here is the PHP code that does the reducing:
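#!/usr/bin/php
<?php
// Accumulated records, keyed by the hash in the first column.
$data = array();
while (($line = fgets(STDIN)) !== false)
{
    // Each input line is "key<TAB>json-value".
    list($key,$value) = explode("\t",trim($line));
    $value =& json_decode($value);
    // JSON objects decode to PHP objects; turn the counts into an array.
    $value[1] = get_object_vars($value[1]);
    if( isset($data[$key]) )
    {
        // Key seen before: add the new counts to the stored ones.
        foreach( $value[1] as $k=>$v )
        {
            $data[$key][1][$k] += $v;
        }
    }
    else
    {
        $data[$key] = $value;
    }
}
foreach( $data as $key => $value )
{
    echo $key ."\t". json_encode( array($key=>$value) ) ."\n";
}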
Particularly, this is the part I can't figure out how to translate:
$value =& json_decode($value);
$value[1] = get_object_vars($value[1]);
I placed a couple of echos in that PHP code to see what values wind up in $value and $value[1], and here's what they get with the first line of the input data:
Input line:
36dc0d7d0ac25ce60898c36ca135fbbd {"36dc0d7d0ac25ce60898c36ca135fbbd":[[12051,840,501,33],{"23602":22}]}
$value before the json_decode call : [[12051,840,501,33],{"23602":22}]
$value after json_decode call (output via print_r):
Array
(
    [0] => Array
        (
            [0] => 12051
            [1] => 840
            [2] => 501
            [3] => 33
        )
    [1] => Array
        (
            [23602] => 22
        )
)
$value[1] output via print_r:
Array
(
    [23602] => 22
)
I can see that the json_decode call basically takes the value, converts it to a multidimensional array, and assigns it to $value. I don't quite understand what the get_object_vars call does to $value[1], but the end result is that it contains another array holding a key => value pair.
My question is, how hard would it be to do the same thing in Perl? I took a look at the JSON module documentation, but didn't understand how to wind up with the same results this PHP code gets. Any takers on giving me a hand with this?
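For anyone following along, here is a minimal, untested sketch of what the reducer might look like in Perl. It assumes JSON::PP (in core since Perl 5.14; the JSON module exports the same decode_json and encode_json functions). The variable names and the sort on output are my own choices, not taken from the PHP script: Perl hashes are unordered, so sorting just keeps the output deterministic. As far as I can tell, decode_json already turns a JSON object into a hash reference, so there may be no separate step corresponding to get_object_vars.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # assumption: core JSON::PP; "use JSON;" should work the same way

my %data;        # accumulated records, keyed by the hash in the first column

while ( my $line = <STDIN> ) {
    chomp $line;

    # Each input line is "key<TAB>json-value"
    my ( $key, $json ) = split /\t/, $line, 2;
    next unless defined $json;    # skip lines without a tab

    # JSON arrays become array refs, JSON objects become hash refs
    my $value = decode_json($json);

    if ( exists $data{$key} ) {
        # Key seen before: add the new counts to the stored ones
        my $counts = $value->[1];
        $data{$key}[1]{$_} += $counts->{$_} for keys %$counts;
    }
    else {
        $data{$key} = $value;
    }
}

# Emit "key<TAB>{key: value}" like the PHP version does
for my $key ( sort keys %data ) {
    print $key, "\t", encode_json( { $key => $data{$key} } ), "\n";
}
```

Run it the same way as the PHP reducer, for example by piping the sorted mapper output into it on STDIN.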