I need to store a multi-dimensional associative array of data in a flat file for caching purposes. I might occasionally come across the need to convert it to JSON for use in
Just an FYI: if you want a format that is as easy to work with as JSON but more compact and faster, check out MessagePack.
Before you make your final decision, be aware that JSON does not round-trip associative arrays by default. Unless you pass true as its second argument, json_decode() returns them as objects instead:
$config = array(
'Frodo' => 'hobbit',
'Gimli' => 'dwarf',
'Gandalf' => 'wizard',
);
print_r($config);
print_r(json_decode(json_encode($config)));
Output is:
Array
(
[Frodo] => hobbit
[Gimli] => dwarf
[Gandalf] => wizard
)
stdClass Object
(
[Frodo] => hobbit
[Gimli] => dwarf
[Gandalf] => wizard
)
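For comparison, here is a minimal sketch of the same round trip with the second argument of json_decode() set to true, which asks for associative arrays rather than stdClass objects. The variable names $asObject and $asArray are only illustrative.

```php
<?php
$config = array(
    'Frodo'   => 'hobbit',
    'Gimli'   => 'dwarf',
    'Gandalf' => 'wizard',
);

// Default behaviour: JSON objects come back as stdClass instances.
$asObject = json_decode(json_encode($config));

// With true as the second argument, JSON objects come back as associative arrays.
$asArray = json_decode(json_encode($config), true);

var_dump($asObject instanceof stdClass); // bool(true)
var_dump($asArray === $config);          // bool(true)
```

So the object/array difference is avoidable, but you have to remember the flag on every decode.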
I've tested this very thoroughly on a fairly complex, mildly nested multi-hash with all kinds of data in it (string, NULL, integers), and serialize/unserialize ended up much faster than json_encode/json_decode.
The only advantage JSON had in my tests was its smaller 'packed' size.
These tests were run under PHP 5.3.3; let me know if you want more details.
Here are the test results, followed by the code that produced them. I can't provide the test data since it would reveal information that I can't release publicly.
JSON encoded in 2.23700618744 seconds
PHP serialized in 1.3434419632 seconds
JSON decoded in 4.0405561924 seconds
PHP unserialized in 1.39393305779 seconds
serialized size : 14549
json_encode size : 11520
serialize() was roughly 66.51% faster than json_encode()
unserialize() was roughly 189.87% faster than json_decode()
json_encode() string was roughly 26.29% smaller than serialize()
// Time json encoding
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
json_encode( $test );
}
$jsonTime = microtime( true ) - $start;
echo "JSON encoded in $jsonTime seconds<br>";
// Time serialization
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
serialize( $test );
}
$serializeTime = microtime( true ) - $start;
echo "PHP serialized in $serializeTime seconds<br>";
// Time json decoding
$test2 = json_encode( $test );
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
json_decode( $test2 );
}
$jsonDecodeTime = microtime( true ) - $start;
echo "JSON decoded in $jsonDecodeTime seconds<br>";
// Time deserialization
$test2 = serialize( $test );
$start = microtime( true );
for($i = 0; $i < 10000; $i++) {
unserialize( $test2 );
}
$unserializeTime = microtime( true ) - $start;
echo "PHP unserialized in $unserializeTime seconds<br>";
$jsonSize = strlen(json_encode( $test ));
$phpSize = strlen(serialize( $test ));
echo "<p>serialized size : " . $phpSize . "<br>";
echo "json_encode size : " . $jsonSize . "<br></p>";
// Compare them
if ( $jsonTime < $serializeTime )
{
echo "json_encode() was roughly " . number_format( ($serializeTime / $jsonTime - 1 ) * 100, 2 ) . "% faster than serialize()";
}
else if ( $serializeTime < $jsonTime )
{
echo "serialize() was roughly " . number_format( ($jsonTime / $serializeTime - 1 ) * 100, 2 ) . "% faster than json_encode()";
} else {
echo 'Unpossible!';
}
echo '<BR>';
// Compare them
if ( $jsonDecodeTime < $unserializeTime )
{
echo "json_decode() was roughly " . number_format( ($unserializeTime / $jsonDecodeTime - 1 ) * 100, 2 ) . "% faster than unserialize()";
}
else if ( $unserializeTime < $jsonDecodeTime )
{
echo "unserialize() was roughly " . number_format( ($jsonDecodeTime / $unserializeTime - 1 ) * 100, 2 ) . "% faster than json_decode()";
} else {
echo 'Unpossible!';
}
echo '<BR>';
// Compare them
if ( $jsonSize < $phpSize )
{
echo "json_encode() string was roughly " . number_format( ($phpSize / $jsonSize - 1 ) * 100, 2 ) . "% smaller than serialize()";
}
else if ( $phpSize < $jsonSize )
{
echo "serialize() string was roughly " . number_format( ($jsonSize / $phpSize - 1 ) * 100, 2 ) . "% smaller than json_encode()";
} else {
echo 'Unpossible!';
}
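The "roughly N% faster" and "N% smaller" figures all come from the same formula used in the script: (worse / better - 1) * 100. As a quick check, the sketch below recomputes the three reported percentages from the timings and sizes listed in the results. The helper name percent_better() is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper: how much better the smaller value is,
// as a percentage, using the same formula as the benchmark script.
function percent_better($worse, $better)
{
    return number_format(($worse / $better - 1) * 100, 2);
}

// serialize() vs json_encode() (seconds)
echo percent_better(2.23700618744, 1.3434419632), "\n";  // 66.51
// unserialize() vs json_decode() (seconds)
echo percent_better(4.0405561924, 1.39393305779), "\n";  // 189.87
// json_encode() size vs serialize() size (bytes)
echo percent_better(14549, 11520), "\n";                 // 26.29
```

Note that under this formula "189.87% faster" means json_decode() took about 2.9 times as long as unserialize().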
Check out the results here (sorry for the hack putting the PHP code in the JS code box):
http://jsfiddle.net/newms87/h3b0a0ha/embedded/result/
RESULTS: serialize() and unserialize() are both significantly faster in PHP 5.4 on arrays of varying size.
I wrote a test script that uses real-world data to compare json_encode vs serialize and json_decode vs unserialize. The test was run on the caching system of an in-production e-commerce site. It simply takes the data already in the cache, times how long it takes to encode / decode (or serialize / unserialize) each entry, and puts the results in an easy-to-read table.
I ran this on a PHP 5.4 shared hosting server.
The results were conclusive: across these data sets, from large to small, serialize and unserialize were the clear winners. For my use case, json_decode and unserialize matter most, since the caching system mostly reads. Unserialize won almost every time. It was typically 2 to 4 times (sometimes 6 or 7 times) as fast as json_decode.
It is interesting to note the difference in results from @peter-bailey.
Here is the PHP code used to generate the results:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
function _count_depth($array)
{
$count = 0;
$max_depth = 0;
foreach ($array as $a) {
if (is_array($a)) {
list($cnt, $depth) = _count_depth($a);
$count += $cnt;
$max_depth = max($max_depth, $depth);
} else {
$count++;
}
}
return array(
$count,
$max_depth + 1,
);
}
function run_test($file)
{
$memory = memory_get_usage();
$test_array = unserialize(file_get_contents($file));
$memory = round((memory_get_usage() - $memory) / 1024, 2);
if (empty($test_array) || !is_array($test_array)) {
return;
}
list($count, $depth) = _count_depth($test_array);
//JSON encode test
$start = microtime(true);
$json_encoded = json_encode($test_array);
$json_encode_time = microtime(true) - $start;
//JSON decode test
$start = microtime(true);
json_decode($json_encoded);
$json_decode_time = microtime(true) - $start;
//serialize test
$start = microtime(true);
$serialized = serialize($test_array);
$serialize_time = microtime(true) - $start;
//unserialize test
$start = microtime(true);
unserialize($serialized);
$unserialize_time = microtime(true) - $start;
return array(
'Name' => basename($file),
'json_encode() Time (s)' => $json_encode_time,
'json_decode() Time (s)' => $json_decode_time,
'serialize() Time (s)' => $serialize_time,
'unserialize() Time (s)' => $unserialize_time,
'Elements' => $count,
'Memory (KB)' => $memory,
'Max Depth' => $depth,
'json_encode() Win' => ($json_encode_time > 0 && $json_encode_time < $serialize_time) ? number_format(($serialize_time / $json_encode_time - 1) * 100, 2) : '',
'serialize() Win' => ($serialize_time > 0 && $serialize_time < $json_encode_time) ? number_format(($json_encode_time / $serialize_time - 1) * 100, 2) : '',
'json_decode() Win' => ($json_decode_time > 0 && $json_decode_time < $unserialize_time) ? number_format(($unserialize_time / $json_decode_time - 1) * 100, 2) : '',
'unserialize() Win' => ($unserialize_time > 0 && $unserialize_time < $json_decode_time) ? number_format(($json_decode_time / $unserialize_time - 1) * 100, 2) : '',
);
}
$files = glob(dirname(__FILE__) . '/system/cache/*');
$data = array();
foreach ($files as $file) {
if (is_file($file)) {
$result = run_test($file);
if ($result) {
$data[] = $result;
}
}
}
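// Sort by memory usage, largest first. The comparator must return an integer, not a boolean.
uasort($data, function ($a, $b) {
if ($a['Memory (KB)'] == $b['Memory (KB)']) {
return 0;
}
return ($a['Memory (KB)'] < $b['Memory (KB)']) ? 1 : -1;
});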
$fields = array_keys($data[0]);
?>
<table>
<thead>
<tr>
<?php foreach ($fields as $f) { ?>
<td style="text-align: center; border:1px solid black;padding: 4px 8px;font-weight:bold;font-size:1.1em"><?= $f; ?></td>
<?php } ?>
</tr>
</thead>
<tbody>
<?php foreach ($data as $d) { ?>
<tr>
<?php foreach ($d as $key => $value) { ?>
<?php $is_win = strpos($key, 'Win'); ?>
<?php $color = ($is_win && $value) ? 'color: green;font-weight:bold;' : ''; ?>
<td style="text-align: center; vertical-align: middle; padding: 3px 6px; border: 1px solid gray; <?= $color; ?>"><?= $value . (($is_win && $value) ? '%' : ''); ?></td>
<?php } ?>
</tr>
<?php } ?>
</tbody>
</table>
If you are caching information that you will ultimately want to "include" at a later point in time, you may want to try using var_export. That way you only take the hit in the "serialize" and not in the "unserialize".
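A minimal sketch of that idea, assuming the cached data is a plain array of scalars and nested arrays. The cache file path here is hypothetical (a temp file, so the example cleans up after itself); in a real application it would be a path in your cache directory.

```php
<?php
// Hypothetical cache file location, chosen only for this example.
$cacheFile = sys_get_temp_dir() . '/var_export_cache_' . getmypid() . '.php';

$data = array(
    'Frodo'  => 'hobbit',
    'nested' => array('a' => 1, 'b' => null),
);

// Write the array out as PHP source that returns it.
// var_export(..., true) returns the code as a string instead of printing it.
file_put_contents($cacheFile, '<?php return ' . var_export($data, true) . ';', LOCK_EX);

// Read it back: include evaluates the file and yields its return value.
$restored = include $cacheFile;

var_dump($restored === $data); // bool(true)

unlink($cacheFile);
```

The file is ordinary PHP source, so reading it back involves no decoding step in your own code. Only include cache files that your application wrote itself.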
Depends on your priorities.
If performance is your absolute driving characteristic, then by all means use the fastest one. Just make sure you have a full understanding of the differences before you make a choice:
Unlike serialize(), you need to add an extra parameter to keep UTF-8 characters untouched: json_encode($array, JSON_UNESCAPED_UNICODE) (otherwise it converts UTF-8 characters to Unicode escape sequences).
You can't use __sleep() and __wakeup() with JSON.
By default, json_encode() only includes an object's public properties (in PHP>=5.4 you can implement JsonSerializable to change this behavior).
And there are probably a few other differences I can't think of at the moment.
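The differences above can be seen in a short sketch. The Character class and its properties are hypothetical, made up for illustration, and JSON_UNESCAPED_UNICODE assumes PHP 5.4 or later. The non-ASCII character is written as explicit UTF-8 bytes so the example does not depend on the file's encoding.

```php
<?php
// Hypothetical class used only for illustration.
class Character
{
    public $name = 'Frodo';
    private $secret = 'ring';
}

$c = new Character();

// json_encode() only sees the public property.
$json = json_encode($c);
echo $json, "\n"; // {"name":"Frodo"}

// json_decode() returns a generic stdClass object, not a Character.
$decoded = json_decode($json);
var_dump($decoded instanceof Character); // bool(false)

// serialize()/unserialize() restore the class and the private property.
$copy = unserialize(serialize($c));
var_dump($copy instanceof Character); // bool(true)
var_dump($copy == $c);                // bool(true)

// "Z\xC3\xBCrich" is "Zürich" encoded as UTF-8.
$city = array('city' => "Z\xC3\xBCrich");
$escaped   = json_encode($city);                         // non-ASCII escaped
$unescaped = json_encode($city, JSON_UNESCAPED_UNICODE); // UTF-8 bytes kept
echo $escaped, "\n"; // {"city":"Z\u00fcrich"}
```

Both JSON strings decode to the same value, so the flag mainly affects size and readability of the stored text.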
A simple speed test to compare the two
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
// Make a big, honkin test array
// You may need to adjust this depth to avoid memory limit errors
$testArray = fillArray(0, 5);
// Time json encoding
$start = microtime(true);
json_encode($testArray);
$jsonTime = microtime(true) - $start;
echo "JSON encoded in $jsonTime seconds\n";
// Time serialization
$start = microtime(true);
serialize($testArray);
$serializeTime = microtime(true) - $start;
echo "PHP serialized in $serializeTime seconds\n";
// Compare them
if ($jsonTime < $serializeTime) {
printf("json_encode() was roughly %01.2f%% faster than serialize()\n", ($serializeTime / $jsonTime - 1) * 100);
}
else if ($serializeTime < $jsonTime ) {
printf("serialize() was roughly %01.2f%% faster than json_encode()\n", ($jsonTime / $serializeTime - 1) * 100);
} else {
echo "Impossible!\n";
}
function fillArray( $depth, $max ) {
static $seed;
if (is_null($seed)) {
$seed = array('a', 2, 'c', 4, 'e', 6, 'g', 8, 'i', 10);
}
if ($depth < $max) {
$node = array();
foreach ($seed as $key) {
$node[$key] = fillArray($depth + 1, $max);
}
return $node;
}
return 'empty';
}