Is it possible to calculate sum of json values grouped by json keys?
Mysql version is 5.7.17 on Google cloud sql.
Example_1: A short exampl
SQL for the "short example":
SELECT col1,
JSON_OBJECT('key1', SUM(value1), 'key2', SUM(value2)) AS col2
FROM
(SELECT col1,
JSON_EXTRACT(col2, '$.key1') AS value1,
JSON_EXTRACT(col2, '$.key2') AS value2
FROM tbl) subq
GROUP BY col1;
I believe that something like this could work.
SELECT SUM(col2->>"$.key1"), SUM(col2->>"$.key2") FROM your_table GROUP BY col1
Solution for Example_3:
DROP TABLE IF EXISTS jsondata;
CREATE TABLE jsondata (json JSON, col varchar(11));
INSERT INTO jsondata VALUES
('[{"key_name" : "key1", "key_value" : 1}, {"key_name" : "key2", "key_value" : 3}]', 'aaa'),
('[{"key_name" : "key1", "key_value" : 0}, {"key_name" : "key3", "key_value" : 2}]', 'bbb'),
('[{"key_name" : "key1", "key_value" : 50}, {"key_name" : "key2", "key_value" : 0}]', 'aaa');
DROP FUNCTION IF EXISTS json_sum_by_col;
CREATE FUNCTION json_sum_by_col(col varchar(100)) RETURNS JSON
BEGIN
DECLARE i INT DEFAULT 0;
DECLARE done INT DEFAULT FALSE;
DECLARE select_values JSON;
DECLARE temp_result JSON;
DECLARE json_result JSON DEFAULT '[]';
DECLARE temp_key varchar(11);
DECLARE temp_value int;
DECLARE curs CURSOR FOR SELECT json FROM jsondata WHERE jsondata.col = col;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN curs;
read_loop: LOOP
SET i = 0;
FETCH curs INTO select_values;
IF done THEN
LEAVE read_loop;
END IF;
WHILE i < JSON_LENGTH(select_values) DO
-- extract key and value for i element
SET temp_key = JSON_EXTRACT(JSON_EXTRACT(select_values, CONCAT('$[',i,']')), '$.key_name');
SET temp_value = JSON_EXTRACT(JSON_EXTRACT(select_values, CONCAT('$[',i,']')), '$.key_value');
-- search json_result for key
SET @search = JSON_SEARCH(json_result, 'one', JSON_UNQUOTE(temp_key));
IF @search IS NOT NULL THEN
-- if exists add to existing value
SET @value_path = JSON_UNQUOTE(REPLACE(@search, 'name', 'value'));
SET temp_value = temp_value + JSON_EXTRACT(json_result, @value_path);
SET json_result = JSON_REPLACE(json_result, @value_path, temp_value);
ELSE
-- else attach it to json_result
SET temp_result = JSON_OBJECT("key_name", JSON_UNQUOTE(temp_key), "key_value", temp_value);
SET json_result = JSON_INSERT(json_result, CONCAT('$[',JSON_LENGTH(json_result),']'), temp_result);
END IF;
SELECT i + 1 INTO i;
END WHILE;
END LOOP;
CLOSE curs;
RETURN json_result;
END;
SELECT col, json_sum_by_col(col) FROM jsondata GROUP BY col;
You can run it here
TL;DR: yes, it can be done without knowing the key names in advance, and none of the alternate data formats has any advantage over the original.
This can be done without knowing the key names in advance but it is painful... basically you have to look at every value in the table to determine the set of distinct keys in the table before you can sum them. Because of this requirement, and the fact that the alternate data formats can all have multiple keys per entry, there is no advantage to using any of them.
Since you have to look for all the distinct keys, it is as easy to do the sums while you are looking for them. This function and procedure together will do that. The function, json_merge_sum
, takes two JSON values and merges them, summing the values where a key appears in both values e.g.
SELECT json_sum_merge('{"key1": 1, "key2": 3}', '{"key3": 1, "key2": 2}')
Output:
{"key1": 1, "key2": 5, "key3": 1}
The function code:
DELIMITER //
DROP FUNCTION IF EXISTS json_merge_sum //
CREATE FUNCTION json_sum_merge(IN j1 JSON, IN total JSON) RETURNS JSON
BEGIN
DECLARE knum INT DEFAULT 0;
DECLARE jkeys JSON DEFAULT JSON_KEYS(j1);
DECLARE kpath VARCHAR(20);
DECLARE v INT;
DECLARE l INT DEFAULT JSON_LENGTH(jkeys);
kloop: LOOP
IF knum >= l THEN
LEAVE kloop;
END IF;
SET kpath = CONCAT('$.', JSON_EXTRACT(jkeys, CONCAT('$[', knum, ']')));
SET v = JSON_EXTRACT(j1, kpath);
IF JSON_CONTAINS_PATH(total, 'one', kpath) THEN
SET total = JSON_REPLACE(total, kpath, JSON_EXTRACT(total, kpath) + v);
ELSE
SET total = JSON_SET(total, kpath, v);
END IF;
SET knum = knum + 1;
END LOOP kloop;
RETURN total;
END
The procedure, count_keys
, performs the equivalent of the GROUP BY
clause. It finds all the distinct values of col1
in the table and then calls json_sum_merge
for each row which has that value of col1
. Note the row select query performs a SELECT ... INTO
a dummy variable so no output is generated, and uses a MIN()
to ensure there is only one result (so that it can be assigned to a variable).
The procedure:
DELIMITER //
DROP PROCEDURE IF EXISTS count_keys //
CREATE PROCEDURE count_keys()
BEGIN
DECLARE finished INT DEFAULT 0;
DECLARE col1val VARCHAR(20);
DECLARE col1_cursor CURSOR FOR SELECT DISTINCT col1 FROM table2;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished=1;
OPEN col1_cursor;
col1_loop: LOOP
FETCH col1_cursor INTO col1val;
IF finished=1 THEN
LEAVE col1_loop;
END IF;
SET @total = '{}';
SET @query = CONCAT("SELECT MIN(@total:=json_sum_merge(col2, @total)) INTO @json FROM table2 WHERE col1='", col1val, "'");
PREPARE stmt FROM @query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SELECT col1val AS col1, @total AS col2;
END LOOP col1_loop;
END
For a slightly larger example:
col1 col2
aaa {"key1": 1, "key2": 3}
bbb {"key1": 4, "key2": 2}
aaa {"key1": 50, "key3": 0}
ccc {"key2": 5, "key3": 1, "key4": 3}
bbb {"key1": 5, "key2": 1, "key5": 3}
CALL count_keys()
produces:
col1 col2
aaa {"key1": 51, "key2": 3, "key3": 0}
bbb {"key1": 9, "key2": 3, "key5": 3}
ccc {"key2": 5, "key3": 1, "key4": 3}
Note I've called the table table2
in the procedure, you will need to edit that (in both queries) to suit.