Mysql json based trending tags implementation

后端未结

关注

 2  1588

I am trying to identifying the trending tags (based on maximum hits) on time series using mysql json feature. Below is my table

CREATE TABLE TAG_COUNTER (


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  南笙        
                
              
                            
                2021-01-18 14:37
              
            
            
                                                                       
I don't see a good reason, why you use JSON here. It's also not clear, why you believe that a "nosql schema" within MySQL would do anything better.

What you probably need is something like this:

CREATE TABLE TAG_COUNTER (
    account       varchar(36) NOT NULL,
    time_id       INT NOT NULL,
    tag_name      varchar(50) NOT NULL,
    counter       INT UNSIGNED NOT NULL,
    PRIMARY KEY   (account, time_id, tag_name)
);


This will simplify your queries. The INSERT statement would look like:

INSERT INTO TAG_COUNTER
  (account, time_id, tag_name, counter)
VALUES
  ('google', 2018061023, 'tag1', 1),
  ('google', 2018061023, 'tag2', 1)
ON DUPLICATE KEY UPDATE counter = counter + VALUES(counter);


The SELECT statement might be something like this

SELECT
    SUBSTRING(time_id, 1, 6) AS month,
    tag_name,
    SUM(counter) AS counter_agg
FROM TAG_COUNTER
GROUP BY month, tag_name
ORDER BY month, counter_agg DESC;


Note that I did't try to optimize the table/schema for data size and performance. That would be a different question. But you must see, that the queries are much simpler now.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  臣服心动        
                
              
                            
                2021-01-18 14:42
              
            
            
                                                                       
As I have said in comments, I think moving away from JSON is the way to go. However, if you want to keep using JSON, this function (a direct copy of the one in my answer to this question, see an explanation of what it does there) and procedure will do what you want.

DELIMITER //
DROP FUNCTION IF EXISTS json_merge_sum //
CREATE FUNCTION json_sum_merge(IN j1 JSON, IN total JSON) RETURNS JSON
BEGIN
  DECLARE knum INT DEFAULT 0;
  DECLARE jkeys JSON DEFAULT JSON_KEYS(j1);
  DECLARE kpath VARCHAR(30);
  DECLARE v INT;
  DECLARE l INT DEFAULT JSON_LENGTH(jkeys);
  kloop: LOOP
    IF knum >= l THEN
      LEAVE kloop;
    END IF;
    SET kpath = CONCAT('$.', JSON_EXTRACT(jkeys, CONCAT('$[', knum, ']')));
    SET v = JSON_EXTRACT(j1, kpath);
    IF JSON_CONTAINS_PATH(total, 'one', kpath) THEN
      SET total = JSON_REPLACE(total, kpath, JSON_EXTRACT(total, kpath) + v);
    ELSE
      SET total = JSON_SET(total, kpath, v);
    END IF;
    SET knum = knum + 1;
  END LOOP kloop;
  RETURN total;
END //


The procedure is similar to the one in my other answer, in that it finds all the distinct tags associated with a given substring of time_id (specified as a parameter) and sums the values associated with each tag. The individual tags and counts are then written to a temporary table, from which a selection is then made grouping by time period and tag name.

DELIMITER //
DROP PROCEDURE IF EXISTS count_tags //
CREATE PROCEDURE count_tags(IN period VARCHAR(50))
BEGIN
  DECLARE finished INT DEFAULT 0;
  DECLARE timeval VARCHAR(20);
  DECLARE knum, l INT;
  DECLARE jkeys JSON;
  DECLARE time_cursor CURSOR FOR SELECT DISTINCT time_id FROM tag_counter;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished=1;
  CREATE TEMPORARY TABLE tag_counts (Time VARCHAR(20), Tag_Name VARCHAR(30), Tag_count_value INT, INDEX(Time, Tag_Name));
  OPEN time_cursor;
  time_loop: LOOP
    FETCH time_cursor INTO timeval;
    IF finished=1 THEN
      LEAVE time_loop;
    END IF;
    SET @total = '{}';
    SET @query = CONCAT("SELECT MIN(@total:=json_sum_merge(counters, @total)) INTO @json FROM TAG_COUNTER WHERE time_id='", timeval, "'");
    PREPARE stmt FROM @query;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    SET @query = CONCAT('INSERT INTO tag_counts VALUES(', period, ', ?, ?)');
    PREPARE stmt FROM @query;
    SET @timeval = timeval;
    SET l = JSON_LENGTH(@total);
    SET jkeys = JSON_KEYS(@total);
    SET knum = 0;
    key_loop: LOOP
      IF knum >= l THEN
        LEAVE key_loop;
      END IF;
      SET @k = JSON_EXTRACT(jkeys, CONCAT('$[', knum, ']'));
      SET @t = JSON_EXTRACT(@total, CONCAT('$.', @k));
      EXECUTE stmt USING @k, @t;
      SET knum = knum + 1;
    END LOOP key_loop;
    DEALLOCATE PREPARE stmt;
  END LOOP time_loop;
  SELECT Time, Tag_Name, SUM(Tag_count_value) AS Tag_count_value FROM tag_counts GROUP BY Time, Tag_Name;
  DROP TABLE tag_counts;
END


A couple of examples based on some limited sample data from your prior question. In these examples @timeval is equivalent to the time_id column. Input data:

account     time_id     counters
google      20180510    {"gmail_page_viewed": 2, "search_page_viewed": 51}
google      20180511    {"gmail_page_viewed": 3, "search_page_viewed": 102}
apple       20180511    {"apple_page_viewed": 5, "search_page_viewed": 16}


CALL count_tags('@timeval'):

Time        Tag_Name                Tag_count_value
20180510    "gmail_page_viewed"     2
20180510    "search_page_viewed"    51
20180511    "apple_page_viewed"     5
20180511    "gmail_page_viewed"     3
20180511    "search_page_viewed"    118


CALL count_tags('SUBSTRING(@timeval, 1, 6)'):

Time    Tag_Name                Tag_count_value
201805  "apple_page_viewed"     5
201805  "gmail_page_viewed"     5
201805  "search_page_viewed"    169


Note that you can also use json_sum_merge to simplify your INSERT query e.g.

INSERT INTO `TAG_COUNTER`
  (`account`, `time_id`, `counters`)
VALUES
  ('apple', '20180511', '{"apple_page_viewed": 9, "itunes_page_viewed": 4}')
ON DUPLICATE KEY UPDATE `counters` = json_sum_merge(VALUES(counters), counters)


Result:

account     time_id     counters
apple       20180511    {"apple_page_viewed": 14, "itunes_page_viewed": 4, "search_page_viewed": 16}


In terms of the specific questions in your answer:


No. This answer shows it can be done with your existing data format.
Not applicable.
Not applicable.
Yes, you can stick with the existing {"key" : "value"} format
Since we have to go through every entry of tag_counter to get the list of tags, an index is not beneficial for that section. For the temporary table I have included indexes on the Time and Tag_Name columns which should benefit speed as they are used directly in the GROUP BY clause.


If you were to maintain a list of keys (e.g. in a separate table, maintained by a trigger on insert/update/delete to tag_counter) this code could be made a lot simpler and more efficient. But that is for another question.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复