Get most frequent top 10 items found in a HTML column using Count

后端 未结 3 1242
南方客
南方客 2021-01-14 09:45

I have a bit of a messy query to try figure out.

I have a column called "meta_value" and in that I have some HTML data such as:



        
相关标签:
3条回答
  • 2021-01-14 10:14

    Depending on how frequently you need to perform this and the size of the dataset, I would probably extract this data out in to a new table. I would create a table with pk, card_name (unique), count and then write a command in the application to iterate over the existing data to parse the out the names from the tag bodies or the data-name attribute in the html and create the row or update the count in the row, and then make the change in the application to make sure that column gets updated whenever meta_value changes.

    Doing it this way and just sorting based on the count field will be more performant when for this specific lookup, but it will also make this query still be valid should the structure of your html ever change. It also allows for you at add other properties to these items

    0 讨论(0)
  • 2021-01-14 10:24

    Following is purely MySQL-only solution; you can run this query once (or twice) a day in off-peak hours, to update the count in a cache/summary table. Moreover, number of rows are roughly around 6000 (only), so (depending on your server configuration), it should not cause performance issues.

    Now, since the number of cards in a particular row is variable (can range from 40-60), we can use a Sequence table. You can define a permanent table in your database storing integers ranging from 1 to 100 (you may find this table helpful in many other cases as well):

    CREATE TABLE seq (n tinyint(3) UNSIGNED NOT NULL, PRIMARY KEY(n));
    INSERT INTO seq (n) VALUES (1), (2), ...... , (99), (100);
    

    Now, we will do a JOIN between wph3_postmeta and seq table, based on the count of occurrence of substring 'data-name=""' inside the specific meta_value. We can get the count of occurrence of the substring (which also means, count of cards in a particular row) using:

    (
      CHAR_LENGTH(wp.meta_value) 
      - CHAR_LENGTH(REPLACE(wp.meta_value, 'data-name=""', ''))
    ) / CHAR_LENGTH('data-name=""')
    
    

    Now, we can use the Substring_Index() function to extract the card values out. Using the different numbers in different row, we can basically extract out the first card, second card, and so on...

    Once we have extracted all the words out, in separate rows; we can then use the complete result-set as a Derived Table, and perform the aggregation queries to get the required results:

    Query (View on DB Fiddle)

    SELECT dt.name,
           Count(DISTINCT dt.meta_id) AS unique_metaid_count
    FROM   (SELECT wp.meta_id,
                   Substring_index(Substring_index(wp.meta_value, 'data-name=""',
                                   -seq.n),
                   '"">', 1
                   ) AS name
            FROM   wph3_postmeta AS wp
                   JOIN seq
                     ON ( Char_length(wp.meta_value) - Char_length(
                                                       REPLACE(wp.meta_value,
                                                       'data-name=""'
                                                            ,
                                                            '')) ) /
                             Char_length('data-name=""') >= n
            WHERE  wp.meta_key = 'deck_list') AS dt
    GROUP  BY dt.name
    ORDER  BY unique_metaid_count DESC  
    /* To get top 10 counts only, add LIMIT 10 */
    

    Result

    | name                                          | unique_metaid_count |
    | --------------------------------------------- | ------------------- |
    | Call of the Haunted                           | 2                   |
    | Inferno Reckless Summon                       | 2                   |
    | Mystic Box                                    | 2                   |
    | Mystical Space Typhoon                        | 2                   |
    | Number 39: Utopia                             | 2                   |
    | #created by ygopro2                           | 1                   |
    | 98095162                                      | 1                   |
    | Abyss Dweller                                 | 1                   |
    | Advanced Ritual Art                           | 1                   |
    | Armed Dragon LV3                              | 1                   |
    | Armed Dragon LV5                              | 1                   |
    | Axe of Despair                                | 1                   |
    | B.E.S. Covered Core                           | 1                   |
    .....
    
    | The Dragon Dwelling in the Cave               | 1                   |
    | The Flute of Summoning Dragon                 | 1                   |
    | The Forces of Darkness                        | 1                   |
    | Threatening Roar                              | 1                   |
    | Time Machine                                  | 1                   |
    | Torike                                        | 1                   |
    | Tornado Dragon                                | 1                   |
    | Torrential Tribute                            | 1                   |
    | Tragoedia                                     | 1                   |
    | Trap Hole                                     | 1                   |
    | Treeborn Frog                                 | 1                   |
    | Trishula, Dragon of the Ice Barrier           | 1                   |
    | Twin Twisters                                 | 1                   |
    | Vanity's Ruler                                | 1                   |
    | Wind-Up Snail                                 | 1                   |
    | Wind-Up Soldier                               | 1                   |
    | Wulf, Lightsworn Beast                        | 1                   |
    | Zure, Knight of Dark World                    | 1                   |
    

    Note: If you want Top 10 only (by count), you can simply add LIMIT 10 at the end of the query.

    0 讨论(0)
  • 2021-01-14 10:31

    I guess you need a simple LIMIT and ORDER BY Clause -

    SELECT meta_value, COUNT(DISTINCT meta_value) VALUE_COUNT
    FROM wph3_postmeta
    WHERE meta_key = "deck_list"
    AND meta_value REGEXP '>[[:alnum:]]+(</a>)$'
    GROUP BY meta_value
    ORDER BY VALUE_COUNT DESC
    LIMIT 10;
    
    0 讨论(0)
提交回复
热议问题