MySql PHP select count of distinct values from comma separated data (tags)

后端 未结 5 1625
感情败类
感情败类 2020-12-01 18:38

How can I select the count of distinct values from data that is stored as comma separated values in MySql? I\'ll be using PHP to output the data from MySql in the end.

相关标签:
5条回答
  • 2020-12-01 19:18

    This should work:

    SELECT tag, count(0) count FROM (
        SELECT tOut.*, REPLACE(SUBSTRING(SUBSTRING_INDEX(tags, ',', ocur_rank), LENGTH(SUBSTRING_INDEX(tags, ',', ocur_rank - 1)) + 1), ',', '') tag
          FROM (
            SELECT @num_type := if(@id_check = tY.id, @num_type + 1, 1) AS ocur_rank, @id_check := tY.id as id_check, tY.*
              FROM (
                SELECT LENGTH(tags) - LENGTH(REPLACE(tags, ',', '')) AS num_ocur, id, tags FROM tablename
              ) tX
              INNER JOIN (SELECT LENGTH(tags) - LENGTH(REPLACE(tags, ',', '')) AS num_ocur, id, tags FROM tablename) tY
              INNER JOIN (SELECT @num_type := 0, @id_check := 'some_id') tZ
           ) tOut
         WHERE ocur_rank <= num_ocur + 1
    ) tempTable GROUP BY tag ORDER BY count DESC;
    

    Replace "tablename" with the name of your table.

    This answer was derived from a solution by Jesse Perring posted on this page:

    http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#c12113

    0 讨论(0)
  • 2020-12-01 19:20

    Solution

    I don't really know how to transform an horizontal list of comma-separated values to a list of rows without creating a table containing numbers, as many numbers as you may have comma-separated values. If you can create this table, here is my answer:

    SELECT 
      SUBSTRING_INDEX(SUBSTRING_INDEX(all_tags, ',', num), ',', -1) AS one_tag,
      COUNT(*) AS cnt
    FROM (
      SELECT
        GROUP_CONCAT(tags separator ',') AS all_tags,
        LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
      FROM test
    ) t
    JOIN numbers n
    ON n.num <= t.count_tags
    GROUP BY one_tag
    ORDER BY cnt DESC;
    

    Returns:

    +---------------------+-----+
    | one_tag             | cnt |
    +---------------------+-----+
    | chicken             |   5 |
    | pork                |   4 |
    | spaghetti           |   3 |
    | fried-rice          |   2 |
    | manchurain          |   2 |
    | pho                 |   1 |
    | chicken-calzone     |   1 |
    | fettuccine          |   1 |
    | chorizo             |   1 |
    | meat-balls          |   1 |
    | miso-soup           |   1 |
    | chanko-nabe         |   1 |
    | chicken-manchurian  |   1 |
    | pork-manchurian     |   1 |
    | sweet-and-sour-pork |   1 |
    | peking-duck         |   1 |
    | duck                |   1 |
    +---------------------+-----+
    17 rows in set (0.01 sec)
    

    See sqlfiddle


    Explaination

    Scenario

    1. We concatenate all tags using a comma to create only one list of tags instead of one per row
    2. We count how many tags we have in our list
    3. We find how we can get one value in this list
    4. We find how we can get all values as distinct rows
    5. We count tags grouped by their value

    Context

    Let's build your schema:

    CREATE TABLE test (
        id INT PRIMARY KEY,
        tags VARCHAR(255)
    );
    
    INSERT INTO test VALUES
        ("1",         "pho,pork"),
        ("2",         "fried-rice,chicken"),
        ("3",         "fried-rice,pork"),
        ("4",         "chicken-calzone,chicken"),
        ("5",         "fettuccine,chicken"),
        ("6",         "spaghetti,chicken"),
        ("7",         "spaghetti,chorizo"),
        ("8",         "spaghetti,meat-balls"),
        ("9",         "miso-soup"),
        ("10",        "chanko-nabe"),
        ("11",        "chicken-manchurian,chicken,manchurain"),
        ("12",        "pork-manchurian,pork,manchurain"),
        ("13",        "sweet-and-sour-pork,pork"),
        ("14",        "peking-duck,duck");
    

    Concatenate all list of tags

    We will work with all tags in a single line, so we use GROUP_CONCAT to do the job:

    SELECT GROUP_CONCAT(tags SEPARATOR ',') FROM test;
    

    Returns all tags separated by a comma:

    pho,pork,fried-rice,chicken,fried-rice,pork,chicken-calzone,chicken,fettuccine,chicken,spaghetti,chicken,spaghetti,chorizo,spaghetti,meat-balls,miso-soup,chanko-nabe,chicken-manchurian,chicken,manchurain,pork-manchurian,pork,manchurain,sweet-and-sour-pork,pork,peking-duck,duck

    Count all tags

    To count all tags, we get the length of the full list of tags, and we remove the length of the full list of tags after replacing the , by nothing. We add 1, as the separator is between two values.

    SELECT LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
    FROM test;
    

    Returns:

    +------------+
    | count_tags |
    +------------+
    |         28 |
    +------------+
    1 row in set (0.00 sec)
    

    Get the Nth tag in the tag list

    We use the SUBSTRING_INDEX function to get

    -- returns the string until the 2nd delimiter\'s occurrence from left to right: a,b
    SELECT SUBSTRING_INDEX('a,b,c', ',', 2);
    
    -- return the string until the 1st delimiter, from right to left: c
    SELECT SUBSTRING_INDEX('a,b,c', ',', -1);
    
    -- we need both to get: b (with 2 being the tag number)
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('a,b,c', ',', 2), ',', -1);
    

    With such logic, to get the 3rd tag in our list, we use:

    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(tags SEPARATOR ','), ',', 3), ',', -1)
    FROM test;
    

    Returns:

    +-------------------------------------------------------------------------------------+
    | SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(tags SEPARATOR ','), ',', 3), ',', -1) |
    +-------------------------------------------------------------------------------------+
    | fried-rice                                                                          |
    +-------------------------------------------------------------------------------------+
    1 row in set (0.00 sec)
    

    Get all values as distinct rows

    My idea is a little tricky:

    1. I know we can create rows by joining tables
    2. I need to get the Nth tag in the list using the request above

    So we will create a table containing all numbers from 1 to the maximum number of tags you may have in your list. If you can have 1M values, create 1M entries from 1 to 1,000,000. For 100 tags, this will be:

    CREATE TABLE numbers (
      num INT PRIMARY KEY
    );
    
    INSERT INTO numbers VALUES
        ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 ), ( 10 ), 
        ( 11 ), ( 12 ), ( 13 ), ( 14 ), ( 15 ), ( 16 ), ( 17 ), ( 18 ), ( 19 ), ( 20 ), 
        ( 21 ), ( 22 ), ( 23 ), ( 24 ), ( 25 ), ( 26 ), ( 27 ), ( 28 ), ( 29 ), ( 30 ), 
        ( 31 ), ( 32 ), ( 33 ), ( 34 ), ( 35 ), ( 36 ), ( 37 ), ( 38 ), ( 39 ), ( 40 ), 
        ( 41 ), ( 42 ), ( 43 ), ( 44 ), ( 45 ), ( 46 ), ( 47 ), ( 48 ), ( 49 ), ( 50 ), 
        ( 51 ), ( 52 ), ( 53 ), ( 54 ), ( 55 ), ( 56 ), ( 57 ), ( 58 ), ( 59 ), ( 60 ), 
        ( 61 ), ( 62 ), ( 63 ), ( 64 ), ( 65 ), ( 66 ), ( 67 ), ( 68 ), ( 69 ), ( 70 ), 
        ( 71 ), ( 72 ), ( 73 ), ( 74 ), ( 75 ), ( 76 ), ( 77 ), ( 78 ), ( 79 ), ( 80 ), 
        ( 81 ), ( 82 ), ( 83 ), ( 84 ), ( 85 ), ( 86 ), ( 87 ), ( 88 ), ( 89 ), ( 90 ), 
        ( 91 ), ( 92 ), ( 93 ), ( 94 ), ( 95 ), ( 96 ), ( 97 ), ( 98 ), ( 99 ), ( 100 );
    

    Now, we get the numth (num being a row in number) using the following query:

    SELECT n.num, SUBSTRING_INDEX(SUBSTRING_INDEX(all_tags, ',', num), ',', -1) as one_tag
    FROM (
      SELECT
        GROUP_CONCAT(tags SEPARATOR ',') AS all_tags,
        LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
      FROM test
    ) t
    JOIN numbers n
    ON n.num <= t.count_tags
    

    Returns:

    +-----+---------------------+
    | num | one_tag             |
    +-----+---------------------+
    |   1 | pho                 |
    |   2 | pork                |
    |   3 | fried-rice          |
    |   4 | chicken             |
    |   5 | fried-rice          |
    |   6 | pork                |
    |   7 | chicken-calzone     |
    |   8 | chicken             |
    |   9 | fettuccine          |
    |  10 | chicken             |
    |  11 | spaghetti           |
    |  12 | chicken             |
    |  13 | spaghetti           |
    |  14 | chorizo             |
    |  15 | spaghetti           |
    |  16 | meat-balls          |
    |  17 | miso-soup           |
    |  18 | chanko-nabe         |
    |  19 | chicken-manchurian  |
    |  20 | chicken             |
    |  21 | manchurain          |
    |  22 | pork-manchurian     |
    |  23 | pork                |
    |  24 | manchurain          |
    |  25 | sweet-and-sour-pork |
    |  26 | pork                |
    |  27 | peking-duck         |
    |  28 | duck                |
    +-----+---------------------+
    28 rows in set (0.01 sec)
    

    Count tags occurrences

    As soon as we now have classic rows, we can easily count occurrences of each tags.

    See the top of this answer to see the request.

    0 讨论(0)
  • 2020-12-01 19:37

    First, you should store this using a junction table, with one row per post and tag. Sometimes, however, we cannot control the structure of data we are working with.

    You can do what you want assuming you have a list of valid tags:

    select vt.tag, count(t.postid) as cnt
    from validtags vt left join
         table t
         on find_in_set(vt.tag, t.tags) > 0
    group by vt.tag
    order by cnt desc;
    
    0 讨论(0)
  • 2020-12-01 19:41

    Alain Tiembo has a nice answer which explains a lot of the mechanics underneath. However, his solution requires a temporary table (numbers) to solve the problem. As a follow up answer, I am combining all his steps into one single query (using tablename for your original table):

        SELECT t.tags, count(*) AS occurence FROM
        (SELECT
          tablename.id,
          SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.tags, ',', numbers.n), ',', -1) tags
        FROM
          (SELECT 1 n UNION ALL SELECT 2
           UNION ALL SELECT 3 UNION ALL SELECT 4) numbers INNER JOIN tablename
          ON CHAR_LENGTH(tablename.tags)
             -CHAR_LENGTH(REPLACE(tablename.tags, ',', ''))>=numbers.n-1
        ORDER BY
          id, n) t
        GROUP BY t.tags
        ORDER BY occurence DESC, t.tags ASC
    

    See the SQLFiddle for demonstration purposes.

    0 讨论(0)
  • 2020-12-01 19:42

    The recommended way of doing this is to not store multiple values in a single column, but create an intersection table.

    So, your tables would have these columns:
    1. tags: tag_id, name
    2. posts: post_id, category_code
    3. int_tags_to_posts: post_id, tag_id

    To get the counts:
    select t.name, count(*) from tags t, posts p, int_tags_to_posts i where i.post_id = p.post_id and i.tag_id = t.tag_id group by i.tag_id order by count(*) desc;

    0 讨论(0)
提交回复
热议问题