How to find most popular word occurrences in MySQL?

后端 未结 6 2297
猫巷女王i
猫巷女王i 2021-02-13 11:38

I have a table called results with 5 columns.

I\'d like to use the title column to find rows that are say: WHERE title like \'%for sale%\

相关标签:
6条回答
  • 2021-02-13 11:46

    Update

    Idea taken from https://stackoverflow.com/a/17942691/98491

    This query works on my machine (MySQL 5.7), however Sqlfiddle reports an error. The basic idea is that you should either create a table with numbers from 1 to maximum word occurence (like 4) in your field or as I did, use a UNION 1 .. 4 for simplicity.

    CREATE TABLE products (
      `id` int,
      `name` varchar(45)
    );
    
    INSERT INTO products
        (`id`, `name`)
    VALUES
        (1, 'for sale'),
        (2, 'for me'),
        (3, 'for you'),
        (4, 'you and me')
    ;
    
    SELECT name, COUNT(*) as count FROM
    (
    SELECT
      product.id,
      SUBSTRING_INDEX(SUBSTRING_INDEX(product.name, ' ', numbers.n), ' ', -1) name
    FROM
      (
        SELECT 1 AS n
        UNION SELECT 2
        UNION SELECT 3
        UNION SELECT 4
      ) AS numbers
      INNER JOIN products product
      ON CHAR_LENGTH(product.name)
         -CHAR_LENGTH(REPLACE(product.name, ' ', ''))>=numbers.n-1
    ORDER BY
      id, n
    )
    AS result
    GROUP BY name
    ORDER BY count DESC
    

    Result will be

    for | 3
    you | 2
    me  | 2
    and | 1
    sale| 1
    
    0 讨论(0)
  • 2021-02-13 11:51

    You can use ExtractValue in some interesting way. See SQL fiddle here: http://sqlfiddle.com/#!9/0b0a0/45

    We need only one table:

    CREATE TABLE text (`title` varchar(29));
    
    INSERT INTO text (`title`)
    VALUES
        ('cheap cars for sale'),
        ('house for sale'),
        ('cats and dogs for sale'),
        ('iphones and androids for sale'),
        ('cheap phones for sale'),
        ('house furniture for sale')
    ;
    

    Now we construct series of selects which extract whole words from text converted to XML. Each select extracts N-th word from the text.

    select words.word, count(*) as `count` from
    (select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[1]') as word from `text`
    union all
    select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[2]') from `text`
    union all
    select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[3]') from `text`
    union all
    select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[4]') from `text`
    union all
    select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[5]') from `text`) as words
    where length(words.word) > 0
    group by words.word
    order by `count` desc, words.word asc
    
    0 讨论(0)
  • 2021-02-13 11:56

    This would give you single words (Just if I understand what your single word means.):

    select concat(val,' ',cnt) as result from(
        select (substring_index(substring_index(t.title, ' ', n.n), ' ', -1)) val,count(*) as cnt
            from result t cross join(
             select a.n + b.n * 10 + 1 n
             from 
                    (select 0 as n union all select 1 union all select 2 union all select 3 
                            union all select 4 union all select 5 union all select 6 
                            union all select 7 union all select 8 union all select 9) a,
                    (select 0 as n union all select 1 union all select 2 union all select 3 
                            union all select 4 union all select 5 union all select 6 
                            union all select 7 union all select 8 union all select 9) b
                    order by n 
            ) n
        where n.n <= 1 + (length(t.title) - length(replace(t.title, ' ', '')))
        group by val
        order by cnt desc
    ) as x
    

    Result should be looks like this :

    Result
    --------
    for 6
    sale 6
    house 2
    and 2
    cheap 2
    phones 1
    iphones 1
    dogs 1
    furniture 1
    cars 1
    androids 1
    cats 1
    

    But if the single word you need like this :

    result
    -----------
    for 6 sale 6 house 2 and 2 cheap 2 phones 1 iphones 1 dogs 1 furniture 1 cars 1 androids 1 cats 1
    

    Just modify the query above to:

    select group_concat(concat(val,' ',cnt) separator ' ') as result from( ...
    
    0 讨论(0)
  • 2021-02-13 12:00

    SQL is not well suited for this task, While possible there are limitations (the number of words for example)

    a quick PHP script to do the same task may be easier to use long term (and likely quicker too)

    <?php
    $rows = [
        "cheap cars for sale",
        "house for sale",
        "cats and dogs for sale",
        "iphones and androids for sale",
        "cheap phones for sale",
        "house furniture for sale",
    ];
    
    //rows here should be replaced by the SQL result
    $wordTotals = [];
    foreach ($rows as $row) {
       $words = explode(" ", $row);
        foreach ($words as $word) {
            if (isset($wordTotals[$word])) {
                $wordTotals[$word]++; 
                continue;
            }
    
            $wordTotals[$word] = 1;
        }
    }
    
    arsort($wordTotals);
    
    foreach($wordTotals as $word => $count) {
        echo $word . " " . $count . PHP_EOL;
    }
    

    Output

    for 6
    sale 6
    and 2
    cheap 2
    house 2
    phones 1
    androids 1
    furniture 1
    cats 1
    cars 1
    dogs 1
    iphones 1
    
    0 讨论(0)
  • 2021-02-13 12:06

    You can extract words with some string manipulation. Assuming you have a numbers table and that words are separated by single spaces:

    select substring_index(substring_index(r.title, ' ', n.n), ' ', -1) as word,
           count(*)
    from results r join
         numbers n
         on n.n <= length(title) - length(replace(title, ' ', '')) + 1
    group by word;
    

    If you don't have a numbers table, you can construct one manually using a subquery:

    from results r join
         (select 1 as n union all select 2 union all select 3 union all . . .
         ) n
         . . .
    

    The SQL Fiddle (courtesy of @GrzegorzAdamKowalski) is here.

    0 讨论(0)
  • 2021-02-13 12:09

    Here is working SQL Fiddle: http://sqlfiddle.com/#!9/0b0a0/32

    Let's start with two tables - one for texts and one for numbers:

    CREATE TABLE text (`title` varchar(29));
    
    INSERT INTO text
        (`title`)
    VALUES
        ('cheap cars for sale'),
        ('house for sale'),
        ('cats and dogs for sale'),
        ('iphones and androids for sale'),
        ('cheap phones for sale'),
        ('house furniture for sale')
    ;
    
    CREATE TABLE iterator (`index` int);
    
    INSERT INTO iterator
        (`index`)
    VALUES
        (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),
        (16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)
    ;
    

    The second table, iterator must contains numbers from 1 to N where N higher or equal to the lenght of the longest string in text.

    Then, run this query:

    select
      words.word, count(*) as `count`
    from 
    (select
      substring(concat(' ', t.title, ' '), i.index+1, j.index-i.index) as word
    from
      text as t, iterator as i, iterator as j
    where
        substring(concat(' ', t.title), i.index, 1) = ' '
    and substring(concat(t.title, ' '), j.index, 1) = ' '
    and i.index < j.index
    ) AS words
    where
        length(words.word) > 0
    and words.word not like '% %'
    group by words.word
    order by `count` desc, words.word asc
    

    There are two selects. Outer one simply groups and counts single words (words of length greater than 0 and without any spaces). Inner one extracts all strings starting from any space character and ending with any other space character, so strings aren't words (despite naming this subquery words) because they can contain other spaces than starting and ending one.

    Results:

    word    count
    for     6
    sale    6
    and     2
    cheap   2
    house   2
    androids    1
    cars    1
    cats    1
    dogs    1
    furniture   1
    iphones     1
    phones  1
    
    0 讨论(0)
提交回复
热议问题