MySQL: Count the number of times each word occurs in a column

心在旅途 2021-01-06 19:13

For instance, if I have data in a column like this:

data
I love book
I love apple
I love book
I hate apple
I hate apple

How can I get a result that shows how many times each word occurs?

5 Answers
  • 2021-01-06 19:26

    Create a user-defined function like this and use it in your query:

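    DELIMITER $$
    
    -- Counts non-overlapping occurrences of myword inside myStr (substring match).
    CREATE FUNCTION `getCount`(myStr VARCHAR(1000), myword VARCHAR(100))
        RETURNS INT
        DETERMINISTIC
    BEGIN
        DECLARE cnt INT DEFAULT 0;
        DECLARE result INT DEFAULT 1;
    
        -- An empty search word would match forever, so treat it as zero occurrences.
        IF myword IS NULL OR CHAR_LENGTH(myword) = 0 THEN
            RETURN 0;
        END IF;
    
        WHILE (result > 0) DO
            SET result = INSTR(myStr, myword);
            IF (result > 0) THEN
                SET cnt = cnt + 1;
                -- INSTR and SUBSTRING count characters, so use CHAR_LENGTH rather than LENGTH (bytes).
                SET myStr = SUBSTRING(myStr, result + CHAR_LENGTH(myword));
            END IF;
        END WHILE;
        RETURN cnt;
    END$$
    
    DELIMITER ;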
    

    Hope it helps.
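
    As a usage sketch, the function can be summed over every row to count one known word. The table name word_demo is hypothetical and only there to make the example self-contained; the rows are the sample data from the question.

    ```sql
    -- Hypothetical demo table holding the sample rows from the question.
    CREATE TABLE word_demo (data VARCHAR(1000));

    INSERT INTO word_demo (data) VALUES
        ('I love book'),
        ('I love apple'),
        ('I love book'),
        ('I hate apple'),
        ('I hate apple');

    -- getCount returns the occurrences within one row; SUM adds them up over all rows.
    SELECT SUM(getCount(data, 'apple')) AS apple_count,
           SUM(getCount(data, 'book'))  AS book_count
    FROM word_demo;
    ```

    With the sample rows this returns 3 for apple and 2 for book. Note that the function matches substrings, so 'apple' would also be counted inside 'pineapple', and the word to count has to be known in advance.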

  • 2021-01-06 19:31

    If you want to perform this kind of text analysis, I would recommend using something like Lucene to get the term count for each term in the document.

  • 2021-01-06 19:33

    The split-string procedure is not my work. You can find it here:

    http://forge.mysql.com/tools/tool.php?id=4

    I wrote the rest of the code for you.

    drop table if exists mytable;
    create table mytable (
    id int not null auto_increment primary key,
    mytext varchar(1000)
    ) engine = myisam;
    
    insert into mytable (mytext)
    values ('I love book,but book sucks!What do you,think   about it? me too'),('I love apple! it rulez.,No, it sucks a lot!!!'),('I love book'),('I hate apple!!! Me too.,!'),('I hate apple');
    
    drop table if exists mywords;
    create table mywords (
    id int not null auto_increment primary key,
    word varchar(50)
    ) engine = myisam;
    
    
    delimiter //
    drop procedure if exists split_string //
    create procedure split_string (
        in input text
        , in `delimiter` varchar(10) 
    ) 
    sql security invoker
    begin
        declare cur_position int default 1 ;
        declare remainder text;
        declare cur_string varchar(1000);
        declare delimiter_length tinyint unsigned;
    
        drop temporary table if exists SplitValues;
        create temporary table SplitValues (
            value varchar(1000) not null 
        ) engine=myisam;
    
        set remainder = input;
        set delimiter_length = char_length(delimiter);
    
        while char_length(remainder) > 0 and cur_position > 0 do
            set cur_position = instr(remainder, `delimiter`);
            if cur_position = 0 then
                set cur_string = remainder;
            else
                set cur_string = left(remainder, cur_position - 1);
            end if;
            if trim(cur_string) != '' then
                insert into SplitValues values (cur_string);
            end if;
            set remainder = substring(remainder, cur_position + delimiter_length);
        end while;
    
    end //
    delimiter ;
    
    
    delimiter // 
    drop procedure if exists single_words//
    create procedure single_words()
    begin
    declare finish int default 0;
    declare str varchar(1000); -- same length as mytable.mytext, so long rows are not truncated
    declare cur_table cursor for  select replace(replace(replace(replace(mytext,'!',' '),',',' '),'.',' '),'?',' ') from mytable;
    declare continue handler for not found set finish = 1;
    truncate table mywords;
    open cur_table;
    my_loop:loop
    fetch cur_table into str;
    if finish = 1 then
    leave my_loop;
    end if;
    call split_string(str,' ');
    insert into mywords (word) select * from SplitValues; -- same case as in split_string; table names are case-sensitive on most Linux setups
    end loop;
    close cur_table;
    end;//
    delimiter ;
    
    call single_words();
    
    select word,count(*) as word_count 
    from mywords
    group by word;
    
    +-------+------------+
    | word  | word_count |
    +-------+------------+
    | a     |          1 |
    | about |          1 |
    | apple |          3 |
    | book  |          3 |
    | but   |          1 |
    | do    |          1 |
    | hate  |          2 |
    | I     |          5 |
    | it    |          3 |
    | lot   |          1 |
    | love  |          3 |
    | me    |          2 |
    | No    |          1 |
    | rulez |          1 |
    | sucks |          2 |
    | think |          1 |
    | too   |          2 |
    | What  |          1 |
    | you   |          1 |
    +-------+------------+
    19 rows in set (0.00 sec)
    

    The code needs to be improved to handle all punctuation, but this is the general idea.
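
    One possible way to cover more punctuation, assuming MySQL 8.0 or later (where REGEXP_REPLACE is available), is to swap the four nested REPLACE calls for a single regular expression. This is a sketch and not part of the original answer; it relies on the mytable(mytext) table defined above.

    ```sql
    -- Assumes MySQL 8.0+ and the mytable(mytext) table created earlier in this answer.
    -- Every run of punctuation characters is replaced by one space,
    -- so split_string can still split the result on ' '.
    SELECT REGEXP_REPLACE(mytext, '[[:punct:]]+', ' ') AS cleaned
    FROM mytable;
    ```

    The same expression could be used in the cursor declaration of single_words in place of the nested REPLACE calls. Empty tokens produced by consecutive spaces are already skipped by the trim check in split_string.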

  • 2021-01-06 19:38

    This query is going to take a long time to run if your table is of any decent size. It may be better to keep track of the counts in a separate table and update that table as values are inserted or, if real time results are not necessary, to only run this query every so often to update the counts table and pull your data from it. That way, you're not spending minutes to get data from this complex query.
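
    A minimal sketch of such a counts table, assuming a hypothetical table named word_counts and assuming that each new row has already been split into words (by the application or by a stored procedure):

    ```sql
    -- Hypothetical counts table: one row per distinct word.
    CREATE TABLE word_counts (
        word       VARCHAR(50)  NOT NULL PRIMARY KEY,
        word_count INT UNSIGNED NOT NULL DEFAULT 0
    );

    -- Run once for every word taken from a newly inserted row.
    -- The first occurrence inserts the word with a count of 1;
    -- later occurrences hit the primary key and increment the count instead.
    INSERT INTO word_counts (word, word_count)
    VALUES ('apple', 1)
    ON DUPLICATE KEY UPDATE word_count = word_count + 1;

    -- Reading the counts is then a plain lookup on a small table.
    SELECT word, word_count
    FROM word_counts
    ORDER BY word_count DESC;
    ```

    The trade-off is a little extra work on every insert in exchange for cheap reads.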

    Here's what I have for you so far. It's a good start. The only thing left to do is modify it to iterate through the words in each row; you could use a cursor or a subquery.

    Create test table:

    create table tbl(str varchar(100) );
    insert into tbl values('data');
    insert into tbl values('I love book');
    insert into tbl values('I love apple');
    insert into tbl values('I love book');
    insert into tbl values('I hate apple');
    insert into tbl values('I hate apple');
    

    Pull data from test table:

    SELECT str AS Word, COUNT(str) AS Frequency FROM tbl GROUP BY str;
    
  • 2021-01-06 19:48

    Here is a solution using only a query:

    SELECT SUM(total_count) as total, value
    FROM (
    
    SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
    FROM (
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
      FROM table_name t CROSS JOIN 
    (
       SELECT a.N + b.N * 10 + 1 n
         FROM 
        (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
       ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
        ORDER BY n
    ) n
     WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
     ORDER BY value
    
    ) AS x
    GROUP BY x.value
    
    ) AS y
    GROUP BY value
    

    Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

    First, we run a query that extracts all the words, as explained by @peterm (follow his instructions if you want to change the maximum number of words processed per row). Then we wrap that in a subquery and COUNT and GROUP BY the value of each word. Finally, we run another query on top of that which uses REPLACE to merge words that differ only by attached punctuation, so that, for example, hello and hello! are counted together.
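
    On MySQL 8.0 or later, the two digit subqueries can be replaced by a recursive common table expression. The following is a sketch under that assumption, not the answerer's original query: table_name and sentence come from the answer, the CTE name numbers is an illustrative choice, the rows are the sample data from the question, and the limit of 100 words per row mirrors the 1 to 100 range generated above.

    ```sql
    -- Demo data: the table and column names follow the answer, the rows follow the question.
    CREATE TABLE table_name (sentence VARCHAR(1000));

    INSERT INTO table_name (sentence) VALUES
        ('I love book'),
        ('I love apple'),
        ('I love book'),
        ('I hate apple'),
        ('I hate apple');

    -- numbers yields 1..100, i.e. up to 100 words per row.
    WITH RECURSIVE numbers (n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < 100
    )
    SELECT REPLACE(REPLACE(REPLACE(
               -- Take the n-th space-separated word of the sentence.
               SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', numbers.n), ' ', -1),
               '?', ''), '.', ''), '!', '') AS word,
           COUNT(*) AS total
    FROM table_name t
    JOIN numbers
      -- Number of words = number of spaces + 1.
      ON numbers.n <= 1 + (CHAR_LENGTH(t.sentence) - CHAR_LENGTH(REPLACE(t.sentence, ' ', '')))
    GROUP BY word;
    ```

    For the sample rows this gives I = 5, love = 3, apple = 3, book = 2 and hate = 2. Because the punctuation is stripped before grouping, a single GROUP BY is enough here.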
