MySql: Count amount of times the words occur in a column

后端未结

关注

 5  1307

For instance, if I have data in a column like this

data
I love book
I love apple
I love book
I hate apple
I hate apple

How can I get result

相关标签:

5条回答

后悔当初

2021-01-06 19:26

create a user defined function like this and use it in your query

DELIMITER $$

CREATE FUNCTION `getCount`(myStr VARCHAR(1000), myword VARCHAR(100))
    RETURNS INT
    BEGIN
    DECLARE cnt INT DEFAULT 0;
    DECLARE result INT DEFAULT 1;

    WHILE (result > 0) DO
    SET result = INSTR(myStr, myword);
    IF(result > 0) THEN 
        SET cnt = cnt + 1;
        SET myStr = SUBSTRING(myStr, result + LENGTH(myword));
    END IF;
    END WHILE;
    RETURN cnt;    

    END$$

DELIMITER ;

Hope it helps Refer This

0 讨论(0)

孤城傲影

2021-01-06 19:31

If you want to perform such kind of text analysis, I would recommend using something like lucene, to get the termcount for each term in the document.

0 讨论(0)
发布评论:

提交评论
- 加载中...

半阙折子戏

2021-01-06 19:33

Split-string procedure is not my job. You can find it here

http://forge.mysql.com/tools/tool.php?id=4

I wrote you the rest of code.

drop table if exists mytable;
create table mytable (
id int not null auto_increment primary key,
mytext varchar(1000)
) engine = myisam;

insert into mytable (mytext)
values ('I love book,but book sucks!What do you,think   about it? me too'),('I love apple! it rulez.,No, it sucks a lot!!!'),('I love book'),('I hate apple!!! Me too.,!'),('I hate apple');

drop table if exists mywords;
create table mywords (
id int not null auto_increment primary key,
word varchar(50)
) engine = myisam;


delimiter //
drop procedure if exists split_string //
create procedure split_string (
    in input text
    , in `delimiter` varchar(10) 
) 
sql security invoker
begin
    declare cur_position int default 1 ;
    declare remainder text;
    declare cur_string varchar(1000);
    declare delimiter_length tinyint unsigned;

    drop temporary table if exists SplitValues;
    create temporary table SplitValues (
        value varchar(1000) not null 
    ) engine=myisam;

    set remainder = input;
    set delimiter_length = char_length(delimiter);

    while char_length(remainder) > 0 and cur_position > 0 do
        set cur_position = instr(remainder, `delimiter`);
        if cur_position = 0 then
            set cur_string = remainder;
        else
            set cur_string = left(remainder, cur_position - 1);
        end if;
        if trim(cur_string) != '' then
            insert into SplitValues values (cur_string);
        end if;
        set remainder = substring(remainder, cur_position + delimiter_length);
    end while;

end //
delimiter ;


delimiter // 
drop procedure if exists single_words//
create procedure single_words()
begin
declare finish int default 0;
declare str varchar(200);
declare cur_table cursor for  select replace(replace(replace(replace(mytext,'!',' '),',',' '),'.',' '),'?',' ') from mytable;
declare continue handler for not found set finish = 1;
truncate table mywords;
open cur_table;
my_loop:loop
fetch cur_table into str;
if finish = 1 then
leave my_loop;
end if;
call split_string(str,' ');
insert into mywords (word) select * from splitvalues;
end loop;
close cur_table;
end;//
delimiter ;

call single_words();

select word,count(*) as word_count 
from mywords
group by word;

+-------+------------+
| word  | word_count |
+-------+------------+
| a     |          1 |
| about |          1 |
| apple |          3 |
| book  |          3 |
| but   |          1 |
| do    |          1 |
| hate  |          2 |
| I     |          5 |
| it    |          3 |
| lot   |          1 |
| love  |          3 |
| me    |          2 |
| No    |          1 |
| rulez |          1 |
| sucks |          2 |
| think |          1 |
| too   |          2 |
| What  |          1 |
| you   |          1 |
+-------+------------+
19 rows in set (0.00 sec)

The code must be improved in order to consider any punctuation but this is the general idea.

0 讨论(0)

忘掉有多难

2021-01-06 19:38
This query is going to take a long time to run if your table is of any decent size. It may be better to keep track of the counts in a separate table and update that table as values are inserted or, if real time results are not necessary, to only run this query every so often to update the counts table and pull your data from it. That way, you're not spending minutes to get data from this complex query.

Here's what I've for you so far. It's a good start. The only thing you need to do is modify it to iterate through the words in each row. You could use a cursor or a subquery.

Create test table:
```
create table tbl(str varchar(100) );
insert into tbl values('data');
insert into tbl values('I love book');
insert into tbl values('I love apple');
insert into tbl values('I love book');
insert into tbl values('I hate apple');
insert into tbl values('I hate apple');
```
Pull data from test table:
```
SELECT DISTINCT str AS Word, COUNT(str) AS Frequency FROM tbl GROUP BY str;
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

萌比男神i

2021-01-06 19:48

Here is a solution only using a query:

SELECT SUM(total_count) as total, value
FROM (

SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
  FROM table_name t CROSS JOIN 
(
   SELECT a.N + b.N * 10 + 1 n
     FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
   ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
 ORDER BY value

) AS x
GROUP BY x.value

) AS y
GROUP BY value

Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

First we do a query to extract all words as explained here by @peterm(follow his instructions if you want to customize the total number of words processed). Then we convert that into a sub-query and then we COUNT and GROUP BY the value of each word, and then make another query on top of that to GROUP BY not grouped words cases where accompanied signs might be present. ie: hello = hello! with a REPLACE

0 讨论(0)