How to count words in MySQL / regular expression replacer?

守給你的承諾、 submitted on 2019-11-26 02:38:42

Question


How can I, in a MySQL query, have the same behaviour as the Regex.Replace function (for instance in .NET/C#)?

I need this because, like many people, I would like to count the number of words in a field. However, I'm not satisfied with the following answer (given several times on this site):

SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1 FROM table

It doesn't give good results when there is more than one space between two words.
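To make the problem concrete, here is a minimal sketch that applies the same formula to two string literals instead of a table column. The sample strings are made up for illustration. Every extra space is counted as if it were another word boundary:

```sql
-- Naive word count: (number of spaces) + 1.
-- The literals are illustrative only; no table is needed.
SELECT
    LENGTH('one two')   - LENGTH(REPLACE('one two',   ' ', '')) + 1 AS single_space,  -- 2 (correct)
    LENGTH('one   two') - LENGTH(REPLACE('one   two', ' ', '')) + 1 AS triple_space;  -- 4 (should be 2)
```

The formula also ignores tabs and newlines, and it counts leading or trailing spaces as extra words.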

By the way, I think a Regex.Replace function would be interesting in its own right, so all good ideas are welcome!


Answer 1:


REGEXP_REPLACE is available as a MySQL user-defined function (UDF).

Word counting: If you can control the data going into the database, you can remove double whitespace before insert. Also, if you need the word count often, you can compute it once in your code and store the count in the database.
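The answer suggests doing this clean-up in application code. As a rough sketch of the same idea at the SQL level, the nested REPLACE trick below collapses runs of spaces without any regular expression support, so it should also work before MySQL 8.0. It assumes the text never contains the marker sequences '<>' or '><', and it handles plain spaces only, not tabs or newlines. The variable names and the sample string are made up for illustration:

```sql
-- Illustrative input with leading, trailing, and repeated spaces.
SET @raw = '  one   two  three ';

-- Step 1: turn every space into the marker '<>'.
-- Step 2: delete every '><', which removes the joins between adjacent markers.
-- Step 3: turn each remaining '<>' back into a single space, then trim the ends.
SET @clean = TRIM(REPLACE(REPLACE(REPLACE(@raw, ' ', '<>'), '><', ''), '<>', ' '));

-- Once the text is normalized, the simple "spaces + 1" formula is safe.
SELECT @clean AS cleaned,                                   -- 'one two three'
       IF(@clean = '', 0,
          CHAR_LENGTH(@clean)
          - CHAR_LENGTH(REPLACE(@clean, ' ', '')) + 1) AS word_count;  -- 3
```

The resulting count could then be written to a dedicated column at insert time, so later queries read it instead of recomputing it.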




Answer 2:


UPDATE: I have now added a separate answer for MySQL 8.0+, which should be used in preference. (This answer is retained in case you are constrained to an earlier version.)

This is almost a duplicate of this question, but this answer addresses the use case of counting words, based on the advanced version of the custom regular expression replacer from this blog post.

Demo

Rextester online demo

For the sample text, this gives a count of 61 - the same as all online word counters I've tried (e.g. https://wordcounter.net/).

SQL (excluding function code for brevity):

SELECT txt,
       -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(reg_replace(txt,
                               '[[:space:]]+', -- Look for a chunk of whitespace
                               '^.', -- Replace the first character from the chunk
                               '',   -- Replace with nothing (i.e. remove the character)
                               TRUE, -- Greedy matching
                               1,  -- Minimum match length
                               0,  -- No maximum match length
                               1,  -- Minimum sub-match length
                               0   -- No maximum sub-match length
                               ))
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
       AS `word count`
FROM tbl;



Answer 3:


The answer is no: you cannot have the same behaviour in MySQL.

But I recommend you check out this earlier question on the subject, which links to a UDF that supposedly enables some of this functionality.




Answer 4:


MySQL 8.0 now provides a decent REGEXP_REPLACE function, which makes this much simpler:

SQL

SELECT -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
           CHAR_LENGTH(REGEXP_REPLACE(
               txt,
               '[[:space:]]([[:space:]]*)', -- A chunk of one or more whitespace characters
               '$1')) -- Discard the first whitespace character and retain the rest
           + 1 -- The word count is 1 more than the number of gaps between words
           - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
           - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
           AS `Word count`
FROM tbl;

Demo

DB-Fiddle online demo
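To see how the arithmetic in that query works, the sketch below evaluates each term separately for one literal string. It assumes MySQL 8.0 or later; the variable name and the sample string are made up for illustration:

```sql
-- Illustrative input: 2 spaces, 'hello', 3 spaces, 'world', 2 spaces.
SET @txt = '  hello   world  ';

SELECT
    CHAR_LENGTH(@txt) AS total_chars,                          -- 17
    CHAR_LENGTH(REGEXP_REPLACE(@txt,
                               '[[:space:]]([[:space:]]*)',
                               '$1')) AS chars_after_replace,  -- 14: one character dropped per whitespace chunk
    (@txt REGEXP '^[[:space:]]') AS leading_ws,                -- 1
    (@txt REGEXP '[[:space:]]$') AS trailing_ws;               -- 1

-- Word count = (17 - 14) + 1 - 1 - 1 = 2
```

One edge case to be aware of: for an empty string every term is 0 except the + 1, so the formula yields 1. If empty values can occur, a guard such as IF(CHAR_LENGTH(txt) = 0, 0, ...) around the expression is one way to handle it.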



Source: https://stackoverflow.com/questions/1755408/how-to-count-words-in-mysql-regular-expression-replacer
