Regular Expression for accurate word-count using JavaScript

浪子不回头ぞ submitted on 2019-11-27 07:41:00

This should do what you're after:

(value.match(/\S+/g) || []).length; // match() returns null when there are no words

Rather than splitting the string, you're matching on any sequence of non-whitespace characters.

There's the added bonus of being able to easily extract each word if needed ;)
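A minimal sketch of both uses, assuming value is a plain string. String.prototype.match returns null rather than an empty array when nothing matches, so the || [] guard keeps an empty or all-whitespace input from throwing. The function names are illustrative only.

```javascript
// Return every run of non-whitespace characters as a "word".
function extractWords(value) {
  // match() with the g flag returns an array of all matches, or null if none.
  return value.match(/\S+/g) || [];
}

// The word count is simply the number of extracted words.
function countWords(value) {
  return extractWords(value).length;
}

console.log(extractWords("  The quick  brown fox ")); // → ["The", "quick", "brown", "fox"]
console.log(countWords("   ")); // → 0
```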

Try counting anything that is not whitespace and that starts and ends on a word boundary:

(value.match(/\b\S+\b/g) || []).length
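To see what the word boundaries change, here is a small comparison on a plain string. A lone punctuation mark surrounded by spaces has no word boundary on either side, so \b\S+\b skips it, while plain \S+ counts it; punctuation at the edges of a word is also trimmed from the match.

```javascript
const sample = "Lorem ipsum dolor , sit amet";

// Runs of non-whitespace with no boundary requirement: the lone "," counts.
const plain = sample.match(/\S+/g) || [];
// Runs that start and end on a word boundary: the lone "," is skipped.
const bounded = sample.match(/\b\S+\b/g) || [];

console.log(plain.length);   // → 6
console.log(bounded.length); // → 5
// Surrounding punctuation is left out of the match.
console.log("(hello)".match(/\b\S+\b/g)); // → ["hello"]
```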

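You could also try Unicode ranges, but I am not sure the following one is complete (it treats every code point from U+0080 upward as a word character, which also includes punctuation and spaces such as the no-break space U+00A0):

(value.match(/[\u0080-\uFFFF\w]+/g) || []).length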
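On engines that support ES2018 Unicode property escapes (\p{...} with the u flag), a letter/number class avoids guessing at code-point ranges. A minimal sketch; treating letters, combining marks and numbers as the word characters is an assumption you may want to adjust (for example to keep apostrophes inside contractions).

```javascript
// Letters (\p{L}), combining marks (\p{M}) and numbers (\p{N}) in any script.
// The u flag is required for \p{...} escapes.
const WORD = /[\p{L}\p{M}\p{N}]+/gu;

function countUnicodeWords(value) {
  // match() with a global regex resets lastIndex, so reusing WORD is safe.
  return (value.match(WORD) || []).length;
}

console.log(countUnicodeWords("naïve café, Москва 2024")); // → 4
```

Scripts written without spaces, such as Chinese or Japanese, still come out as one long run with this approach.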

For me this gave the best results:

value.split(/\b\W+\b/).length

With

var words = value.split(/\b\W+\b/);

you get an array of all the words.

Explanation:

  • \b is a word boundary
  • \W matches a non-word character (an uppercase escape usually means the negation of its lowercase counterpart)
  • '+' means one or more occurrences of the preceding character or character class
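A quick check of that split on a plain string. Because the separator must sit between two word boundaries, punctuation at the very start or end of the string stays attached to the first or last element, an apostrophe inside "it's" is treated as a separator, and an empty string still yields one (empty) element, so a guard is worth adding. The function names are illustrative only.

```javascript
function splitWords(value) {
  // Split on runs of non-word characters that sit between two words.
  return value.split(/\b\W+\b/);
}

function countSplitWords(value) {
  // "".split(...) returns [""], which would otherwise count as one word.
  return value.trim() === "" ? 0 : splitWords(value).length;
}

console.log(splitWords("Hello, world! How are you?"));
// → ["Hello", "world", "How", "are", "you?"]
console.log(countSplitWords("")); // → 0
console.log(countSplitWords("it's")); // → 2
```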

I recommend learning regular expressions. It's a great skill to have because they are so powerful. ;-)

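The correct regexp for split would be /\s+/, which splits on whitespace. Splitting on /\S+/ splits on the words themselves and returns one element more than the number of words (note that /\s+/ still counts the stand-alone comma below as a word):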

'Lorem ipsum dolor , sit amet'.split(/\S+/g).length
7
'Lorem ipsum dolor , sit amet'.split(/\s+/g).length
6
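One more pitfall when splitting on /\s+/: leading or trailing whitespace produces empty strings at the ends of the array. A minimal sketch that trims first and treats an empty string as zero words; countBySpaces is an illustrative name.

```javascript
const padded = "  Lorem ipsum dolor , sit amet  ";

// Without trimming, the leading and trailing spaces create empty elements.
console.log(padded.split(/\s+/).length); // → 8

function countBySpaces(value) {
  const trimmed = value.trim();
  // "".split(/\s+/) is [""], so handle the empty case explicitly.
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

console.log(countBySpaces(padded)); // → 6 (the lone "," still counts)
```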

You could extend or change your methods like this, if you want to count things like email addresses as single words as well:

document.querySelector("#wordcount").textContent = (document.querySelector("#editor").value.match(/\b\S+\b/g) || []).length;

or, to simply count whitespace-separated tokens:

const text = document.querySelector("#editor").value.trim();
document.querySelector("#wordcount").textContent = text ? text.split(/\s+/).length : 0;
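A sketch of wiring this up so the count refreshes as the user types, assuming the same #editor textarea and #wordcount element used in this answer (adjust the IDs to your markup). The counting logic is kept in a plain function, here called countTokens for illustration, so it can be tested without a browser.

```javascript
// Count whitespace-separated tokens; an empty or all-blank string is 0 words.
function countTokens(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") {
  const editor = document.querySelector("#editor");
  const output = document.querySelector("#wordcount");
  if (editor && output) {
    const update = () => {
      output.textContent = String(countTokens(editor.value));
    };
    // "input" fires on every edit, including paste and cut.
    editor.addEventListener("input", update);
    update(); // show the initial count
  }
}
```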

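Also keep in mind that in JavaScript \s matches Unicode whitespace (such as the no-break space U+00A0), while \w only matches the ASCII characters [A-Za-z0-9_], so splitting on \s+ copes better with non-English text than matching \w+.

Source: http://www.regular-expressions.info/charclass.html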
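A quick way to check this difference in a JavaScript console (regexes without the u or i flags):

```javascript
// \w is ASCII-only: [A-Za-z0-9_]
console.log(/\w/.test("é"));      // → false
console.log(/\w/.test("ж"));      // → false
// \s covers Unicode whitespace, e.g. NO-BREAK SPACE and IDEOGRAPHIC SPACE
console.log(/\s/.test("\u00A0")); // → true
console.log(/\s/.test("\u3000")); // → true

// Splitting on \s+ keeps accented words whole; matching \w+ breaks them apart.
const text = "café\u00A0naïve";
console.log(text.split(/\s+/));  // → ["café", "naïve"]
console.log(text.match(/\w+/g)); // → ["caf", "na", "ve"]
```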

Try

    (value.match(/\w+/g) || []).length;

This matches runs of word characters (letters, digits and the underscore), whereas something like:

    value.match(/\S+/g).length;

will give an incorrect count if the user types commas or other punctuation that is not followed by a space, or puts a comma with a space on either side of it.
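A side-by-side sketch of the two counts on a few inputs. Note that \w+ has its own failure mode: it splits contractions such as "it's" (and accented words), because \w only covers ASCII letters, digits and the underscore. countW and countS are illustrative names.

```javascript
// Count runs of ASCII word characters.
function countW(value) {
  return (value.match(/\w+/g) || []).length;
}

// Count runs of non-whitespace characters.
function countS(value) {
  return (value.match(/\S+/g) || []).length;
}

const cases = ["hello,world", "hello , world", "it's fine"];
for (const c of cases) {
  console.log(JSON.stringify(c), countW(c), countS(c));
}
// "hello,world" 2 1
// "hello , world" 2 3
// "it's fine" 3 2
```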

My simple JavaScript library, FuncJS, has a function called count() that does exactly what its name says: it counts words.

For example, if you have a string full of words, simply pass it to the function, like this:

count("How many words are in this string?");

The function then returns the number of words. It is also designed to ignore any amount of whitespace, which keeps the result accurate.

To learn more about this function, please read the documentation at http://docs.funcjs.webege.com/count().html; the download link for FuncJS is also on that page.

Hope this helps anyone wanting to do this! :)

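JavaScript does not understand the POSIX punctuation class [[:punct:]] (inside a JavaScript character class it is read as literal characters), although it does support the lookahead assertion (?=). In a regex flavor that supports both, such as Perl, this should get all the words: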

/[\s[:punct:]]*(\w(?:\w|[[:punct:]](?=[\w[:punct:]]))*)/

or, if your regex flavor doesn't support the non-capturing (?:) construct:

/[\s[:punct:]]*(\w(\w|[[:punct:]](?=[\w[:punct:]]))*)/

Using this in Perl would go like this:

# Extract and count the number of words
#
use strict;
use warnings;

my $text = q(
  I confirm that sufficient information and detail have been
  reported in this technical report, that it's "scientifically" sound,
  and that appropriate conclusion's have been included
);

my $regex = qr/ [\s[:punct:]]* (\w (?: \w | [[:punct:]](?=[\w[:punct:]]) )* ) /x;
my $wordcount = 0;

while ( $text =~ /$regex/g )
{
    print "$1\n";
    $wordcount++;
}

print "\n", '-'x20, "\nFound $wordcount words\n\n";

Output:

I
confirm
that
sufficient
information
and
detail
have
been
reported
in
this
technical
report
that
it's
scientifically
sound
and
that
appropriate
conclusion's
have
been
included

--------------------
Found 25 words
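JavaScript regexes have no POSIX classes, but they do support (?:...) and (?=...), so the Perl pattern can be translated by spelling out the ASCII punctuation characters as ranges. A sketch, assuming ASCII text; the ranges below are intended to cover the same characters as [:punct:] in the C/POSIX locale, and perlStyleWords is an illustrative name.

```javascript
// Same structure as the Perl pattern. The punctuation class is written as the
// ASCII ranges ! through /, : through @, [ through `, and { through ~.
// 1. skip leading whitespace/punctuation
// 2. capture a word character, then word characters or punctuation that is
//    itself followed by another word or punctuation character
const WORD_RE = /[\s!-\/:-@\[-`{-~]*(\w(?:\w|[!-\/:-@\[-`{-~](?=[\w!-\/:-@\[-`{-~]))*)/g;

function perlStyleWords(text) {
  // matchAll yields one match per word; group 1 holds the word itself.
  return Array.from(text.matchAll(WORD_RE), (m) => m[1]);
}

const text = `
  I confirm that sufficient information and detail have been
  reported in this technical report, that it's "scientifically" sound,
  and that appropriate conclusion's have been included
`;

const words = perlStyleWords(text);
console.log(words.join("\n"));
console.log(`${"-".repeat(20)}\nFound ${words.length} words`);
```

As in the Perl version, an apostrophe between letters stays inside the word, while a quote or comma followed by whitespace is dropped.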