I need to find a word in HTML source code and count how many times it occurs. I am trying to use a regular expression, but it reports 0 matches found.
I am using regular ex
StringUtils.countMatches(SourceCode, "hsw.ads") ought to work. However, if you stick with the approach you have above (which is valid), I'd recommend a few things:
1. As John Haager mentioned, removing the opening/closing .* will help, because you're looking for that exact substring.
2. You want to escape the '.' because you're searching for a literal '.' and not a wildcard.
3. I would make this Pattern a constant and reuse it rather than re-creating it each time.
That said, I'd still suggest using the approaches above, but I thought I'd point out that your current approach isn't conceptually flawed; it is just missing a few implementation details.
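The three recommendations can be sketched together as follows. This is only a minimal illustration: the class name AdTagCounter, the constant name HSW_ADS, and the method name count are made up for the example and are not from the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are assumptions for illustration.
final class AdTagCounter {
    // Compiled once and reused. No leading/trailing .*, and "\\." matches
    // a literal dot instead of acting as a wildcard.
    private static final Pattern HSW_ADS = Pattern.compile("hsw\\.ads");

    // Counts how many times the literal text hsw.ads occurs in sourceCode.
    static int count(String sourceCode) {
        Matcher m = HSW_ADS.matcher(sourceCode);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

With this sketch, AdTagCounter.count("hsw.ads hsw.ads hsw.ads") gives 3, while a string such as "hswXads" is not counted because the dot is escaped.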
To find a string in Java you can use the String method indexOf, which tells you the index of the first character of the string you searched for. To find all occurrences and count them you can do this (there might be a faster way, but this should work). That said, I would recommend using the StringUtils countMatches method.
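String temp = string; // The string being searched
int count = 0;
String a = "hsw.ads";
int i = 0;
// indexOf returns -1 once there are no more occurrences at or after index i
while ((i = temp.indexOf(a, i)) != -1) {
    count++;
    i += a.length(); // Resume searching just past this match
}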
You should try this.
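import java.util.regex.Matcher;
import java.util.regex.Pattern;

private int getWordCount(String word, String source) {
    // Pattern.quote treats the word as literal text, so '.' is not a wildcard
    Pattern p = Pattern.compile(Pattern.quote(word));
    Matcher m = p.matcher(source);
    int count = 0;
    while (m.find()) {
        count++;
    }
    return count;
}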
Pass the word (not a pattern) you want to search for in the string.
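One caveat when a plain word is handed to the regex engine: metacharacters such as '.' keep their special meaning unless the word is quoted. The sketch below compares the two behaviours; the class WordCountDemo and its method names are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for illustration.
final class WordCountDemo {
    // Counts matches of an already-compiled pattern in the source text.
    static int count(Pattern p, String source) {
        Matcher m = p.matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Treats the word as a regex: '.' matches any single character.
    static int countAsRegex(String word, String source) {
        return count(Pattern.compile(word), source);
    }

    // Treats the word as literal text via Pattern.quote.
    static int countLiteral(String word, String source) {
        return count(Pattern.compile(Pattern.quote(word)), source);
    }
}
```

For the source "hsw.ads hswXads", countAsRegex("hsw.ads", source) returns 2 because the unescaped dot also matches the X, whereas countLiteral("hsw.ads", source) returns 1.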
Your code and regular expression are valid. You don't need to include the .* at the beginning and the end of your regex. For example:
String t = "hsw.ads hsw.ads hsw.ads";
int count = 0;
Matcher m = Pattern.compile("hsw\\.ads").matcher(t);
while (m.find()){ count++; }
In this case, count is 3. One more thing: if you're going to use a regex and you really want to match a literal '.' between hsw and ads, you need to escape it.
You are not matching any "expression", so a simple string search would probably be better. commons-lang has StringUtils.countMatches(source, "yourword").
If you don't want to include commons-lang, you can write that manually. Simply call source.indexOf("yourword", x) multiple times, each time supplying a greater value of x (which is the offset), until it returns -1.
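A hand-written version of that loop might look like the sketch below. The class name SubstringCounter and method name countOccurrences are hypothetical, and returning 0 for null or empty input is an assumption made for this example rather than something stated above.

```java
// Hypothetical helper class; names are assumptions for illustration.
final class SubstringCounter {
    // Counts non-overlapping occurrences of word in source using indexOf.
    static int countOccurrences(String source, String word) {
        if (source == null || word == null || word.isEmpty()) {
            return 0; // Assumption: treat missing or empty input as zero matches
        }
        int count = 0;
        int from = 0; // Offset passed to indexOf, moved forward after each hit
        int idx;
        while ((idx = source.indexOf(word, from)) != -1) {
            count++;
            from = idx + word.length(); // Continue just past the current match
        }
        return count;
    }
}
```

Because the search resumes after the end of each match, occurrences are counted without overlap, so countOccurrences("aaaa", "aa") returns 2.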