I'm thinking of adding stop word removal to my similarity program and then a stemmer (Porter 1 or Porter 2, depending on which is easier to implement).
I was wondering that si
You don't have to deal with the whole text at once. Just split it into words, apply your stopword filter and stemming algorithm to each word, then build the string again using a StringBuilder:
StringBuilder builder = new StringBuilder(text.length());
String[] words = text.split("\\s+");
for (String word : words) {
    if (stopwordFilter.check(word)) { // Apply stopword filter: keep only words that pass the check.
        word = stemmer.stem(word); // Apply stemming algorithm.
        if (builder.length() > 0) {
            builder.append(' '); // Separate words so they don't run together.
        }
        builder.append(word);
    }
}
text = builder.toString();
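
The snippet above leaves stopwordFilter and stemmer undefined, so here is a minimal self-contained sketch of the same split, filter, stem, rebuild loop. Everything in it is an assumption for illustration: the class name Preprocessor, the tiny STOP_WORDS set, and the stem method, which only strips a trailing "s" and is a placeholder, not the Porter algorithm. Swap in a real stop word list and a real Porter stemmer for actual use.

```java
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

class Preprocessor {
    // Hypothetical, deliberately tiny stop word list (illustration only).
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "is", "of", "and", "in");

    // Placeholder stemmer: strips one trailing "s" from longer words.
    // This is NOT the Porter algorithm; it only marks where a real stemmer plugs in.
    static String stem(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Split on whitespace, drop stop words, stem the rest, rejoin with single spaces.
    static String preprocess(String text) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue; // Skip empty tokens and stop words.
            }
            joiner.add(stem(word));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(preprocess("The cats and the dogs")); // prints "cat dog"
    }
}
```

Filtering before stemming, as in the answer, means the stop word list can hold plain dictionary forms and the stemmer never runs on words that get thrown away anyway.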