stemming | 易学教程

How to index words with their prefix in Solr?

I use Solr 3.3 to index my files, and I want Solr to index words together with their suffixed forms. For example, I want colorful to be indexed like color, so that when I search for color, Solr returns any document that contains colorful.

You would need to apply analysis to the field. Stemming reduces both the indexed and the searched words to their roots, so color, colors, and colored would all match a search for any one of them. There will be cases where stemming does not work. For those you can use SynonymFilter, which lets you specify words that you treat as synonyms and that would match the search
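The combination of stemming and a synonym table described in the answer can be illustrated with a small Python sketch. This is not Solr's implementation or configuration: the suffix list, the synonym table, and the function names `stem`, `analyze`, and `matches` are all assumptions made up for the illustration.

```python
# Toy analysis chain: an illustration only, NOT how Solr implements analysis.
SUFFIXES = ("ful", "ed", "s")      # hypothetical suffix list
SYNONYMS = {"colour": "color"}     # hypothetical synonym table


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, split on whitespace, map synonyms, then stem each token."""
    terms = set()
    for token in text.lower().split():
        token = SYNONYMS.get(token, token)
        terms.add(stem(token))
    return terms


def matches(query, document):
    """A document matches when every analyzed query term appears in it."""
    return analyze(query) <= analyze(document)
```

Because the same `analyze` function runs on both the query and the document, `matches("color", "a colorful painting")` succeeds, and the synonym entry lets `colour` find `colored`.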

Stemming does not work properly for MongoDB text index

I am trying to use the full-text search feature of MongoDB and am observing some unexpected behavior. The problem is related to the "stemming" aspect of the text indexing feature. According to many articles online, if a field covered by the text index contains the string "big hunting dogs", you should be able to search on "hunt" or "hunting" as well as on "dog" or "dogs". MongoDB should normalize, or stem, the text both when indexing and when searching. So in my example, I would expect it to save the words "dog" and "hunt" in the index and search for a stemmed
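The expectation in this question, stemming at index time and again at query time, can be sketched with a toy inverted index in Python. This is only a model of the expected behavior, not MongoDB's text index: `toy_stem` and `TextIndex` are hypothetical names, and the two-suffix stemmer is a deliberate simplification.

```python
from collections import defaultdict


def toy_stem(word):
    """Simplified stemmer (an assumption): strips 'ing' or a trailing 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class TextIndex:
    """Toy inverted index that stores stems instead of surface forms."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index time: every word is stemmed before it is stored.
        for word in text.lower().split():
            self._postings[toy_stem(word)].add(doc_id)

    def search(self, term):
        # Query time: the search term goes through the same stemmer.
        return set(self._postings.get(toy_stem(term.lower()), set()))
```

After `index.add(1, "big hunting dogs")`, the index holds the stems `big`, `hunt`, and `dog`, so searches for `hunt`, `hunting`, `dog`, and `dogs` all reach document 1.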

Looking for a database or text file of english words with their different forms

I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. I also tried WordNet, but it is not a good fit for my project. I found the phpmorphy project, but it doesn't include a Java API. At this point I am looking for a database or a text file of English words with their different forms, for example:

    run running ran ...
    include including included ...
    ...

Thank you for your help or advice.

You could download LanguageTool (Disclaimer: I'm the maintainer), which comes with a binary file english.dict
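A plain-text file of the kind the question asks for could be turned into a lookup table along these lines. This is a minimal Python sketch under stated assumptions: each line is taken to list the base form first, followed by its other forms, and the names `load_forms` and `base_form` are hypothetical. It does not read LanguageTool's binary dictionary format.

```python
import io


def load_forms(lines):
    """Build a form -> base-form table; the first word on a line is the base."""
    form_to_base = {}
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        base = words[0].lower()
        for form in words:
            # Keep the first base seen if a form appears on several lines.
            form_to_base.setdefault(form.lower(), base)
    return form_to_base


def base_form(word, table):
    """Return the base form, or the word itself when it is not in the table."""
    return table.get(word.lower(), word)


# In-memory stand-in for the text file described in the question.
SAMPLE = io.StringIO("run running ran\ninclude including included\n")
TABLE = load_forms(SAMPLE)
```

With this table, `base_form("ran", TABLE)` gives `run`, which a purely suffix-based stemmer could not derive.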

multiple results of one variable when applying tm method “stemCompletion”

I have a corpus containing journal data: 15 observations of 3 variables (ID, title, abstract). Using RStudio, I read in the data from a .csv file (one line per observation). While performing some text mining operations, I ran into trouble with the method stemCompletion. After applying stemCompletion, I observed that the results are returned three times for each stemmed line of the .csv. All the other tm methods (e.g. stemDocument) produce only a single result. I'm wondering why this happens and how I could fix the problem. I used the code below:

    data.corpus <- Corpus(DataframeSource(data))

StandardAnalyzer with stemming

Is there a way to integrate PorterStemFilter into StandardAnalyzer in Lucene, or do I have to copy and paste StandardAnalyzer's source code and add the filter, since StandardAnalyzer is declared as a final class? Is there a smarter way? Also, if I would like to ignore numbers, how can I achieve that? Thanks ameertawfik

If you want to use this combination for English text analysis, then you should use Lucene's EnglishAnalyzer. Otherwise, you could create a new Analyzer that extends AnalyzerWrapper, as shown below.

    import java.io.IOException;
    import java.io.StringReader;
    import java.util
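Independently of Lucene's classes, the idea of wrapping a tokenizer with extra filters (here a number-dropping step and a stemming step) can be sketched in Python. None of the names below are Lucene APIs: `tokenize`, `lowercase`, `drop_numbers`, `toy_stem_filter`, and `analyze` are hypothetical, and the stemming step is a trivial stand-in for a real Porter stemmer.

```python
import re


def tokenize(text):
    """Split text into runs of word characters."""
    yield from re.findall(r"\w+", text)


def lowercase(tokens):
    for token in tokens:
        yield token.lower()


def drop_numbers(tokens):
    """Discard tokens that consist only of digits."""
    for token in tokens:
        if not token.isdigit():
            yield token


def toy_stem_filter(tokens):
    """Stand-in for a real stemmer: removes one trailing 's' from longer words."""
    for token in tokens:
        yield token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze(text, filters=(lowercase, drop_numbers, toy_stem_filter)):
    """Run the tokenizer output through each filter in order."""
    stream = tokenize(text)
    for token_filter in filters:
        stream = token_filter(stream)
    return list(stream)
```

Each filter only consumes and yields tokens, so adding or removing a step means changing the `filters` tuple rather than copying the whole analyzer.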

Converting plural to singular in a text file with Python

I have txt files that look like this:

    word, 23
    Words, 2
    test, 1
    tests, 4

And I want them to look like this:

    word, 23
    word, 2
    test, 1
    test, 4

I want to be able to take a txt file in Python and convert plural words to singular. Here's my code:

    import nltk

    f = raw_input("Please enter a filename: ")

    def openfile(f):
        with open(f,'r') as a:
            a = a.read()
            a = a.lower()
            return a

    def stem(a):
        p = nltk.PorterStemmer()
        [p.stem(word) for word in a]
        return a

    def returnfile(f, a):
        with open(f,'w') as d:
            d = d.write(a)
            #d.close()

    print openfile(f)
    print stem(openfile(f))
    print returnfile(f, stem(openfile(f)))
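One way to get the desired output is to process the file line by line and keep the stemmed word instead of discarding it. In the code above, `stem` iterates over the characters of one long string and throws the list of stems away, so it returns its input unchanged. The following Python 3 sketch is one possible approach, not the accepted answer: `singularize` is a hypothetical toy rule standing in for a real stemmer such as `nltk.PorterStemmer().stem`, and `convert_line` and `convert_file` are illustrative names.

```python
def singularize(word):
    """Toy rule (an assumption): drop a trailing 's' unless the word ends in 'ss'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def convert_line(line):
    """Lowercase and singularize the word before the comma; keep the count as is."""
    word, sep, count = line.partition(",")
    return singularize(word.strip().lower()) + sep + count


def convert_file(path):
    """Rewrite the file in place and return the converted lines."""
    with open(path, "r") as handle:
        lines = handle.read().splitlines()
    converted = [convert_line(line) for line in lines]
    with open(path, "w") as handle:
        handle.write("\n".join(converted) + "\n")
    return converted
```

Working per line keeps the count attached to its word, which the character-level loop in the question could not do.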

stemming library in java [closed]

Is there any library for stemming in Java?

There is an implementation of Porter's stemmer available on his website. The code is not very Java-ish, but it does what it's supposed to, and it's only a single class.

You might want to look at Apache Lucene. It is generally written to do other things, but it does some stemming as part of its indexing process.

Updated Answer: Porter recommends a later version of what is available on his website. That is Snowball: http://snowball.tartarus.org/ It is essentially a code generator that can generate a Java or C stemmer based on a stemmer specification