stemming

NLTK words lemmatizing

强颜欢笑 submitted on 2020-01-03 17:23:32
Question: I am trying to lemmatize words with NLTK. So far I have found that I can use the stem package to get results such as transforming "cars" to "car" and "women" to "woman", but I cannot lemmatize words with affixes such as "acknowledgement". WordNetLemmatizer() returns "acknowledgement" unchanged for "acknowledgement", and PorterStemmer() returns "acknowledg" rather than "acknowledge". Can anyone tell me how to eliminate the affixes of words? Say, when

StandardAnalyzer with stemming

半腔热情 submitted on 2019-12-30 07:25:17
Question: Is there a way to integrate PorterStemFilter into StandardAnalyzer in Lucene, or do I have to copy and paste StandardAnalyzer's source code and add the filter, since StandardAnalyzer is defined as a final class? Is there a smarter way? Also, if I would like to ignore numbers, how can I achieve that? Thanks.
Answer 1: If you want to use this combination for English text analysis, then you should use Lucene's EnglishAnalyzer. Otherwise, you could create a new Analyzer that extends the

Stemming English words with Lucene

自闭症网瘾萝莉.ら submitted on 2019-12-28 03:30:08
Question: I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit". The function looks like: String stemTerm(String term){ ... } I've found the Lucene Analyzer, but it looks way too complicated for what I need. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html Is there a way to use it to stem words without building an Analyzer? I don't understand all the Analyzer

Find different realization of a word in a sentence string - Python

廉价感情. submitted on 2019-12-24 11:23:33
Question: (This question is about string checking in general and not Natural Language Processing per se, but if you view it as an NLP problem, imagine it's not a language that current analyzers can analyze. For simplicity's sake, I'll use English strings as examples.) Let's say there are only 6 possible forms that a word can be realized in:
the initial letter being capitalized
its plural form with an "s"
its plural form with an "es"
capitalized + "es"
capitalized + "s"
the basic form without plural or
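The six forms listed in the question can be generated mechanically and then matched against the tokens of a sentence. The following is only a minimal sketch of one way to read the (truncated) question; the function names `surface_forms` and `find_forms` are hypothetical, and the base word is assumed to be given in lowercase.

```python
import re

def surface_forms(base):
    """Return the six forms from the question for a lowercase base word."""
    cap = base.capitalize()
    return {base, base + "s", base + "es", cap, cap + "s", cap + "es"}

def find_forms(base, sentence):
    """Return the tokens of `sentence` that match one of the six forms of `base`."""
    forms = surface_forms(base)
    # Keep only runs of ASCII letters so punctuation does not stick to tokens.
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return [tok for tok in tokens if tok in forms]

matches = find_forms("box", "Boxes and a box; boxs? Boxing.")
```

Matching whole tokens against a precomputed set avoids substring false positives such as "Boxing", at the cost of only recognizing the forms that were enumerated up front.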

Is it possible to get a natural word after it has been stemmed?

房东的猫 submitted on 2019-12-24 04:41:29
Question: I have the word play, which after stemming has become plai. Now I want to get play again. Is it possible? I have used Porter's stemmer.
Answer 1: A stemmer is able to process artificial, non-existent words. Would you like them to be returned as elements of a set of all possible words? How do you know that the word doesn't exist and shouldn't be returned? As an option: find a dictionary of all words and their forms. Find the stem of each of them. Save this projection as a map: ( stem, list of all word
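The dictionary approach described in the answer amounts to building a reverse index from each stem to the words that produce it. A minimal sketch follows; `toy_stem` is a hypothetical stand-in that strips a few suffixes and is NOT the Porter algorithm (with a real Porter stemmer the key for this group would be "plai", as in the question), and the word list is invented for illustration.

```python
from collections import defaultdict

def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a few suffixes. Not Porter."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_reverse_index(words, stem=toy_stem):
    """Map each stem to the list of dictionary words that reduce to it."""
    index = defaultdict(list)
    for word in words:
        index[stem(word)].append(word)
    return dict(index)

# Illustrative vocabulary; a real one would come from a full word list.
vocabulary = ["play", "plays", "played", "playing", "walk", "walked"]
reverse = build_reverse_index(vocabulary)
candidates = reverse.get("play", [])
```

A lookup returns every known word sharing the stem, so choosing a single "natural" word still needs an extra rule, for example preferring the shortest or most frequent candidate.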

Porter Stemming of fried

送分小仙女□ submitted on 2019-12-24 02:23:03
Question: Why does the Porter stemming algorithm online at http://text-processing.com/demo/stem/ stem fried to fri and not fry? I can't recall any English words ending in ied in the past tense that have a nominative form ending with i. Is this a bug?
Answer 1: A stem as returned by the Porter stemmer is not necessarily the base form of a verb, or a valid word at all. If you're looking for that, you need a lemmatizer instead.
Answer 2: Firstly, a stemmer is not a lemmatizer, see also Stemmers vs
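To see why fried comes out as fri, it helps to look at a single suffix rule in isolation. The sketch below implements only a simplified version of the Porter rule that removes "ed" when the remaining stem contains a vowel; it is not the full algorithm (it skips the "eed" rule and the clean-up steps that follow), and the names `contains_vowel` and `strip_ed` are hypothetical.

```python
VOWELS = set("aeiou")

def contains_vowel(stem):
    """True if `stem` has a vowel; 'y' counts as a vowel after a consonant."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return True
        if ch == "y" and i > 0 and stem[i - 1] not in VOWELS:
            return True
    return False

def strip_ed(word):
    """Drop a trailing 'ed' only if what remains contains a vowel."""
    if word.endswith("ed"):
        stem = word[:-2]
        if contains_vowel(stem):
            return stem
    return word

result = strip_ed("fried")
```

The rule is purely orthographic: it strips the suffix and leaves "fri" without mapping the i back to y, which is why the output is a stem rather than the dictionary form "fry".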

SQL Server vs MySQL: CONTAINS(*,'FORMSOF(THESAURUS,word)')

此生再无相见时 submitted on 2019-12-22 04:58:13
Question: I am shocked. I spent the past 3-4 days figuring out how I could implement stemming (and synonym searches) in MySQL, and then I see that in SQL Server the query is incredibly easy: Select * from tab where CONTAINS(*,'FORMSOF(THESAURUS,word)') Is it possible that MySQL has nothing like that?
Answer 1: No, MySQL does not support matching against a user-provided thesaurus. You can use an external FULLTEXT engine like Sphinx which supports morphology rules, has several stemmers and thesauri built in and

I want a Java Arabic stemmer

感情迁移 submitted on 2019-12-21 01:46:05
Question: I'm looking for a Java stemmer for Arabic. I found a library called "AraMorph", but its output is uncontrollable and it adds formation to words, which is unwanted. Is there any other stemmer for Arabic?
Answer 1: Here is a new Arabic stemmer: Assem's Arabic light stemmer, coded using the Snowball framework and generated for many languages including Java. You can use it by downloading libstemmer for Java here.
Answer 2: You can find the Khoja stemmer here: http://zeus.cs.pacificu.edu/shereen/research.htm Direct

Looking for a database or text file of English words with their different forms

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-19 19:46:57
Question: I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. I also tried WordNet, but it is not good for my project. I found the phpmorphy project, but it doesn't include a Java API. I am now looking for a database or a text file of English words with their different forms, for example:
run running ran ...
include including included ...
...
Thank you for your help or advice.
Answer 1: