stemming

One word phrase search to avoid stemming in Solr

扶醉桌前 submitted on 2019-12-05 02:06:31
Question: I have stemming enabled in my Solr instance. I had assumed that, in order to perform an exact word search without disabling stemming, it would be as simple as putting the word in quotes. This, however, does not appear to be the case. Is there a simple way to achieve this? Answer 1: There is a simple way, if what you're referring to is the "slop" (required similarity) as part of a fuzzy search (see the Lucene Query Syntax here). For example, if I perform this search: q=field_name:determine I see

SnowballStemmer for Russian words list

我怕爱的太早我们不能终老 submitted on 2019-12-04 09:02:40
I know how to run SnowballStemmer on a single word (in my case, a Russian one) by doing the following:
    from nltk.stem.snowball import SnowballStemmer
    stemmer = SnowballStemmer("russian")
    stemmer.stem("Василий")
    'Васил'
How can I do the same if I have a list of words like ['Василий', 'Геннадий', 'Виталий']? My approach using a for loop does not seem to work:
    l=[stemmer.stem(word) for word in l]
Your variable l is not pre-defined, which causes the NameError. See my last two lines for the fix.
    >>> from nltk.stem.snowball import SnowballStemmer
    >>> stemmer = SnowballStemmer("russian")
    >>>
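A minimal sketch of the list case follows. To keep it runnable without NLTK installed, toy_stem is a hypothetical stand-in, not the Snowball algorithm; with NLTK available, stemmer.stem from the question would be passed as the callable instead. The only point is that the list must be bound to a name before the comprehension reads it.

```python
from typing import Callable, List


def stem_all(words: List[str], stem: Callable[[str], str]) -> List[str]:
    """Apply a stemming callable to every word, preserving order."""
    return [stem(word) for word in words]


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for stemmer.stem: strips one trailing "ий".
    # It is NOT the Snowball algorithm; it only makes the sketch runnable.
    return word[:-2] if word.endswith("ий") else word


# Define the list first; reading an undefined name raises NameError.
words = ["Василий", "Геннадий", "Виталий"]
stems = stem_all(words, toy_stem)
print(stems)
```

Building a new list, rather than reassigning the input name inside the comprehension, also leaves the original words available for comparison.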

Exact word search in Solr

折月煮酒 submitted on 2019-12-03 21:09:24
I have a question that closely relates to this question. In my schema I have a field <field name="text" type="textgen" indexed="true" stored="true" required="true"/>. This gives an exact match, i.e. stemming is disabled: eat = eat. Is it possible, while the field is configured as textgen, to search for other variants of the word, e.g. eat = eat, eats, eating? eat~0 gives similar-sounding words such as meat, beat, etc., but this is not what I want. I'm starting to think that the only way to achieve this is to add another field with something other than textgen, but if there is a simpler way I am very interested to

One word phrase search to avoid stemming in Solr

不羁岁月 submitted on 2019-12-03 17:32:48
I have stemming enabled in my Solr instance. I had assumed that, in order to perform an exact word search without disabling stemming, it would be as simple as putting the word in quotes. This, however, does not appear to be the case. Is there a simple way to achieve this? There is a simple way, if what you're referring to is the "slop" (required similarity) as part of a fuzzy search (see the Lucene Query Syntax here). For example, if I perform this search: q=field_name:determine I see results that contain "determine", "determining", "determined", etc. If I then modify the query like so: q

Base word stemming instead of root word stemming in R

北城余情 submitted on 2019-12-03 12:04:06
Question: Is there any way to get the base word instead of the root word when stemming using NLP in R? Code:
    > #Loading libraries
    > library(tm)
    > library(slam)
    >
    > #Vector
    > Vec=c("happyness happies happys","sky skies")
    >
    > #Creating Corpus
    > Txt=Corpus(VectorSource(Vec))
    >
    > #Stemming
    > Txt=tm_map(Txt, stemDocument)
    >
    > #Checking result
    > inspect(Txt)
    A corpus with 2 text documents
    The metadata consists of 2 tag-value pairs and a data frame
    Available tags are: create_date creator
    Available variables in the data

nltk stemmer: string index out of range

此生再无相见时 submitted on 2019-12-03 11:38:13
Question: I have a set of pickled text documents which I would like to stem using nltk's PorterStemmer. For reasons specific to my project, I would like to do the stemming inside a Django app view. However, when stemming the documents inside the Django view, I receive an IndexError: string index out of range exception from PorterStemmer().stem() for the string 'oed'. As a result, running the following:
    # xkcd_project/search/views.py
    from nltk.stem.porter import PorterStemmer
    def get_results

Python stemming (with pandas dataframe)

最后都变了- submitted on 2019-12-03 08:31:46
I ran into the following problem while programming in Python: I use a Pandas dataframe containing words that have to be stemmed (using SnowballStemmer). I want to stem the words so that I can compare results for stemmed vs. non-stemmed text, and for this I will be using a classifier. I use the following code for the stemmer:
    from nltk.stem.snowball import SnowballStemmer
    stemmer = SnowballStemmer("dutch")
I want to stem all separate words in the list while preserving the order and keeping every key with its value. This is the column from the Pandas dataframe from which I want every separate word
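As a rough sketch of the goal described above (stemming each word while keeping word order and the key-to-value pairing), the snippet below uses plain Python rows instead of a DataFrame so that it runs without extra packages. toy_stem and the sample rows are hypothetical. With pandas and NLTK installed, a per-cell function like stem_text could presumably be applied to the column with Series.apply, using stemmer.stem as the callable.

```python
from typing import Callable, Dict


def stem_text(text: str, stem: Callable[[str], str]) -> str:
    """Stem each whitespace-separated word, keeping the original order."""
    return " ".join(stem(word) for word in text.split())


def stem_rows(rows: Dict[str, str], stem: Callable[[str], str]) -> Dict[str, str]:
    """Return a new mapping with the same keys and stemmed values."""
    return {key: stem_text(value, stem) for key, value in rows.items()}


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for SnowballStemmer("dutch").stem:
    # lowercases and strips a trailing "en". Not a real Dutch stemmer.
    word = word.lower()
    return word[:-2] if word.endswith("en") and len(word) > 4 else word


# Hypothetical sample data: key -> text, mirroring one dataframe column.
rows = {"doc1": "katten lopen buiten", "doc2": "honden blaffen"}
stemmed = stem_rows(rows, toy_stem)
print(stemmed)
```

Because the original mapping is left untouched, the stemmed and unstemmed versions can both be fed to a classifier for comparison.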

I want a Java Arabic stemmer

。_饼干妹妹 submitted on 2019-12-03 06:50:29
I'm looking for a Java stemmer for Arabic. I found a library called "AraMorph", but its output is uncontrollable and it adds formation to words, which is unwanted. Is there any other stemmer for Arabic?
Here is a new Arabic stemmer: Assem's Arabic light stemmer, written with the Snowball framework and generated for many languages, including Java. You can use it by downloading libstemmer for Java here.
You can find the Khoja stemmer here: http://zeus.cs.pacificu.edu/shereen/research.htm
Direct download: http://zeus.cs.pacificu.edu/shereen/ArabicStemmerCode.zip
https://sourceforge.net/projects/arabicstemmer/

Base word stemming instead of root word stemming in R

你说的曾经没有我的故事 submitted on 2019-12-03 02:26:00
Is there any way to get the base word instead of the root word when stemming using NLP in R? Code:
    > #Loading libraries
    > library(tm)
    > library(slam)
    >
    > #Vector
    > Vec=c("happyness happies happys","sky skies")
    >
    > #Creating Corpus
    > Txt=Corpus(VectorSource(Vec))
    >
    > #Stemming
    > Txt=tm_map(Txt, stemDocument)
    >
    > #Checking result
    > inspect(Txt)
    A corpus with 2 text documents
    The metadata consists of 2 tag-value pairs and a data frame
    Available tags are: create_date creator
    Available variables in the data frame are: MetaID
    [[1]]
    happi happi happi
    [[2]]
    sky sky
    >
Can I get base word "happy" (base word)

nltk stemmer: string index out of range

回眸只為那壹抹淺笑 submitted on 2019-12-03 01:58:06
I have a set of pickled text documents which I would like to stem using nltk's PorterStemmer. For reasons specific to my project, I would like to do the stemming inside a Django app view. However, when stemming the documents inside the Django view, I receive an IndexError: string index out of range exception from PorterStemmer().stem() for the string 'oed'. As a result, running the following:
    # xkcd_project/search/views.py
    from django.shortcuts import render
    from nltk.stem.porter import PorterStemmer
    def get_results(request):
        s = PorterStemmer()
        s.stem('oed')  # the call reported to raise IndexError
        return render(request, 'list.html')
raises the mentioned error:
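Whatever the underlying cause, one defensive option is to wrap the stemming call so that a word the stemmer fails on is returned unchanged instead of aborting the request. The sketch below is an assumption, not a fix taken from the thread: safe_stem and failing_stem are hypothetical names, and failing_stem only imitates the reported IndexError for 'oed' so the example runs without NLTK. In the view, PorterStemmer().stem would be passed as the callable.

```python
from typing import Callable


def safe_stem(word: str, stem: Callable[[str], str]) -> str:
    """Return stem(word), or the word unchanged if stemming raises IndexError."""
    try:
        return stem(word)
    except IndexError:
        return word


def failing_stem(word: str) -> str:
    # Hypothetical stand-in that imitates the reported failure on 'oed'.
    if word == "oed":
        raise IndexError("string index out of range")
    # Toy behaviour for every other word: drop trailing "s" characters.
    return word.rstrip("s")


results = [safe_stem(w, failing_stem) for w in ["oed", "cats"]]
print(results)
```

This hides the symptom rather than explaining it, so it is best treated as a stopgap while the real cause is investigated.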