stemming | 易学教程

SnowballStemmer for Russian words list

Question: I know how to apply SnowballStemmer to a single word (in my case, a Russian one), like this:
    from nltk.stem.snowball import SnowballStemmer
    stemmer = SnowballStemmer("russian")
    stemmer.stem("Василий")
    'Васил'
How can I do the same if I have a list of words like ['Василий', 'Геннадий', 'Виталий']? My approach using a for loop does not seem to work:
    l=[stemmer.stem(word) for word in l]
Answer: Your variable l is not defined beforehand, which causes the NameError. See my last two lines for the fix.
    >>> from nltk.stem.snowball import SnowballStemmer
    >>> stemmer = SnowballStemmer("russian")
    >>>
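The list case can be sketched as follows. This is a minimal sketch rather than the truncated answer's own code: `stem_words` and `demo` are hypothetical helper names, and the only NLTK calls are the `SnowballStemmer("russian")` constructor and `stem` method already shown in the question. The point is that the list has to be assigned before the comprehension reads it.

```python
from typing import Callable, Iterable, List


def stem_words(words: Iterable[str], stem: Callable[[str], str]) -> List[str]:
    """Apply a stemming function to each word, preserving the input order."""
    return [stem(word) for word in words]


def demo() -> None:
    # Imported here so stem_words stays usable without NLTK installed.
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer("russian")
    # Define the list first; the question's snippet read `l` before assigning it.
    names = ["Василий", "Геннадий", "Виталий"]
    print(stem_words(names, stemmer.stem))
```

Calling `demo()` with NLTK installed prints the stemmed list. Passing the stemmer as a plain callable keeps the helper independent of any particular stemmer class.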

Exact word search in Solr

Question: I have a question which closely relates to this question. In my schema I have a field
    <field name="text" type="textgen" indexed="true" stored="true" required="true"/>
This gives an exact match, i.e. stemming is disabled: eat = eat. Is it possible, while the field is configured as textgen, to search for other variants of the word, e.g. eat = eat, eats, eating? eat~0 will give similar-sounding words such as meat, beat, etc., but this is not what I want. I'm starting to think that the only way to achieve this is to add another field with something other than textgen, but if there is a simpler way I am very interested to

One word phrase search to avoid stemming in Solr

Question: I have stemming enabled in my Solr instance. I had assumed that, in order to perform an exact word search without disabling stemming, it would be as simple as putting the word in quotes. This, however, does not appear to be the case. Is there a simple way to achieve this?
Answer 1: There is a simple way, if what you're referring to is the "slop" (required similarity) as part of a fuzzy search (see the Lucene Query Syntax here). For example, if I perform this search:
    q=field_name:determine
I see results that contain "determine", "determining", "determined", etc. If I then modify the query like so: q

Python stemming (with pandas dataframe)

Question: I ran into the following problem while programming in Python: I have a Pandas dataframe containing words that have to be stemmed (using SnowballStemmer). I want to stem the words so that I can compare results for stemmed vs. non-stemmed text, and for this I will be using a classifier. I use the following code for the stemmer:
    from nltk.stem.snowball import SnowballStemmer
    stemmer = SnowballStemmer("dutch")
I want to stem every separate word in the list while preserving the order and keeping every key with its value. This is the column from the Pandas dataframe from which I want every separate word
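One way to approach this, sketched under assumptions because the excerpt cuts off before the data is shown: stem each whitespace-separated word of a text cell and apply that to the column with `Series.apply`, which keeps the index so each row stays paired with its key. The column names `text` and `stemmed`, the sample sentences, and the helpers `stem_text` and `demo` are all hypothetical.

```python
from typing import Callable


def stem_text(text: str, stem: Callable[[str], str]) -> str:
    """Stem every whitespace-separated word in `text`, keeping word order."""
    return " ".join(stem(word) for word in text.split())


def demo() -> None:
    # Imported here so stem_text stays usable without pandas or NLTK installed.
    import pandas as pd
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer("dutch")
    # Hypothetical data: the question's actual column is not shown in the excerpt.
    df = pd.DataFrame({"text": ["de katten lopen", "wij fietsen graag"]})
    # Series.apply preserves the index, so rows keep their original keys.
    df["stemmed"] = df["text"].apply(lambda t: stem_text(t, stemmer.stem))
    print(df)
```

Writing the result to a new column rather than overwriting the original keeps both versions available for the stemmed vs. non-stemmed comparison the question describes.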

I want a Java Arabic stemmer

Question: I'm looking for a Java stemmer for Arabic. I found a library called "AraMorph", but its output is uncontrollable and it adds formation to words, which is unwanted. Is there any other stemmer for Arabic?
Answer: Here is a new Arabic stemmer: Assem's Arabic light stemmer, written using the Snowball framework and generated for many languages, including Java. You can use it by downloading libstemmer for Java here.
Answer: You can find the Khoja stemmer here: http://zeus.cs.pacificu.edu/shereen/research.htm
Direct download: http://zeus.cs.pacificu.edu/shereen/ArabicStemmerCode.zip
https://sourceforge.net/projects/arabicstemmer/

Base word stemming instead of root word stemming in R

Question: Is there any way to get the base word instead of the root word when stemming using NLP in R? Code:
    > #Loading libraries
    > library(tm)
    > library(slam)
    >
    > #Vector
    > Vec=c("happyness happies happys","sky skies")
    >
    > #Creating Corpus
    > Txt=Corpus(VectorSource(Vec))
    >
    > #Stemming
    > Txt=tm_map(Txt, stemDocument)
    >
    > #Checking result
    > inspect(Txt)
    A corpus with 2 text documents
    The metadata consists of 2 tag-value pairs and a data frame
    Available tags are:
      create_date creator
    Available variables in the data frame are:
      MetaID
    [[1]]
    happi happi happi
    [[2]]
    sky sky
    >
Can I get base word "happy" (base word)

nltk stemmer: string index out of range

Question: I have a set of pickled text documents which I would like to stem using nltk's PorterStemmer. For reasons specific to my project, I would like to do the stemming inside a Django app view. However, when stemming the documents inside the Django view, I receive an IndexError: string index out of range exception from PorterStemmer().stem() for the string 'oed'. As a result, running the following:
    # xkcd_project/search/views.py
    from django.shortcuts import render
    from nltk.stem.porter import PorterStemmer
    def get_results(request):
        s = PorterStemmer()
        s.stem('oed')  # raises IndexError for this input
        return render(request, 'list.html')
raises the mentioned error:
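The excerpt ends before the traceback or any answer, so the cause is not established here. As a stopgap only, one defensive option is to catch the exception per token and fall back to the unstemmed word. This is a sketch under that assumption, not a fix for the underlying problem; `safe_stem` and `demo` are hypothetical names.

```python
from typing import Callable


def safe_stem(word: str, stem: Callable[[str], str]) -> str:
    """Return stem(word), or the word unchanged if the stemmer raises IndexError."""
    try:
        return stem(word)
    except IndexError:
        # Fall back to the unstemmed token instead of failing the whole request.
        return word


def demo() -> None:
    # Imported here so safe_stem stays usable without NLTK installed.
    from nltk.stem.porter import PorterStemmer

    stemmer = PorterStemmer()
    print([safe_stem(w, stemmer.stem) for w in ["oed", "running", "documents"]])
```

The wrapper hides the error rather than explaining it, so it is worth logging the affected words if this is used in the view.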