stop-words

SQL Contains() not returning results for 'The'

删除回忆录丶 submitted on 2021-02-04 21:10:50
Question: I have the SQL script below for querying my ContactInfoes table: SELECT * FROM ContactInfoes WHERE CONTAINS(Name, 'The') It returns only an empty result set, even though my table has an entry with the Name 'The Company'. Why am I not getting any data, and how can this be resolved? Any help is appreciated. I am using SQL Server 2019. Answer 1: You created the FULLTEXT index without specifying a STOPLIST, so the default STOPLIST was used. By default, the word 'the' is a stop word that is removed from your

Full text search does not work if stop word is included even though stop word list is empty

孤街醉人 submitted on 2021-01-29 12:49:24
Question: I would like to be able to search for every word, so I cleared the stop word list and then rebuilt the index. Unfortunately, if I type in a search expression that contains a stop word, it still returns no rows. If I leave out just the stop word, I do get results. E.g. "double wear stay in place" returns no result, while "double wear stay place" returns the rows that actually contain "in" as well. Does anyone know why this happens? I am using SQL Server 2012 Express. Thanks a lot! Answer 1: Meanwhile I

Using default and custom stop words with Apache's Lucene (weird output)

对着背影说爱祢 submitted on 2020-12-26 10:21:37
Question: I'm removing stop words from a String using Apache Lucene (8.6.3) and the following Java 8 code: private static final String CONTENTS = "contents"; final String text = "This is a short test! Bla!"; final List<String> stopWords = Arrays.asList("short","test"); final CharArraySet stopSet = new CharArraySet(stopWords, true); try { Analyzer analyzer = new StandardAnalyzer(stopSet); TokenStream tokenStream = analyzer.tokenStream(CONTENTS, new StringReader(text)); CharTermAttribute term =

Using default and custom stop words with Apache's Lucene (weird output)

会有一股神秘感。 submitted on 2020-12-26 10:20:36
Question: I'm removing stop words from a String using Apache Lucene (8.6.3) and the following Java 8 code: private static final String CONTENTS = "contents"; final String text = "This is a short test! Bla!"; final List<String> stopWords = Arrays.asList("short","test"); final CharArraySet stopSet = new CharArraySet(stopWords, true); try { Analyzer analyzer = new StandardAnalyzer(stopSet); TokenStream tokenStream = analyzer.tokenStream(CONTENTS, new StringReader(text)); CharTermAttribute term =

Why are stop words not being excluded from the word cloud when using Python's wordcloud library?

妖精的绣舞 submitted on 2020-06-28 04:04:43
Question: I want to exclude 'The', 'They' and 'My' from being displayed in my word cloud. I'm using the Python library 'wordcloud' as shown below and updating the STOPWORDS list with these 3 additional stopwords, but the word cloud still includes them. What do I need to change so that these 3 words are excluded? The libraries I imported are: import numpy as np import pandas as pd from wordcloud import WordCloud, STOPWORDS import matplotlib.pyplot as plt I've tried adding elements to the STOPWORDS set at

Why are stop words not being excluded from the word cloud when using Python's wordcloud library?

倾然丶 夕夏残阳落幕 submitted on 2020-06-28 04:03:54
Question: I want to exclude 'The', 'They' and 'My' from being displayed in my word cloud. I'm using the Python library 'wordcloud' as shown below and updating the STOPWORDS list with these 3 additional stopwords, but the word cloud still includes them. What do I need to change so that these 3 words are excluded? The libraries I imported are: import numpy as np import pandas as pd from wordcloud import WordCloud, STOPWORDS import matplotlib.pyplot as plt I've tried adding elements to the STOPWORDS set at

What is an efficient way to check if the current word is close to a word in a string?

久未见 submitted on 2020-06-16 17:24:35
Question: Consider the examples below. Example 1: str1 = "wow...it looks amazing" str2 = "looks amazi" Here amazi is close to amazing; str2 is mistyped. I want to write a program that tells me amazi is close to amazing and then replaces amazi with amazing in str2. Example 2: str1 = "is looking good" str2 = "looks goo" In this case the updated str2 will be "looking good". Example 3: str1 = "you are really looking good" str2 = "lok goo" In this case str2 will be "good", as lok is not close
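One way to approach the matching described in this question is Python's standard-library difflib module. The sketch below is only an illustration, not the answer from the thread: the function name correct_words, the \w+ tokenisation, the 0.65 similarity cutoff, and the choice to drop words that have no close match are all assumptions, picked so that the three examples in the question come out as described.

```python
import difflib
import re


def correct_words(reference: str, typed: str, cutoff: float = 0.65) -> str:
    """Replace each word of `typed` with its closest word from `reference`.

    Hypothetical helper: words with no candidate at or above `cutoff`
    are dropped (an assumption taken from Example 3 in the question).
    """
    # Split the reference on non-word characters so "wow...it" yields two words.
    candidates = re.findall(r"\w+", reference)
    corrected = []
    for word in typed.split():
        # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
        matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
    return " ".join(corrected)


print(correct_words("wow...it looks amazing", "looks amazi"))
print(correct_words("is looking good", "looks goo"))
print(correct_words("you are really looking good", "lok goo"))
```

With this cutoff the three calls print "looks amazing", "looking good" and "good". The cutoff is sensitive: "lok" scores exactly 0.6 against "looking", so difflib's default cutoff of 0.6 would accept it, which is why a slightly higher value is assumed here.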

mysql Modify stopword list for fulltext search

旧巷老猫 submitted on 2020-01-24 10:16:25
Question: I've searched a lot, and it's said that I have to edit the my.cnf file to change the stopword list. I renamed my-medium.cnf to my.cnf and added the ft_query_expansion_limit and ft_stopword_file settings. I have restarted MySQL, but it is not taking effect. I don't have admin privileges. # The MySQL server [mysqld] port = 3306 socket = /tmp/mysql.sock skip-external-locking key_buffer_size = 16M max_allowed_packet = 1M table_open_cache = 64 sort_buffer_size = 512K net_buffer_length = 8K read_buffer

Filter out common words for search query

允我心安 submitted on 2020-01-13 19:21:11
Question: Is there an easy way to filter a user's input (possibly a question) by extracting the meaningful data in the query? I basically want to filter out any noise words so I can send a 'clean' query to Google's search API. Answer 1: Um, won't Google do this for you? Send all those dirty, filthy words to Google and let them clean them up for you. Answer 2: Jeff talked about "stop words" in one of the previous Stack Overflow podcasts. You might try searching for that phrase on Google. The
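A minimal sketch of the noise-word filtering the question asks about, in Python. The STOP_WORDS set is a tiny hand-picked list for illustration only, and clean_query is a hypothetical helper name; a real application would likely need a fuller list suited to its language and domain.

```python
import re

# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "what", "how", "do", "i"}


def clean_query(query: str, stop_words: set = STOP_WORDS) -> str:
    """Return the query with stop words removed, preserving word order."""
    # Treat runs of letters, digits and apostrophes as tokens; drop punctuation.
    tokens = re.findall(r"[A-Za-z0-9']+", query)
    # Compare case-insensitively so "The" and "the" are treated alike.
    kept = [t for t in tokens if t.lower() not in stop_words]
    return " ".join(kept)


print(clean_query("What is the capital of France?"))
```

For the sample input this prints "capital France". Lowercasing only for the comparison keeps the original casing of the surviving words.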

How to select stop words using tf-idf? (non english corpus)

戏子无情 submitted on 2020-01-11 20:01:10
Question: I have managed to evaluate the tf-idf function for a given corpus. How can I find the stopwords and the best words for each document? I understand that a low tf-idf for a given word and document means that it is not a good word for selecting that document. Answer 1: Stop words are those words that appear very commonly across the documents, therefore losing their representativeness. The best way to observe this is to measure the number of documents a term appears in and filter those that appear in
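The document-frequency idea in the answer can be sketched in Python as follows. The excerpt is cut off before it states a threshold, so the max_df_ratio value of 0.8, the whitespace tokenisation, and the function name find_stop_word_candidates are assumptions made for illustration. The sample corpus is English only for readability; the counting itself does not depend on the language, though the tokenisation would.

```python
from collections import Counter


def find_stop_word_candidates(documents, max_df_ratio=0.8):
    """Return terms that appear in more than `max_df_ratio` of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # Use a set so each term counts at most once per document.
        doc_freq.update(set(doc.lower().split()))
    limit = max_df_ratio * len(documents)
    return sorted(term for term, df in doc_freq.items() if df > limit)


corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
    "the fish swam in the pond",
    "rain fell on the roof",
]
print(find_stop_word_candidates(corpus))
```

Here only "the" occurs in more than 80% of the five documents, so it is the single candidate returned.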