fuzzy-search

Find actual matching word when using fuzzy query in elastic search

爷,独闯天下 posted on 2019-12-05 10:00:55
I am new to Elasticsearch and have been looking into fuzzy query search. I created a new index, products, with object/record values like this: { "_index": "products", "_type": "product", "_id": "10", "_score": 1, "_source": { "value": [ "Ipad", "Apple", "Air", "32 GB" ] } } Now when I perform a fuzzy query search in Elasticsearch like { query: { fuzzy: { value: "tpad" } } } it returns the correct record (the product created above), as expected. I know that the term tpad matches ipad, so the record was returned. But technically, how would I know that it matched ipad? Elastic search
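One approach often suggested for this kind of question is to request highlighting on the field, since highlight fragments wrap the indexed term that actually matched. The Python sketch below only builds such a request body and parses a fragment. The helper names and the sample fragment are hypothetical, and it assumes the default `<em>` highlight tags.

```python
import re


def build_fuzzy_query(field, term):
    """Build a fuzzy query body that also requests highlighting on the same field."""
    return {
        "query": {"fuzzy": {field: term}},
        # Highlighting asks Elasticsearch to mark the matched terms in this field.
        "highlight": {"fields": {field: {}}},
    }


def matched_terms(fragments):
    """Extract the terms wrapped in the default <em>...</em> highlight tags."""
    return [
        term
        for fragment in fragments
        for term in re.findall(r"<em>(.*?)</em>", fragment)
    ]


body = build_fuzzy_query("value", "tpad")

# Hypothetical fragment, shaped like what a hit's "highlight" section could contain.
sample_fragments = ["<em>Ipad</em>"]
print(matched_terms(sample_fragments))
```

The request body would be sent with whichever client is in use; only the `highlight` section is added relative to the query in the question.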

How to do fuzzy string search without a heavy database?

浪尽此生 posted on 2019-12-05 07:56:54
I have a mapping of catalog numbers to product names (35: cozy comforter; 35: warm blanket; 67: pillow) and need a search that would find misspelled, mixed names like "warm cmfrter". We have code using edit distance (difflib), but it probably won't scale to the 18000 names. I achieved something similar with Lucene, but as PyLucene only wraps Java, that would complicate deployment to end users. SQLite doesn't usually have full-text search or scoring compiled in. The Xapian bindings are like C++ and have some learning curve. Whoosh is not yet well documented but includes an abusable spell-checker. What else
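As a rough illustration of the problem, here is a minimal standard-library sketch that corrects each query word against the vocabulary of distinct catalog words rather than comparing against every full name. It still relies on `difflib`, the function name and the `cutoff` value are assumptions, and it is not a benchmarked answer to the scaling concern.

```python
import difflib

# Catalog data taken from the question.
CATALOG = [
    (35, "cozy comforter"),
    (35, "warm blanket"),
    (67, "pillow"),
]


def search(query, catalog=CATALOG, cutoff=0.6):
    """Rank catalog entries by how many corrected query words they contain."""
    # Distinct words across all names; sorted for deterministic behaviour.
    vocabulary = sorted({word for _, name in catalog for word in name.split()})

    # Map each (possibly misspelled) query word to its closest vocabulary word.
    corrected = set()
    for word in query.lower().split():
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if close:
            corrected.add(close[0])

    # Score each entry by the number of corrected words it shares.
    scored = []
    for number, name in catalog:
        hits = len(corrected & set(name.split()))
        if hits:
            scored.append((hits, number, name))
    scored.sort(key=lambda item: -item[0])  # stable sort keeps catalog order on ties
    return [(number, name) for _, number, name in scored]


print(search("warm cmfrter"))
```

With the sample query, "cmfrter" is mapped to "comforter" and "warm" to itself, so both entries for catalog number 35 are returned.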

Fuzzy Search on a Concatenated Full Name using NHibernate

两盒软妹~` posted on 2019-12-05 07:01:26
I am trying to convert the following SQL into NHibernate: SELECT * FROM dbo.Customer WHERE FirstName + ' ' + LastName LIKE '%' + 'bob smith' + '%' I was trying to do something like this but it is not working: name = "%" + name + "%"; var customers = _session.QueryOver<Customer>() .Where(NHibernate.Criterion.Restrictions.On<Customer>(c => c.FirstName + ' ' + c.LastName).IsLike(name)) .List(); What I'm basically trying to do is be able to search for a customer's name in a text box with the example value of "bob smith" and for it to search the database using the LIKE expression in the SQL above.

Find a series of data using non-exact measurements (fuzzy logic)

不羁的心 posted on 2019-12-05 05:47:50
This is a more complex follow-up question to: Efficient way to look up sequential values. Each Product can have many Segment rows (thousands). Each segment has a position column that starts at 1 for each product (1, 2, 3, 4, 5, etc.) and a value column that can contain any values such as (323.113, 5423.231, 873.42, 422.64, 763.1, etc.). The data is read-only. It may help to think of the product as a song and the segments as a set of musical notes in the song. Given a subset of contiguous segments, like a snippet of a song, I would like to identify potential matches for products. However, due to

How to get Lucene Fuzzy Search result's matching terms?

跟風遠走 posted on 2019-12-05 02:09:13
Question: How do you get the matching fuzzy term and its offset when using Lucene fuzzy search? IndexSearcher mem = ....(some standard code) QueryParser parser = new QueryParser(Version.LUCENE_30, CONTENT_FIELD, analyzer); TopDocs topDocs = mem.search(parser.parse("wuzzy~"), 1); // the ~ triggers the fuzzy search as per "Lucene In Action" The fuzzy search works fine. If a document contains the term "fuzzy" or "luzzy", it is matched. How do I find out which term matched and at what offsets? I have

Elastic search fuzzy match with exact matches showing first

不羁的心 posted on 2019-12-05 01:49:46
Question: I want to use fuzzy matching on a query, but with exact matches showing at the top of the results. I've tried the following. $return = $this->_client->search( array( 'index' => self::INDEX, 'type' => self::TYPE, 'body' => array( 'query' => array( 'bool' => array( 'must' => array( 'multi_match' => array( 'query' => $query, 'fields' => array('name', 'brand', 'description'), 'boost' => 10, ), 'fuzzy_like_this' => array( 'like_text' => $query, 'fields' => array('name', 'brand', 'description'
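One way to express "fuzzy, but exact first" is a `bool` query with two `should` clauses: a boosted plain `multi_match` and a second `multi_match` with `fuzziness` enabled, so documents matching exactly score on both clauses. The sketch below only builds the request body as a Python dict. The function name is hypothetical, the boost of 10 is taken from the question, and `"AUTO"` fuzziness is an assumption.

```python
def build_search_body(text, fields, exact_boost=10):
    """Build a bool query in which exact matches outrank fuzzy-only matches."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact (non-fuzzy) clause, boosted so exact hits rank higher.
                    {
                        "multi_match": {
                            "query": text,
                            "fields": fields,
                            "boost": exact_boost,
                        }
                    },
                    # Fuzzy clause, which also admits near-miss spellings.
                    {
                        "multi_match": {
                            "query": text,
                            "fields": fields,
                            "fuzziness": "AUTO",
                        }
                    },
                ],
                # Require at least one of the two clauses to match.
                "minimum_should_match": 1,
            }
        }
    }


body = build_search_body("ipad", ["name", "brand", "description"])
```

Note that this replaces the `fuzzy_like_this` clause from the question with a `multi_match` plus `fuzziness`, which is a swapped-in technique rather than the asker's original query.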

Fuzzy text search in python

江枫思渺然 posted on 2019-12-04 23:09:33
Question: I am wondering whether there is any Python library that can perform fuzzy text search. For example: I have three keywords, "letter", "stamp", and "mail". I would like to have a function that checks whether those three words occur within the same paragraph (or within a certain distance, such as one page). In addition, those words have to keep the same order. It is fine for other words to appear between those three words. I have tried fuzzywuzzy, which did not solve my problem. Another library, Whoosh, looks powerful, but I did
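The ordering requirement described here can be sketched with the standard `re` module alone. This is a minimal example assuming paragraphs are separated by blank lines and that keywords must match as whole words; the function name is hypothetical, and the matching is exact per keyword rather than fuzzy.

```python
import re


def paragraphs_with_keywords_in_order(text, keywords):
    """Return the paragraphs in which all keywords occur in the given order."""
    # \b...\b matches whole words; .*? allows any text between consecutive keywords.
    pattern = re.compile(
        r".*?".join(r"\b" + re.escape(keyword) + r"\b" for keyword in keywords),
        re.IGNORECASE | re.DOTALL,
    )
    # Treat one or more blank lines as the paragraph separator.
    paragraphs = re.split(r"\n\s*\n", text)
    return [paragraph for paragraph in paragraphs if pattern.search(paragraph)]


sample = (
    "I wrote a letter and put a stamp on it before the mail left.\n\n"
    "The mail came with a stamp but no letter."
)
print(paragraphs_with_keywords_in_order(sample, ["letter", "stamp", "mail"]))
```

Only the first paragraph of the sample is returned, because the second contains the three words in the reverse order.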

Fuzzy sentence search algorithms

柔情痞子 posted on 2019-12-04 12:44:14
Suppose I have a set of about 10,000 phrases, each 7-20 words long on average, in which I want to find a given phrase. The phrase I am looking for could contain errors: it could be missing one or two words, have some words misplaced, or include some random extra words. For example, my database contains "As I was riding my red bike, I saw Christine", and I want it to match "As I was riding my blue bike, saw Christine" or "I was riding my bike, I saw Christine and Marion". What would be a good approach to this problem? I know about Levenshtein distance, and I also suppose that this problem may have
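One possible starting point is to compare phrases word by word instead of character by character, so that a missing or swapped word costs one unit. The sketch below uses `difflib.SequenceMatcher` on token lists. The function names are hypothetical, the second stored phrase is invented purely as a distractor, and a linear scan like this is an assumption that may or may not be fast enough for 10,000 phrases.

```python
import difflib
import re


def tokenize(phrase):
    """Lowercase a phrase and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", phrase.lower())


def best_match(query, phrases):
    """Return (similarity, phrase) for the stored phrase closest to the query."""
    query_tokens = tokenize(query)
    scored = [
        # ratio() is 2 * matching_tokens / total_tokens, between 0.0 and 1.0.
        (difflib.SequenceMatcher(None, query_tokens, tokenize(phrase)).ratio(), phrase)
        for phrase in phrases
    ]
    return max(scored)


stored = [
    "As I was riding my red bike, I saw Christine",
    "The weather was fine on Monday",  # invented distractor phrase
]
score, phrase = best_match("As I was riding my blue bike, saw Christine", stored)
print(phrase)
```

Because comparison happens on tokens, the changed word ("blue" for "red") and the dropped "I" each affect only a single token of the score.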

Lucene Fuzzy Search for customer names and partial address

时光毁灭记忆、已成空白 posted on 2019-12-04 06:09:47
I went through all the existing questions and posts but couldn't find anything very relevant. I have a file with millions of records containing a person's first name, last name, address1, address2, country code, and date of birth. I would like to check my list of customers against this file on a daily basis (both my customer list and the file are updated daily). For first name and last name I would like a fuzzy match (maybe Lucene FuzzyQuery/Levenshtein distance, 90% match), and for the remaining fields, country and date of birth, I want an exact match. I am new to Lucene, but by looking at number of

Fuzzy regex (e.g. {e<=2}) correct usage in Python

本小妞迷上赌 posted on 2019-12-04 03:43:29
Question: I am trying to find strings that are at most two mistakes 'away' from the original pattern string (i.e. they differ by at most two letters). However, the following code isn't working as I would expect, at least not from my understanding of fuzzy regex: import regex res = regex.findall("(ATAGGAGAAGATGATGTATA){e<=2}", "ATAGAGCAAGATGATGTATA", overlapped=True) print res >> ['ATAGAGCAAGATGATGTATA'] # the second string As you can see, the two strings differ in three letters rather than at most two
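The result looks less surprising once substitutions are separated from insertions and deletions: in the `regex` module, `e` counts errors of any kind, while `s` restricts the count to substitutions. The standard-library sketch below (helper names are my own) compares the two strings from the question position by position and by general edit distance.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ (substitutions only)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(
                min(
                    previous[j] + 1,             # delete x from a
                    current[j - 1] + 1,          # insert y into a
                    previous[j - 1] + (x != y),  # substitute x with y, or keep it
                )
            )
        previous = current
    return previous[-1]


pattern = "ATAGGAGAAGATGATGTATA"
text = "ATAGAGCAAGATGATGTATA"
print(hamming(pattern, text), levenshtein(pattern, text))
```

The strings differ at three positions, but deleting one `G` and inserting one `C` turns the pattern into the text, so the edit distance is 2 and `{e<=2}` is satisfied. If only letter substitutions should count, a constraint on `s` instead of `e` is likely closer to the intent.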