phrase

How to search phrase queries in inverted index structure?

我只是一个虾纸丫 posted on 2020-01-22 15:24:35
Question: If we want to run a phrase query such as "t1 t2 t3" (t1, t2 and t3 must appear consecutively, in that order) against an inverted index, which approach should we use? 1. Look up "t1" and find all documents that contain it, do the same for "t2" and then "t3", and then keep the documents in which the positions of "t1", "t2" and "t3" are next to each other. 2. Look up "t1" and find all documents that contain it, then search only those documents for "t2", and next, in the result
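The two approaches in the question can be combined with a positional inverted index. The following is a minimal sketch, assuming whitespace tokenization and an index that stores each term's positions per document; the names `build_positional_index` and `phrase_search` are hypothetical and not taken from any library.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents that contain the phrase's terms consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Step 1: narrow the candidates by intersecting the document lists of all terms.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    # Step 2: keep documents where term i sits at position start + i for some start.
    hits = []
    for doc_id in candidates:
        later = [set(index[t][doc_id]) for t in terms[1:]]
        if any(all(start + i + 1 in positions for i, positions in enumerate(later))
               for start in index[terms[0]][doc_id]):
            hits.append(doc_id)
    return sorted(hits)
```

Intersecting the document lists first keeps the more expensive position check limited to documents that contain every term.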

Creating more complex regexes from TAG format

无人久伴 posted on 2020-01-06 14:07:57
Question: So I can't figure out what's wrong with my regex here. (The original conversation, which includes an explanation of these TAG formats, can be found here: Translate from TAG format to Regex for Corpus). I am starting with a string like this: Arms_NNS folded_VVN ,_, The NNS could also be NN, and the VVN could also be VBG. I just want to find that string and others with the same tags (NNS or NN, followed by VVN or VBG, followed by a comma). The following regex is what I am trying to use, but it is
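The asker's own regex is cut off above, so the pattern below is not theirs. It is one possible sketch of the described sequence (NNS or NN, then VVN or VBG, then a comma), assuming tokens are written as `word_TAG` and separated by whitespace; `TAG_SEQUENCE` and `find_tag_sequences` are hypothetical names.

```python
import re

# (?<!\S) anchors the match at the start of a token.
# \S+_NNS?        a word tagged NN or NNS
# \S+_(?:VVN|VBG) a word tagged VVN or VBG
# ,_,             a comma tagged as a comma
TAG_SEQUENCE = re.compile(r"(?<!\S)\S+_NNS?\s+\S+_(?:VVN|VBG)\s+,_,")


def find_tag_sequences(text):
    """Return every substring of `text` that matches the tag sequence."""
    return TAG_SEQUENCE.findall(text)
```

Because each tag must be followed by whitespace, a tag such as NNP does not slip through as a partial match for NN.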

How to get phrase tags in Stanford CoreNLP?

假如想象 posted on 2019-12-31 01:50:10
Question: How do I get the phrase tag corresponding to each word? For example, for the sentence "My dog also likes eating sausage." I can get a parse tree from Stanford NLP such as (ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .))) In this situation, I want to get the phrase tag corresponding to each word, like (My - NP), (dog - NP), (also - ADVP), (likes - VP), ... Is there any method for this simple extraction of phrase tags? Please
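One way to approach this without relying on any particular CoreNLP call is to work on the printed bracketed tree. The sketch below assumes a well-formed Penn-style parse string like the one in the question and takes the phrase tag of a word to be the label directly above its part-of-speech node; `phrase_tags` is a hypothetical helper, not a Stanford API.

```python
import re


def phrase_tags(tree_string):
    """Return (word, phrase_label) pairs from a Penn-style bracketed parse."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree_string)
    stack = []   # labels of the currently open nodes, outermost first
    pairs = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(tokens[i + 1])  # the node label follows "("
            i += 2
        elif tok == ")":
            stack.pop()
            i += 1
        else:
            # A bare token is a word: stack[-1] is its POS tag, stack[-2] its phrase.
            parent = stack[-2] if len(stack) >= 2 else stack[-1]
            pairs.append((tok, parent))
            i += 1
    return pairs
```

With this definition the final period is paired with S, because its part-of-speech node hangs directly under the sentence node.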

Counting phrases in Python using NLTK

我的梦境 posted on 2019-12-22 01:28:00
Question: I am trying to get a phrase count from a text file, but so far I am only able to obtain a word count (see below). I need to extend this logic to count the number of times a two-word phrase appears in the text file. From my understanding, phrases can be defined or grouped using logic from NLTK. I believe the collections module is what I need to obtain the desired result, but I'm not sure how to go about implementing it from reading the NLTK documentation. Any tips/help would be greatly
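A two-word phrase count can be sketched with the standard library alone by pairing each word with its successor and feeding the pairs to `collections.Counter`. The tokenization rule below (lowercase runs of letters and apostrophes) and the function name `count_bigrams` are assumptions for illustration.

```python
import re
from collections import Counter


def count_bigrams(text):
    """Count adjacent word pairs in `text`, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip(words, words[1:]) yields (w0, w1), (w1, w2), ...
    return Counter(zip(words, words[1:]))
```

Pairs are formed across sentence boundaries here; splitting the text into sentences first would avoid that. If NLTK tokenization is preferred, `nltk.bigrams` yields the same kind of adjacent pairs from a token list.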

Solr: exact phrase query with an EdgeNGramFilterFactory

淺唱寂寞╮ posted on 2019-12-20 10:47:02
Question: In Solr (3.3), is it possible to make a field searchable letter by letter through an EdgeNGramFilterFactory and also sensitive to phrase queries? For example, I'm looking for a field that, if it contains "contrat informatique", will be found when the user types any of: contrat informatique; contr informa; "contrat informatique"; "contrat info". Currently, I have something like this: <fieldtype name="terms" class="solr.TextField"> <analyzer type="index"> <charFilter class="solr.MappingCharFilterFactory"

XML > jQuery reading

雨燕双飞 posted on 2019-12-18 07:22:57
Question: How can I read this XML file with jQuery? With "normal tags" it's no problem, like: <car>Mustang</car> HTML/jQuery: $(document).ready(function(){ $.get("AMA.xml", function(XMLArray){ $(XMLArray).find("dataset").each(function(){ var $myAMA = $(this); var number = $myAMA.attr("article.plunumber"); var name = $myAMA.attr("article.name"); var price = $myAMA.attr("article.price").text(); $("#AMAContainer").append("<p>"+number+"<br>"+name+"<br>"+price+"</p>"); }); }); }); XML File: <document name=

Searching phrases in Lucene

纵然是瞬间 posted on 2019-12-18 03:45:16
Question: Could somebody point me to an example of how to search for phrases with Lucene.net? Let's say my index contains a document with the field "name" and the value "Jon Skeet". Now I want to be able to find that document when searching for "jon skeet". Answer 1: You can use a proximity search to find terms within a certain distance of each other. The Lucene query syntax looks like this: "jon skeet"~3, meaning find "jon" and "skeet" within three words of each other. With this syntax, relative order doesn't matter;

Java: Matching Phrases in a String

时光总嘲笑我的痴心妄想 posted on 2019-12-12 18:26:18
Question: I have a list of phrases (a phrase might consist of one or more words) in a database and an input string. I need to find out which of those phrases appear in the input string. Is there an efficient way to perform such matching in Java? Answer 1: A quick hack would be: (1) build a regexp based on the combined phrases; (2) construct a set listing the phrases that haven't matched so far; (3) repeatedly run find until all phrases have been found or the end of input is reached, removing matches from the set of remaining
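The answer is cut off, so the following is only one reading of the quick hack it outlines, sketched in Python rather than Java for brevity. It assumes case-sensitive matching on word boundaries and rebuilds the combined regex from the phrases still unmatched after each hit; `find_phrases` is a hypothetical name.

```python
import re


def find_phrases(phrases, text):
    """Return the subset of `phrases` that occurs in `text` on word boundaries."""
    remaining = set(phrases)   # phrases that have not matched so far
    found = set()
    while remaining:
        # Combine the remaining phrases into one alternation of escaped literals.
        alternation = "|".join(re.escape(p) for p in sorted(remaining))
        match = re.search(r"(?<!\w)(?:" + alternation + r")(?!\w)", text)
        if match is None:
            break              # nothing left in the text matches
        found.add(match.group(0))
        remaining.discard(match.group(0))
    return found
```

Rebuilding the pattern after each hit lets overlapping phrases such as "New York" and "York City" both be reported, at the cost of rescanning the text once per matched phrase. In Java, `java.util.regex.Pattern` and `Matcher` could play the same role, with `Pattern.quote` escaping each phrase.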

Elasticsearch - Fuzzy, phrase, completion suggestor and dashes

戏子无情 posted on 2019-12-10 20:37:25
Question: I have been asking separate questions while trying to build the search functionality I want, but I am still falling short, so I thought I would just ask what people suggest as the optimal Elasticsearch settings, mappings, indexing and query structure for what I am looking for. I need a search-as-you-type solution that queries categories. If I type in "mex", I am looking to get back results like "Mexican Restaurant", "Mexican Grocery Store", "Tex-Mex Restaurant" and "Medical

Counting phrases in Python using NLTK

我只是一个虾纸丫 posted on 2019-12-04 21:40:26
I am trying to get a phrase count from a text file, but so far I am only able to obtain a word count (see below). I need to extend this logic to count the number of times a two-word phrase appears in the text file. From my understanding, phrases can be defined or grouped using logic from NLTK. I believe the collections module is what I need to obtain the desired result, but I'm not sure how to go about implementing it from reading the NLTK documentation. Any tips/help would be greatly appreciated. import re import string frequency = {} document_text = open('Words.txt', 'r') text_string =