phrase | 易学教程

How to search phrase queries in inverted index structure?

Question: If we want to run a phrase query such as "t1 t2 t3" (t1, t2 and t3 must appear consecutively, in that order) against an inverted index structure, which approach should we use? 1. Search for the term "t1" and find all documents that contain it, then do the same for "t2" and then "t3", and finally keep the documents in which the positions of "t1", "t2" and "t3" are next to each other. 2. Search for the term "t1" and find all documents that contain it, then search for "t2" only within the documents we found, and next, in the result
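A minimal sketch of the first approach, assuming a positional inverted index that stores, for every term, the word positions at which it occurs in each document. The function names `build_positional_index` and `phrase_search` and the whitespace tokenization are illustrative assumptions, not part of the question.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return the ids of documents that contain the terms of `phrase` consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()

    # Step 1: keep only the documents that contain every term.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])

    # Step 2: in each candidate, look for a start position of the first term
    # such that term k of the phrase occurs at position start + k.
    hits = set()
    for doc_id in candidates:
        later = [set(index[t][doc_id]) for t in terms[1:]]
        for start in index[terms[0]][doc_id]:
            if all(start + k + 1 in positions for k, positions in enumerate(later)):
                hits.add(doc_id)
                break
    return hits


docs = {1: "t1 t2 t3", 2: "t1 t3 t2", 3: "x t1 t2 t3 y", 4: "t1 t2"}
index = build_positional_index(docs)
print(sorted(phrase_search(index, "t1 t2 t3")))
```

Intersecting the document sets first keeps the more expensive position check limited to documents that can match at all.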

Creating more complex regexes from TAG format

Question: I can't figure out what's wrong with my regex here. (The original conversation, which includes an explanation of these TAG formats, can be found here: Translate from TAG format to Regex for Corpus.) I am starting with a string like this: Arms_NNS folded_VVN ,_, The NNS could also be NN, and the VVN could also be VBG. I just want to find that string and other strings with the same tags (NNS or NN, followed by VVN or VBG, followed by a comma). The following regex is what I am trying to use, but it is
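One possible pattern for this tag sequence, sketched in Python. It assumes every token has the form word_TAG and that tokens are separated by whitespace. The asker's own regex is cut off above, so this is not a correction of it, and the sample sentence beyond the first three tokens is invented for illustration.

```python
import re

# word_NN or word_NNS, then word_VVN or word_VBG, then the tagged comma ",_,".
# (?<!\S) and (?!\S) anchor the match to whole whitespace-separated tokens.
TAG_PATTERN = re.compile(r"(?<!\S)\S+_NNS?\s+\S+_(?:VVN|VBG)\s+,_,(?!\S)")

sample = "Arms_NNS folded_VVN ,_, he_PP waited_VVD ._. head_NN shaking_VBG ,_, she_PP left_VVD"
for match in TAG_PATTERN.finditer(sample):
    print(match.group(0))
```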

How to get phrase tags in Stanford CoreNLP?

Question: How do I get the phrase tag corresponding to each word? For example, for the sentence "My dog also likes eating sausage." I can get a parse tree from Stanford NLP such as (ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .))) In this situation, I want to get the phrase tag corresponding to each word, like (My - NP), (dog - NP), (also - ADVP), (likes - VP), ... Is there any method for this simple extraction of phrase tags? Please
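The extraction can be sketched independently of CoreNLP by walking the bracketed parse string directly. The helper `phrase_tags` below is hypothetical and not a CoreNLP API. It assumes the usual Penn Treebank layout in which each word sits under a part-of-speech node, and it reports the label of that node's parent as the word's phrase tag.

```python
import re


def phrase_tags(parse):
    """Return (word, phrase_tag) pairs from a bracketed parse string."""
    tokens = re.findall(r"\(|\)|[^\s()]+", parse)
    stack = []            # labels of the currently open constituents
    pairs = []
    expect_label = False  # True right after "(": the next token is a label
    for tok in tokens:
        if tok == "(":
            expect_label = True
        elif tok == ")":
            stack.pop()
        elif expect_label:
            stack.append(tok)
            expect_label = False
        else:
            # A word: stack[-1] is its POS tag, stack[-2] the enclosing phrase.
            parent = stack[-2] if len(stack) >= 2 else None
            pairs.append((tok, parent))
    return pairs


tree = ("(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
        "(VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))")
for word, tag in phrase_tags(tree):
    print(word, "-", tag)
```

With this rule each word gets its innermost enclosing phrase, so "eating" and "sausage" are reported as NP rather than VP.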

Solr: exact phrase query with a EdgeNGramFilterFactory

Question: In Solr (3.3), is it possible to make a field searchable letter by letter through an EdgeNGramFilterFactory and also sensitive to phrase queries? For example, I'm looking for a field that, if it contains "contrat informatique", will be found when the user types any of the following: contrat informatique; contr informa; "contrat informatique"; "contrat info". Currently, I have something like this: <fieldtype name="terms" class="solr.TextField"> <analyzer type="index"> <charFilter class="solr.MappingCharFilterFactory"
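To make the requirement concrete, here is a toy model in Python of what the four example queries need from the index: every prefix (edge n-gram) of a token is searchable, and each gram keeps the position of the token it came from so that quoted queries can still check adjacency. This is only an illustration of the idea. It is not Solr's implementation, and the names `edge_ngrams`, `index_field`, `matches`, `min_gram` and `max_gram` are assumptions made for the sketch.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_field(text):
    """Map each edge n-gram to the set of token positions it was produced at."""
    index = {}
    for pos, token in enumerate(text.lower().split()):
        for gram in edge_ngrams(token):
            index.setdefault(gram, set()).add(pos)
    return index


def matches(index, query, phrase=False):
    """All query terms must be indexed grams; a phrase also needs adjacent positions."""
    terms = query.lower().split()
    if not terms or any(t not in index for t in terms):
        return False
    if not phrase:
        return True
    return any(
        all(start + k in index[t] for k, t in enumerate(terms))
        for start in index[terms[0]]
    )


field = index_field("contrat informatique")
print(matches(field, "contr informa"))
print(matches(field, "contrat info", phrase=True))
print(matches(field, "informatique contrat", phrase=True))
```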

XML > jQuery reading

Question: How can I read this XML file with jQuery? With "normal tags" it's no problem, like: <car>Mustang</car> HTML/jQuery: $(document).ready(function(){ $.get("AMA.xml", function(XMLArray){ $(XMLArray).find("dataset").each(function(){ var $myAMA = $(this); var number = $myAMA.attr("article.plunumber"); var name = $myAMA.attr("article.name"); var price = $myAMA.attr("article.price").text(); $("#AMAContainer").append("<p>"+number+"<br>"+name+"<br>"+price+"</p>"); }); }); }); XML File: <document name=

Searching phrases in Lucene

Question: Could somebody point me to an example of how to search for phrases with Lucene.net? Let's say my index contains a document with a field "name" whose value is "Jon Skeet". Now I want to be able to find that document when searching for "jon skeet". Answer 1: You can use a proximity search to find terms within a certain distance of each other. The Lucene query syntax looks like this: "jon skeet"~3, meaning find "jon" and "skeet" within three words of each other. With this syntax, relative order doesn't matter;
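The idea of "within a certain distance of each other, in either order" can be sketched in a few lines of Python. This is a deliberately simplified model, assuming lowercased whitespace tokens and measuring distance as the difference between word positions. Lucene computes slop differently, and `within_distance` is a hypothetical helper, not a Lucene or Lucene.net API.

```python
def within_distance(text, first, second, max_distance):
    """True if `first` and `second` occur at most `max_distance` positions apart."""
    tokens = text.lower().split()
    first_pos = [i for i, tok in enumerate(tokens) if tok == first.lower()]
    second_pos = [i for i, tok in enumerate(tokens) if tok == second.lower()]
    return any(
        i != j and abs(i - j) <= max_distance
        for i in first_pos
        for j in second_pos
    )


print(within_distance("Jon Skeet", "jon", "skeet", 3))
print(within_distance("Skeet and Jon", "jon", "skeet", 3))
print(within_distance("jon a b c d skeet", "jon", "skeet", 3))
```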

Java: Matching Phrases in a String

Question: I have a list of phrases (a phrase might consist of one or more words) in a database and an input string. I need to find out which of those phrases appear in the input string. Is there an efficient way to perform such matching in Java? Answer 1: A quick hack would be: (1) build a regexp based on the combined phrases; (2) construct a set listing the phrases that haven't matched so far; (3) repeatedly run find until all phrases have been found or the end of the input is reached, removing matches from the set of remaining
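The three steps of this quick hack can be sketched as follows, in Python rather than Java. The function name `find_phrases`, the word-boundary anchors and the longest-first ordering of alternatives are choices made for this sketch. Because regex matches do not overlap, a phrase that only occurs overlapping an earlier match can be missed.

```python
import re


def find_phrases(phrases, text):
    """Return the subset of `phrases` that occurs in `text`."""
    remaining = set(phrases)  # phrases that have not matched so far
    found = set()
    if not remaining:
        return found

    # One combined pattern; longer phrases first so they win over their prefixes.
    alternatives = sorted(remaining, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in alternatives) + r")\b"
    )

    # Scan the input, moving each matched phrase from `remaining` to `found`.
    for match in pattern.finditer(text):
        phrase = match.group(0)
        if phrase in remaining:
            remaining.discard(phrase)
            found.add(phrase)
        if not remaining:  # every phrase has been seen, stop early
            break
    return found


text = "the quick brown fox jumps over the lazy dog"
print(sorted(find_phrases(["quick brown", "lazy dog", "cat"], text)))
```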

Elasticsearch - Fuzzy, phrase, completion suggestor and dashes

Question: I have been asking separate questions to get the search functionality I want but am still falling short, so I thought I would just ask what people suggest as the optimal Elasticsearch settings, mappings, indexing and query structure for what I am looking for. I need a search-as-you-type solution that queries categories. If I type in "mex", I am looking to get back results like "Mexican Restaurant", "Mexican Grocery Store", "Tex-Mex Restaurant" and "Medical

Counting phrases in Python using NLTK

Question: I am trying to get a phrase count from a text file, but so far I am only able to obtain a word count (see below). I need to extend this logic to count the number of times a two-word phrase appears in the text file. As I understand it, phrases can be defined/grouped using logic from NLTK. I believe the collections module is what I need to obtain the desired result, but I'm not sure how to go about implementing it from reading the NLTK documentation. Any tips/help would be greatly appreciated. import re import string frequency = {} document_text = open('Words.txt', 'r') text_string =
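One way to extend a word count to two-word phrases is to pair each word with its successor and count the pairs with `collections.Counter`. The sketch below uses only the standard library. The tokenization regex and the function name `count_bigrams` are assumptions, since the asker's code is cut off above. `nltk.bigrams` produces the same kind of adjacent pairs if NLTK is preferred.

```python
import re
from collections import Counter


def count_bigrams(text):
    """Count adjacent word pairs in `text`, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip(words, words[1:]) yields (w0, w1), (w1, w2), ...
    return Counter(zip(words, words[1:]))


counts = count_bigrams("The cat sat. The cat ran.")
for (first, second), n in counts.most_common(3):
    print(first, second, n)
```

To run it on the file from the question, read the text first, for example `count_bigrams(open('Words.txt', 'r').read())`. Pairs are counted across sentence boundaries in this sketch.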