lucene | 易学教程

Lucene QueryParse discards " when parsing

Question: I have a query -license:"CC-BY-NC" AND -license:"CC-BY-ND 4.0 (Int)" to be passed into PrecedenceQueryParser.parse like this: Query query = new PrecedenceQueryParser().parse(filter, '') But in the generated query the clauses look like -license:CC-BY-NC; the double quotes are lost. Are there any settings to keep the double quotes? ===================== UPDATE =========================== I understand that since I'm looking for a match of CC-BY-ND 4.0 (Int), without double quotes (double quotes are just used to

How to index a inherited field in Hibernate-search?

Question: I am working on a Java JPA Hibernate Search application. I know that Hibernate Search automatically indexes every field annotated with @Id in an entity. The problem is that I have a "master domain" class which contains the @Id annotation, and another class which inherits from "master domain"; Hibernate Search does not seem to recognize the inherited @Id field. This is my master domain class: @MappedSuperclass @Inheritance(strategy = InheritanceType.JOINED) public abstract class MasterDomain

Build Lucene Query for multi values in one field

Question: I have one field with multiple values, and I am trying to build a simple query which should look like this: field:(value1 value2 value3). I have a map of fields and values, fieldsMap: "field1" -> "[data1]", "field2" -> "[value1,value2,value3]". Code to build the Lucene query: fieldsMap .entrySet() .forEach(field -> { try { QueryParser queryParser = new ComplexPhraseQueryParser(field.getKey(), new StandardAnalyzer()); booleanQueryBuilder.add(queryParser.parse
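The target syntax field:(value1 value2 value3) can be produced with plain string handling before anything is handed to a query parser. The following is a minimal sketch under assumptions, not the asker's code: the class name MultiValueQueryString and the method clause are hypothetical, and the map values are assumed to be bracketed, comma-separated strings exactly as shown in the question. It does not escape query-syntax characters inside the values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: turns "[value1,value2,value3]" into "field:(value1 value2 value3)".
class MultiValueQueryString {

    static String clause(String field, String bracketedValues) {
        String inner = bracketedValues.trim();
        // Drop the surrounding brackets if both are present.
        if (inner.startsWith("[") && inner.endsWith("]")) {
            inner = inner.substring(1, inner.length() - 1);
        }
        // Prefix "field:(" and suffix ")" with space-separated values in between.
        StringJoiner values = new StringJoiner(" ", field + ":(", ")");
        for (String value : inner.split(",")) {
            String trimmed = value.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values.toString();
    }

    public static void main(String[] args) {
        // Same shape as the fieldsMap in the question.
        Map<String, String> fieldsMap = new LinkedHashMap<>();
        fieldsMap.put("field1", "[data1]");
        fieldsMap.put("field2", "[value1,value2,value3]");
        fieldsMap.forEach((field, values) -> System.out.println(clause(field, values)));
    }
}
```

Each resulting string could then be passed to whichever query parser is in use; whether the grouped values are combined with OR or AND depends on that parser's default operator.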

High-performance Nginx HTTPS tuning: speed up HTTPS by 30%

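Why optimize Nginx HTTPS latency: Nginx is one of the most common servers and is often used as a load balancer, reverse proxy, gateway, and so on. A properly configured single Nginx server should be able to handle roughly 50K to 80K requests per second while keeping CPU load under control. In many cases, however, load is not the first thing that needs optimizing. For 卡拉搜索, for example, we want users to get an instant-search experience on every keystroke, which means each search request must return to the user end to end within 100 ms to 200 ms, so that searching never feels like "lag" or "loading". For us, therefore, request latency is the most important thing to optimize. In this article we first describe which TLS settings in Nginx can affect request latency and how to tune them for maximum speedup. We then use the optimization of the 卡拉搜索 Nginx server as a worked example of how to tune Nginx TLS/SSL settings, speeding up first-time searches by about 30%. We discuss in detail what we optimized at each step, why, and with what effect, in the hope that this helps others who run into similar problems. TLS handshake and latency: developers often assume that unless performance is absolutely critical, there is no need to understand the lower layers and finer-grained optimizations. That is often reasonable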

Query fields in Kibana with RegEx

Question: I need to search Kibana logs for fields with specific content. The field is "message", which looks like this: 11.111.72.58 - - [26/Nov/2020:08:44:23 +0000] "GET /images/image.jpg HTTP/1.1" 200 123456 "https://website.com/questionnaire/uuid/result" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.14 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.14" "5.158.163.231" This field contains URIs, for example "https://website.com/questionnaire/uuid/result" here. How can I

Efficient low-cardinality ANDs in a search engine

Question: How do search engines such as Lucene perform AND queries where a term is common to many documents in the dataset? For example, in an inverted index of:

term    | document_id
---------------------
program | 1, 2, 3, 5...
python  | 1, 4
code    | 4
c++     | 4, 5

the term program is present in several documents, meaning that a query of program AND code would require performing an intersection upon a very large set of documents. Is there a way to perform AND queries without having to take the intersection
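One common way to keep such an AND cheap is to let the rarer term drive the intersection and skip forward through the long posting list instead of scanning it. The sketch below illustrates that idea in plain Java on sorted arrays of document IDs; it is an illustration under assumptions, not Lucene's implementation. The names PostingIntersection, advance, and intersect are hypothetical, and the posting lists use only the document IDs explicitly listed in the example index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: intersect two sorted posting lists, driven by the shorter one.
class PostingIntersection {

    // Returns the smallest index i >= from with sorted[i] >= target,
    // or sorted.length if no such element exists.
    static int advance(int[] sorted, int from, int target) {
        int lo = from;
        int hi = from;
        int step = 1;
        // Gallop: double the step until we pass the target or run off the end.
        while (hi < sorted.length && sorted[hi] < target) {
            lo = hi + 1;
            hi += step;
            step *= 2;
        }
        hi = Math.min(hi, sorted.length);
        // Binary search inside the bracketed window [lo, hi).
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static List<Integer> intersect(int[] a, int[] b) {
        // The shorter (rarer) list decides how many lookups are made.
        int[] rare = a.length <= b.length ? a : b;
        int[] common = rare == a ? b : a;
        List<Integer> hits = new ArrayList<>();
        int pos = 0;
        for (int docId : rare) {
            pos = advance(common, pos, docId);
            if (pos == common.length) {
                break; // the long list is exhausted, no further matches possible
            }
            if (common[pos] == docId) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] program = {1, 2, 3, 5}; // listed IDs only; the source list continues
        int[] code = {4};
        int[] cpp = {4, 5};
        System.out.println(intersect(program, code));
        System.out.println(intersect(program, cpp));
    }
}
```

With this approach the number of lookups is bounded by the length of the shorter list, and each lookup costs time logarithmic in the distance skipped, so a very common term such as program is never scanned in full when it is combined with a rare one.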

Optimize Lucene for compression ratio

Question: I have a use case for Lucene in which the search types required are very simple. I'll likely use DOCS_ONLY indexing with no stored fields or any complicated add-ons. The documents are unstructured English text. For this use case the most important thing to optimize is the compression ratio of the original documents to the on-disk size of the index. The Lucene index should be as small as possible, even at the expense of increased search and update latency. I'm wondering how I should configure

Get all stored fields from lucene index using java

Question: I want to show the words stored in a Lucene index so that the user can select a word and get the corresponding documents. I am new to Lucene. Any help is appreciated. Answer 1: The issue is that there is no magic getAllStoredFields() function in Lucene. Lucene stores fields in documents, which are in turn stored in an index, and every document in the index can have a different set of stored fields. You need to retrieve one specific document, like: Document doc = indexReader.document(docNum); and call doc

how does lucene process dots ('.') in StringField? (issue indexing and searching file names)

Question: I have a simple question that I was not able to answer by searching around or by reading other questions. I am indexing a field which contains a filename with the following code: doc.add(new TextField(FIELD_FILENAME, filename, Field.Store.YES)) If I index hello.jpg and then search with the key 'hello.jpg', the entry is hit (so far so good). However, if I search with 'hello' I get no hits. If I replace '.' with another punctuation character while indexing, then it works. If I escape the '.' it works as
