SQL Server Full-Text Search against Document (multiple related tables and field)

Submitted by 微笑、不失礼 on 2019-12-12 02:46:10

Question


I have a document (in this case Invoice) structure which contains multiple tables:

  • Invoice Header (No. (PK), Customer Name, Customer Address, ...)

  • Invoice Lines (Invoice No. (PK), Line No. (PK), Description, Qty., ...)

  • Invoice Header Comments (Invoice No. (PK), Comment No. (PK), Comment)

When I run a search, I would like to execute it against the whole document as one entity, not against separate fields (Customer Name + Customer Address + Description + Comment).

Example: All documents which have something to do with "Bicycle AND Berlin" or "Munich OR Berlin" or "'Fast delivery'"....

What approach would you recommend to solve this problem?

Should I create a separate index table that stores the concatenated values of all the fields I would like to index (Customer Name, Customer Address, Description, Comment), with one row per document?

Document Index (Document No. (PK), Index)

In that case, how should I keep the "Document Index" table up to date?

I tried to create indexed views that concatenate the values, but ran into a limitation: an indexed view can't contain subqueries or reference other views.

I would appreciate all ideas.


Answer 1:


SQL Server Full-Text Search would be the most appropriate method, given your requirements of boolean search across multiple columns and tables.

Roughly, you will need to:

  1. Create a full-text catalog
  2. Create a full-text index for each of the tables
  3. Populate (build) the indexes
  4. Finally, query the full-text indexes with the full-text predicates and functions

I would highly recommend starting with the Getting Started article. It will help you understand some of the jargon and structure, and how to manage and use full-text search within SQL Server.
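
The four steps above can be sketched in T-SQL roughly as follows. This is a minimal sketch for one table only. The catalog name `InvoiceCatalog`, the table and column names, and the unique index `UQ_InvoiceHeaderComments_Id` are assumptions for illustration and do not come from the question. Note that a full-text index needs a unique, single-column, non-nullable key index, so the composite primary keys described in the question could not serve as the full-text key directly; the sketch assumes a surrogate key column with its own unique index.

```sql
-- Hypothetical names throughout: InvoiceCatalog, dbo.InvoiceHeaderComments,
-- UQ_InvoiceHeaderComments_Id (a unique index on a single non-nullable column).

-- 1. Create a full-text catalog.
CREATE FULLTEXT CATALOG InvoiceCatalog;

-- 2. Create a full-text index on the table (repeat for each table to search).
CREATE FULLTEXT INDEX ON dbo.InvoiceHeaderComments (Comment)
    KEY INDEX UQ_InvoiceHeaderComments_Id
    ON InvoiceCatalog
    WITH CHANGE_TRACKING AUTO;

-- 3. With CHANGE_TRACKING AUTO the index is populated and kept current
--    automatically; a full population can also be started by hand.
ALTER FULLTEXT INDEX ON dbo.InvoiceHeaderComments START FULL POPULATION;

-- 4. Query the index with a full-text predicate.
SELECT InvoiceNo, CommentNo, Comment
FROM dbo.InvoiceHeaderComments
WHERE CONTAINS(Comment, N'"Fast delivery"');
```

Population runs in the background, so a query issued immediately after step 3 may not see every row yet.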




Answer 2:


If you need to rank (score) or sort your search results, you should create a new table which, through an ETL process, combines all of the full-text-searchable data (invoice header, lines, comments) for your entity into 1 column. This seems to be what you're suggesting with your "Document Index" table idea.

Why combine them into 1 table? This approach results in better ranking than if you were to apply full text indexes to each existing table. The former solution produces a single rank whereas the latter will produce a different rank for each table and there is no accurate way to resolve multiple ranks (which are based on completely different scales) into 1 rank. To illustrate the differences:

-- Querying 1 table (one rank per document)
-- KEY and RANK are reserved words, so they are bracketed.
SELECT [KEY], [RANK] FROM CONTAINSTABLE(DocumentIndex, *, @searchString)

-- Querying multiple tables (this results in multiple rank values which cannot be resolved into a single rank)
SELECT [KEY], [RANK] FROM CONTAINSTABLE(InvoiceHeader, *, @searchString)

SELECT [KEY], [RANK] FROM CONTAINSTABLE(InvoiceLines, *, @searchString)

SELECT [KEY], [RANK] FROM CONTAINSTABLE(InvoiceHeaderComments, *, @searchString)
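
To turn the single-table query into a ranked list of documents, the `CONTAINSTABLE` result can be joined back on its `KEY` column. The sketch below assumes a hypothetical `dbo.DocumentIndex` table with an integer key `DocumentNo` and a full-text indexed column `IndexText`; neither column name comes from the question (its suggested column name, `Index`, is a reserved word).

```sql
-- Hypothetical table: dbo.DocumentIndex (DocumentNo int PRIMARY KEY, IndexText nvarchar(max)),
-- with a full-text index on IndexText keyed on the primary key index.
DECLARE @searchString nvarchar(4000) = N'Bicycle AND Berlin';
-- Other search conditions from the question:
--   N'Munich OR Berlin'
--   N'"Fast delivery"'   (double quotes make it a phrase search)

SELECT di.DocumentNo, ft.[RANK]
FROM CONTAINSTABLE(dbo.DocumentIndex, IndexText, @searchString) AS ft
JOIN dbo.DocumentIndex AS di
    ON di.DocumentNo = ft.[KEY]   -- KEY holds the full-text key of the matching row
ORDER BY ft.[RANK] DESC;          -- highest-ranked documents first
```

Because every document is a single row here, `AND` conditions match even when the two terms originally lived in different tables, for example "Bicycle" in a line description and "Berlin" in the customer address.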

How can you combine them into 1 table? You will need some sort of ETL process which either runs on a schedule (which may be easier to implement but will result in lag time where your full text index is out of sync with the master tables) or gets run on demand whenever your master tables are modified (either via triggers or by hooking into an event in your data layer).
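
One possible shape for the on-demand variant is a stored procedure that rebuilds the row for a single invoice, called from triggers or from the data layer whenever that invoice changes. This is only a sketch under several assumptions: all table, column, and procedure names are hypothetical, the invoice number is assumed to be an `int`, and `STRING_AGG` and `CONCAT_WS` require SQL Server 2017 or later (older versions would need a `FOR XML PATH` concatenation instead).

```sql
-- Hypothetical schema: names and types are assumptions, not from the question.
CREATE TABLE dbo.DocumentIndex (
    DocumentNo int           NOT NULL CONSTRAINT PK_DocumentIndex PRIMARY KEY,
    IndexText  nvarchar(max) NOT NULL
);
GO

CREATE PROCEDURE dbo.RefreshDocumentIndex
    @InvoiceNo int
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @text nvarchar(max);  -- stays NULL if the invoice no longer exists

    -- Concatenate header fields, all line descriptions, and all comments.
    -- CONCAT_WS skips NULL arguments, so missing lines or comments are harmless.
    SELECT @text = CONCAT_WS(N' ',
        h.CustomerName,
        h.CustomerAddress,
        (SELECT STRING_AGG(CAST(l.Description AS nvarchar(max)), N' ')
         FROM dbo.InvoiceLines AS l
         WHERE l.InvoiceNo = h.InvoiceNo),
        (SELECT STRING_AGG(CAST(c.Comment AS nvarchar(max)), N' ')
         FROM dbo.InvoiceHeaderComments AS c
         WHERE c.InvoiceNo = h.InvoiceNo))
    FROM dbo.InvoiceHeader AS h
    WHERE h.InvoiceNo = @InvoiceNo;

    IF @text IS NULL
        -- Invoice was deleted: remove its index row as well.
        DELETE FROM dbo.DocumentIndex WHERE DocumentNo = @InvoiceNo;
    ELSE
    BEGIN
        -- Simple upsert: update the row if present, otherwise insert it.
        UPDATE dbo.DocumentIndex
        SET IndexText = @text
        WHERE DocumentNo = @InvoiceNo;

        IF @@ROWCOUNT = 0
            INSERT INTO dbo.DocumentIndex (DocumentNo, IndexText)
            VALUES (@InvoiceNo, @text);
    END
END;
```

A scheduled job could call the same procedure for every invoice modified since the last run. The upsert is not protected against concurrent callers for the same invoice, so a production version would likely need a transaction with suitable locking.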



Source: https://stackoverflow.com/questions/33466781/sql-server-full-text-search-against-document-multiple-related-tables-and-field
