Recursive Relationship on Dictionary Table

Question

I'm working on a crude, but good enough for us, full-text search using only PSQL in Firebird. I'll simplify as much as possible and focus on my problem:

In short, this is the dictionary table:

SELECT * FROM FTS_KEYWORDS

 ID | KEYWORD
----+-----------
  1 | 'FORD'
  1 | 'MUSTANG'
  1 | '2010'
  2 | 'FORD'
  2 | 'FUSION'
  2 | 'TURBO'
  2 | '2010'
  3 | 'FORD'
  3 | 'RANGER'
  3 | 'TURBO'
  3 | '2010'
  3 | 'BLACK'

There is also an FTS_TOKENIZE() procedure that splits a whole string into its words.

Case 1: The user searches with 1 keyword

SELECT TOKENS FROM FTS_TOKENIZE('FORD')

 TOKENS
-------------
  'FORD'

This would then be the SQL required to get the correct results:

:TOKEN_1 = 'FORD'

SELECT DISTINCT ID
FROM FTS_KEYWORDS
WHERE (KEYWORD STARTING :TOKEN_1)

 ID 
-----
  1
  2 
  3

Case 2: The user searches with 3 keywords

SELECT TOKENS FROM FTS_TOKENIZE('FORD 2010 BLACK')

 TOKENS
-------------
 'FORD'
 '2010'
 'BLACK'

So, this is the SQL needed to retrieve the correct values:

:TOKEN_1 = 'FORD'
:TOKEN_2 = '2010'
:TOKEN_3 = 'BLACK'

SELECT DISTINCT K1.ID
FROM FTS_KEYWORDS K1
WHERE (K1.KEYWORD STARTING :TOKEN_1)
  AND (K1.ID IN (SELECT DISTINCT K2.ID
                 FROM FTS_KEYWORDS K2
                 WHERE (K2.KEYWORD STARTING :TOKEN_2)
                   AND (K2.ID IN (SELECT DISTINCT K3.ID
                                  FROM FTS_KEYWORDS K3
                                  WHERE (K3.KEYWORD STARTING :TOKEN_3)))))

 ID 
-----
  3

ID 3 is the only ID that has all the keywords matching the search.

The SQL that retrieves the values is nested recursively, with one level per token in the user's search query.

Currently, in a procedure FTS_SEARCH(), I build an SQL string and run it with EXECUTE STATEMENT, but I do not think this is ideal.

I think this could be done with a recursive Common Table Expression ("WITH ... AS ... SELECT"), but I was not able to do it: the examples I found require a table with a Parent_ID column and do not accept input parameters, which does not fit my case.

My question is: Is there a way to do this search recursively using a CTE or some other SQL trick?

Answer 1:

Instead of using a recursive CTE, you could put your list of tokens into a table (CRITERIA), join that table with FTS_KEYWORDS on KEYWORD, group by ID and count the number of keywords per ID, and apply a HAVING clause to select only those ID values with a count equal to the number of rows in the CRITERIA table.
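
A minimal sketch of this approach, assuming a hypothetical CRITERIA table with a single TOKEN column that holds each search token exactly once (neither the table layout nor the column name comes from the question). It joins with STARTING WITH rather than plain equality so that prefix matching behaves as in the question's queries, and it counts distinct tokens because one token can match several keywords of the same ID:

```sql
-- Assumption: CRITERIA(TOKEN) is a hypothetical table (for example a
-- global temporary table) filled with one row per distinct search token.
SELECT K.ID
FROM FTS_KEYWORDS K
INNER JOIN CRITERIA C
    ON K.KEYWORD STARTING WITH C.TOKEN
GROUP BY K.ID
-- Keep only the IDs for which every token matched at least one keyword.
HAVING COUNT(DISTINCT C.TOKEN) = (SELECT COUNT(*) FROM CRITERIA)
```

With the sample data and the tokens 'FORD', '2010' and 'BLACK' in CRITERIA, only ID 3 should satisfy the HAVING condition.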

Answer 2:

Instead of resorting to a recursive CTE (I don't know whether a recursive CTE would actually solve your problem, or whether it would perform well), I propose the following solution:

WITH tokens AS (
    SELECT COUNT(*) OVER () tokencount, token 
    FROM fts_tokenize('FORD 2010 BLACK')
)
SELECT id
FROM (
    SELECT DISTINCT tokencount, token, id
    FROM tokens t
    INNER JOIN fts_keywords k
        ON k.KEYWORD STARTING WITH t.token
)
GROUP BY id
HAVING MAX(tokencount) = count(*)

This will track the number of tokens (not keywords!) matched and only output those ids where the number of matched tokens is equal to the number of expected tokens.

Tracking the number of tokens and not keywords is important given your need to use STARTING (STARTING WITH) as that could match multiple keywords to a single token which should be counted only once.
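
To see why the distinction matters, here is a small illustration. It assumes a hypothetical search string 'F 2010' and that fts_tokenize returns the two tokens 'F' and '2010' in a column named token, as in the query above. The token 'F' matches both 'FORD' and 'FUSION' for ID 2, so the raw join produces more rows than there are tokens:

```sql
-- Compare the number of joined rows with the number of distinct
-- matched tokens per id.
SELECT k.id,
       COUNT(*)                AS matched_rows,
       COUNT(DISTINCT t.token) AS matched_tokens
FROM fts_tokenize('F 2010') t
INNER JOIN fts_keywords k
    ON k.keyword STARTING WITH t.token
GROUP BY k.id
-- With the sample data from the question (under the assumptions above):
--   id 1: matched_rows = 2, matched_tokens = 2
--   id 2: matched_rows = 3, matched_tokens = 2  ('F' hits FORD and FUSION)
--   id 3: matched_rows = 2, matched_tokens = 2
```

Comparing matched_rows against the token count of 2 would wrongly reject ID 2, while comparing matched_tokens accepts all three IDs as expected.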

Be aware that this solution assumes fts_tokenize outputs each token only once; otherwise you'll need to change the tokens CTE to:

WITH tokens AS (
    SELECT COUNT(*) OVER () tokencount, token
    FROM (
        SELECT DISTINCT token
        FROM fts_tokenize('FORD 2010 BLACK')
    ) a
)

Answer 3:

I think this is a simple case of double negation (I'm rephrasing your question as: there should be no token that is not the beginning of a keyword), so there is no need for a CTE:

SELECT DISTINCT K.ID
FROM FTS_TOKENIZE ('FORD 2010 BLACK') FT
JOIN FTS_KEYWORDS K ON K.KEYWORD STARTING FT.TOKENS
WHERE NOT EXISTS(SELECT *
                 FROM FTS_TOKENIZE('FORD 2010 BLACK') FT2
                 WHERE NOT EXISTS(SELECT *
                                  FROM FTS_KEYWORDS K2
                                  WHERE K2.KEYWORD STARTING FT2.TOKENS
                                    AND K.ID = K2.ID))

HTH, Set

Answer 4:

You can do this by building a delimited list. As the delimiter I have used ASCII_CHAR(5):

SELECT 
  K.ID, COUNT(*) 
FROM FTS_KEYWORDS K
WHERE
  (SELECT ASCII_CHAR(5) || LIST(T.TOKEN, ASCII_CHAR(5)) || ASCII_CHAR(5) FROM FTS_TOKENIZE('FORD 2010 BLACK') T)
  LIKE '%' || ASCII_CHAR(5) || K.KEYWORD || ASCII_CHAR(5) || '%'
GROUP BY K.ID
HAVING COUNT(*)=(SELECT COUNT(*) FROM FTS_TOKENIZE('FORD 2010 BLACK') TX)

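This should be faster (fewer fetches), but you must test it in your environment. Note that it compares whole keywords, so it does not do the prefix matching that STARTING does.

You can also speed this up by removing FTS_TOKENIZE altogether and building the delimited string directly instead of passing 'FORD 2010 BLACK':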

SELECT 
  K.ID, COUNT(*) 
FROM FTS_KEYWORDS K
WHERE
  ASCII_CHAR(5) || 'FORD' || ASCII_CHAR(5) || '2010' || ASCII_CHAR(5) || 'BLACK' || ASCII_CHAR(5) 
  LIKE '%' || ASCII_CHAR(5) || K.KEYWORD || ASCII_CHAR(5) || '%'
GROUP BY K.ID
HAVING COUNT(*)=3

But I do not know your real case, especially how the string passed to FTS_TOKENIZE is built.

UPDATE 1: Not an answer to your question, but you can optimize your current query like this:

SELECT
    DISTINCT K1.ID
FROM
    FTS_KEYWORDS K1
    INNER JOIN FTS_KEYWORDS K2 ON K2.ID = K1.ID AND K2.KEYWORD STARTING 'FORD'
    INNER JOIN FTS_KEYWORDS K3 ON K3.ID = K2.ID AND K3.KEYWORD STARTING '2010'
WHERE
    K1.KEYWORD STARTING 'BLACK'

Source: https://stackoverflow.com/questions/56220744/recursive-relationship-on-dictionary-table

Tags

sql

firebird

firebird-3.0