Question
I'm working on a crude, but good enough for us, full-text search using only PSQL in Firebird. I'll simplify as much as possible and focus on my problem.
In short, this is the dictionary table:
SELECT * FROM FTS_KEYWORDS
ID | KEYWORD
----+-----------
1 | 'FORD'
1 | 'MUSTANG'
1 | '2010'
2 | 'FORD'
2 | 'FUSION'
2 | 'TURBO'
2 | '2010'
3 | 'FORD'
3 | 'RANGER'
3 | 'TURBO'
3 | '2010'
3 | 'BLACK'
There is also an FTS_TOKENIZE() procedure that extracts the words from a whole string.
Case 1: The user searches with 1 keyword
SELECT TOKENS FROM FTS_TOKENIZE('FORD')
TOKENS
-------------
'FORD'
This would then be the SQL required to get the correct results:
:TOKEN_1 = 'FORD'
SELECT DISTINCT ID
FROM FTS_KEYWORDS
WHERE (KEYWORD STARTING :TOKEN_1)
ID
-----
1
2
3
Case 2: The user searches with 3 keywords
SELECT TOKENS FROM FTS_TOKENIZE('FORD 2010 BLACK')
TOKENS
-------------
'FORD'
'2010'
'BLACK'
So, the SQL to retrieve the correct values:
:TOKEN_1 = 'FORD'
:TOKEN_2 = '2010'
:TOKEN_3 = 'BLACK'
SELECT DISTINCT K1.ID
FROM FTS_KEYWORDS K1
WHERE (K1.KEYWORD STARTING :TOKEN_1)
AND (K1.ID IN (SELECT DISTINCT K2.ID
FROM FTS_KEYWORDS K2
WHERE (K2.KEYWORD STARTING :TOKEN_2)))
AND (K1.ID IN (SELECT DISTINCT K3.ID
FROM FTS_KEYWORDS K3
WHERE (K3.KEYWORD STARTING :TOKEN_3)))
ID
-----
3
ID 3 is the only ID that has all the keywords matching the search.
The SQL needed to retrieve the values nests one subquery per token in the user's search.
Currently, in a procedure FTS_SEARCH(), I build an SQL string and run it with EXECUTE STATEMENT, but I do not think this is ideal.
I think this could be done with recursive Common Table Expressions ("WITH ... AS ... SELECT"), but I was not able to make it work: the examples I have found require a table with a Parent_ID column and do not accept input parameters, which does not fit my case.
My question is: is there a way to do this search recursively using a CTE or some other SQL trick?
Answer 1:
Instead of using a recursive CTE, you could put your list of tokens into a table (CRITERIA), join that table with FTS_KEYWORDS on KEYWORD, group by ID and count the number of keywords per ID, and apply a HAVING clause to select only those ID values with a count equal to the number of rows in the CRITERIA table.
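A minimal sketch of this approach, assuming a hypothetical CRITERIA table with a single TOKEN column (neither is part of the original schema). Because the question matches with STARTING rather than equality, the sketch joins on STARTING WITH and counts distinct tokens instead of rows, so a token that matches several keywords of the same ID is counted only once:

```sql
-- Hypothetical helper table: one row per search token.
-- The name, column and size are assumptions made for illustration.
-- Commit the DDL before the inserts if your tool does not do so automatically.
CREATE TABLE CRITERIA (
  TOKEN VARCHAR(50) NOT NULL PRIMARY KEY
);

-- Firebird has no multi-row VALUES list, so insert one row per token.
INSERT INTO CRITERIA (TOKEN) VALUES ('FORD');
INSERT INTO CRITERIA (TOKEN) VALUES ('2010');
INSERT INTO CRITERIA (TOKEN) VALUES ('BLACK');

-- Keep only the IDs for which every token matched at least one keyword.
SELECT K.ID
FROM FTS_KEYWORDS K
JOIN CRITERIA C ON K.KEYWORD STARTING WITH C.TOKEN
GROUP BY K.ID
HAVING COUNT(DISTINCT C.TOKEN) = (SELECT COUNT(*) FROM CRITERIA);
```

With the sample data from the question this should return only ID 3. For per-search use, a global temporary table may be a better fit than a permanent one, since each connection then sees only its own tokens.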
Answer 2:
Instead of resorting to a recursive CTE (I don't know whether a recursive CTE would actually solve your problem, or whether it would perform well), I propose the following solution:
WITH tokens AS (
SELECT COUNT(*) OVER () tokencount, tokens AS token
FROM fts_tokenize('FORD 2010 BLACK')
)
SELECT id
FROM (
SELECT DISTINCT tokencount, token, id
FROM tokens t
INNER JOIN fts_keywords k
ON k.KEYWORD STARTING WITH t.token
)
GROUP BY id
HAVING MAX(tokencount) = count(*)
This will track the number of tokens (not keywords!) matched and only output those ids where the number of matched tokens is equal to the number of expected tokens.
Tracking the number of tokens and not keywords is important given your need to use STARTING (STARTING WITH), as that could match multiple keywords to a single token, which should be counted only once.
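To illustrate why the distinction matters, here is a small diagnostic query against the sample data from the question. It is only a sketch: it assumes FTS_TOKENIZE('F BLACK') yields the two tokens 'F' and 'BLACK' in a column named TOKENS, as in the question's examples.

```sql
-- Compare "matched keyword rows" with "matched distinct tokens" per ID.
SELECT k.id,
       COUNT(*)                 AS matched_rows,
       COUNT(DISTINCT t.tokens) AS matched_tokens
FROM fts_tokenize('F BLACK') t
JOIN fts_keywords k ON k.keyword STARTING WITH t.tokens
GROUP BY k.id;

-- Expected with the sample data:
--   ID 1: matched_rows = 1, matched_tokens = 1   (FORD)
--   ID 2: matched_rows = 2, matched_tokens = 1   (FORD and FUSION both match 'F')
--   ID 3: matched_rows = 2, matched_tokens = 2   (FORD matches 'F', BLACK matches 'BLACK')
```

Comparing matched_rows with the token count (2) would wrongly accept ID 2, which has no keyword starting with 'BLACK'. Comparing matched_tokens accepts only ID 3.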
Be aware that this solution assumes fts_tokenize outputs each token only once; otherwise you'll need to modify the tokens CTE to:
WITH tokens AS (
  SELECT COUNT(*) OVER () tokencount, token
  FROM (
    SELECT DISTINCT tokens AS token
    FROM fts_tokenize('FORD 2010 BLACK')
  ) a
)
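If the goal is to replace the EXECUTE STATEMENT in FTS_SEARCH(), the same counting idea could be wrapped in a selectable procedure that takes the search string as an input parameter. The following is only a sketch: the procedure name FTS_SEARCH_TOKENS, the parameter name and the column types are assumptions, and it uses COUNT(DISTINCT ...) instead of the window function, which should also cover the duplicate-token case and versions of Firebird that predate window functions (added in 3.0).

```sql
-- SET TERM is a script/isql command that lets the procedure body contain ';'.
SET TERM ^ ;

-- Hypothetical procedure name and types; adjust to the real schema.
CREATE OR ALTER PROCEDURE FTS_SEARCH_TOKENS (QUERY VARCHAR(255))
RETURNS (ID INTEGER)
AS
BEGIN
  FOR
    SELECT K.ID
    FROM FTS_TOKENIZE(:QUERY) T
    JOIN FTS_KEYWORDS K ON K.KEYWORD STARTING WITH T.TOKENS
    GROUP BY K.ID
    -- An ID qualifies when every distinct token matched at least one keyword.
    HAVING COUNT(DISTINCT T.TOKENS) =
           (SELECT COUNT(DISTINCT T2.TOKENS) FROM FTS_TOKENIZE(:QUERY) T2)
    INTO :ID
  DO
    SUSPEND;
END^

SET TERM ; ^

-- Usage: the number of tokens no longer affects the shape of the SQL.
SELECT ID FROM FTS_SEARCH_TOKENS('FORD 2010 BLACK');
```

The trade-off is that FTS_TOKENIZE is called twice per search; whether that matters depends on how expensive the tokenizer is.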
Answer 3:
I think this is a simple case of double negation (I'm rephrasing your question as: there should be no token that is not the beginning of a keyword), so there is no need for a CTE:
SELECT DISTINCT K.ID
FROM FTS_TOKENIZE('FORD 2010 BLACK') FT
JOIN FTS_KEYWORDS K ON K.KEYWORD STARTING FT.TOKENS
WHERE NOT EXISTS(
  SELECT *
  FROM FTS_TOKENIZE('FORD 2010 BLACK') FT2
  WHERE NOT EXISTS(
    SELECT *
    FROM FTS_KEYWORDS K2
    WHERE K2.KEYWORD STARTING FT2.TOKENS
      AND K.ID = K2.ID))
HTH, Set
Answer 4:
You can do this by building a delimited list of the tokens.
As the delimiter I have used ASCII_CHAR(5):
SELECT
K.ID, COUNT(*)
FROM FTS_KEYWORDS K
WHERE
(SELECT ASCII_CHAR(5) || LIST(T.TOKENS, ASCII_CHAR(5)) || ASCII_CHAR(5) FROM FTS_TOKENIZE('FORD 2010 BLACK') T)
LIKE '%' || ASCII_CHAR(5) || K.KEYWORD || ASCII_CHAR(5) || '%'
GROUP BY K.ID
HAVING COUNT(*)=(SELECT COUNT(*) FROM FTS_TOKENIZE('FORD 2010 BLACK') TX)
This should be faster (fewer fetches), but you must test it in your environment. Note that it matches whole keywords only, not prefixes as STARTING does.
You can also speed this up by removing FTS_TOKENIZE altogether; instead of 'FORD 2010 BLACK' you simply write:
SELECT
K.ID, COUNT(*)
FROM FTS_KEYWORDS K
WHERE
ASCII_CHAR(5) || 'FORD' || ASCII_CHAR(5) || '2010' || ASCII_CHAR(5) || 'BLACK' || ASCII_CHAR(5)
LIKE '%' || ASCII_CHAR(5) || K.KEYWORD || ASCII_CHAR(5) || '%'
GROUP BY K.ID
HAVING COUNT(*)=3
But I do not know your real case, especially how the string passed to FTS_TOKENIZE is built.
UPDATE 1: This is not an answer to your question, but you can optimize your current query like this:
SELECT
DISTINCT K1.ID
FROM
FTS_KEYWORDS K1
INNER JOIN FTS_KEYWORDS K2 ON K2.ID = K1.ID AND K2.KEYWORD STARTING 'FORD'
INNER JOIN FTS_KEYWORDS K3 ON K3.ID = K2.ID AND K3.KEYWORD STARTING '2010'
WHERE
K1.KEYWORD STARTING 'BLACK'
Source: https://stackoverflow.com/questions/56220744/recursive-relationship-on-dictionary-table