I am trying to figure out exactly how weighted terms work in an ISABOUT query in SQL SERVER.
Here is where I currently am:
Each query returns the following rows:
QUERY 1 (weight 1): Initial ranking
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (1) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
306342 249
272619 156
221557 114
QUERY 2 (weight 0.8): Ranking increases, initial order is preserved
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.8) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
306342 321
272619 201
221557 146
QUERY 3 (weight 0.2): Ranking increases, initial order is preserved
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.2) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
306342 998
272619 877
221557 692
QUERY 4 (weight 0.17): Ranking decreases, best match is now last, inverted behavior for these terms begin at 0.17
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.17) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
272619 960
221557 958
306342 802
QUERY 5 (weight 0.16): Ranking increases, best match is now second
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.17) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
272619 978
306342 935
221557 841
QUERY 6 (weight 0.01): Ranking decreases, best match is last again
SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.01) ) ') ORDER BY RANK DESC, [KEY]
KEY RANK
221557 105
272619 77
306342 50
Best match for weight 1 has a rank of 249 and while weight goes down to 0.2 ranking of best match increases to 998. From 0.2 to 0.17 ranking decreases and from 0.16 results are inverted (the weight values that reproduce this behavior depend on terms and maybe on columns searched...)
It seems there is a point where weight means the opposite, something like "do not include this term". Do you have any explanation of this behavior? Why ranking increases when weight decreases? Why ranking decreases after some point until results are inverted and how can you predict this point?
I use a custom "word-breaker", when user searches for something creating the following query:
CONTAINSTABLE(documentParts, title,
'ISABOUT (
"wordA wordB wordC" weight (0.8),
"wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6),
"wordA*" weight (0.1),
"wordB*" weight (0.1),
"wordC*" weight (0.1),
) ')
Am I to expect big ranks for 0.1 Words?
Is the following query the same as above and am I to expect some weird behavior with the 0.1 rankings?
CONTAINSTABLE(documentParts, title, '
ISABOUT ( "wordA wordB wordC" weight (0.8) ),
OR ISABOUT ( "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6) ),
OR ISABOUT ( "wordA*" weight (0.1) ),
OR ISABOUT ( "wordB*" weight (0.1) ),
OR ISABOUT ( "wordC*" weight (0.1) ),
')
In my experience I have had the best results where the weights add up to 1.
CONTAINSTABLE(documentParts, content,
'ISABOUT (
"wordA wordB wordC" weight (0.5),
"wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.2),
"wordA*" weight (0.1),
"wordB*" weight (0.1),
"wordC*" weight (0.1)
) ')
Since the clock is ticking I end up with something like this which fetches quite good results...:
SELECT [KEY], SUM([RANK]) AS [RANK] FROM (
SELECT [KEY], ([RANK]*1)/(SUM([RANK]) OVER( PARTITION BY 1)/ CAST(COUNT([RANK]) OVER( PARTITION BY 1) AS FLOAT)) AS [RANK]
FROM CONTAINSTABLE(documentParts, content,
'ISABOUT (
"wordA wordB wordC" weight (0.8),
"wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6),
"wordA*" weight (0.4),
"wordB*" weight (0.4),
"wordC*" weight (0.4)
) ') c
WHERE c.RANK>0
UNION ALL
SELECT [KEY], ([RANK]*2)/(SUM([RANK]) OVER( PARTITION BY 1)/ CAST(COUNT([RANK]) OVER( PARTITION BY 1) AS FLOAT)) AS [RANK]
FROM CONTAINSTABLE(documents, title,
'ISABOUT (
"wordA wordB wordC" weight (0.8),
"wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6),
"wordA*" weight (0.4),
"wordB*" weight (0.4),
"wordC*" weight (0.4)
) ') c
WHERE c.RANK>0
) t
GROUP BY [KEY]
ORDER BY [RANK] DESC
I will pass it to the test team and call it a day...
来源:https://stackoverflow.com/questions/13700320/sql-server-isabout-weighted-terms