I am doing a search in two text fields called Subject
and Text
for a specific keyword. To do this I use the LIKE
statement. I have encount
// escape $keyword for mysql
$keyword = strtolower('Keyword');
// now build the query
$query = <<<SQL
SELECT *,
((LENGTH(`Subject`) - LENGTH(REPLACE(LOWER(`Subject`), '{$keyword}', ''))) / LENGTH('{$keyword}')) AS `CountInSubject`,
((LENGTH(`Text`) - LENGTH(REPLACE(LOWER(`Text`), '{$keyword}', ''))) / LENGTH('{$keyword}')) AS `CountInText`
FROM `News`
WHERE (`Text` LIKE '%{$keyword}%' OR `Subject` LIKE '%{$keyword}%')
ORDER BY (`CountInSubject` + `CountInText`) DESC;
SQL;
Returns number of occurrences in each field and sorts by that.
The 'keyword'
needs to be lower cased for this to work. I don't think it's really fast, performance wise as it needs to lower-case fields and there's no case-insensitive search in MySQL afaik.
You could index each news
item (subject
and text
) by words and store in another table with news_id
and occurrence count and then match against that.
Below query can give you the no.of occurrences of string appears in both columns i.e text and subject and will sort results by the criteria but this will not be a good solution performance wise its better to sort the results in your application code level
SELECT *,
(LENGTH(`Text`) - LENGTH(REPLACE(`Text`, 'Keyword', ''))) / LENGTH('Keyword')
+
(LENGTH(`Subject`) - LENGTH(REPLACE(`Subject`, 'Keyword', ''))) / LENGTH('Keyword') `occurences`
FROM
`Table`
WHERE (Text LIKE '%Keyword%' OR Subject LIKE '%Keyword%')
ORDER BY `occurences` DESC
Suggested by @lserni a more cleaner way of calculation of occurrences
SELECT *,
(LENGTH(`Text`) - LENGTH(REPLACE(`Text`, 'test', ''))) / LENGTH('test') `appears_in_text`,
(LENGTH(`Subject`) - LENGTH(REPLACE(`Subject`, 'test', ''))) / LENGTH('test') `appears_in_subject`,
(LENGTH(CONCAT(`Text`,' ',`Subject`)) - LENGTH(REPLACE(CONCAT(`Text`,' ',`Subject`), 'test', ''))) / LENGTH('test') `occurences`
FROM
`Table1`
WHERE (TEXT LIKE '%test%' OR SUBJECT LIKE '%test%')
ORDER BY `occurences` DESC
You want SUM
instead. Count will count how many records have non-null values, which means ALL matches and NON-matches will be counted.
SELECT *, SUM(Text LIKE '%Keyword') AS total_matches
...
ORDER BY total_matches
SUM() will count up how many boolean true results the LIKE produces, which will be typecast to integers, so you get a result like 1+1+1+0+1 = 4, instead of the 5 non-nulls count.