How can I manipulate MySQL fulltext search relevance to make one field more 'valuable' than another?

灰色年华 2020-11-28 02:55

Suppose I have two columns, keywords and content. I have a fulltext index across both. I want a row with foo in the keywords to have more relevance than a row with foo in the content.

9 Answers
  • 2020-11-28 03:08

    Create three full-text indexes:

    • a) one on the keyword column
    • b) one on the content column
    • c) one on both the keyword and content columns
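    As a minimal sketch, the three indexes could be created as follows. The table name my_table and the index names are placeholders, not anything from the question. Each index gets its own statement because InnoDB builds only one full-text index per statement.

    ```sql
    -- my_table and the ft_* index names are hypothetical.
    CREATE FULLTEXT INDEX ft_keyword         ON my_table (keyword);           -- a)
    CREATE FULLTEXT INDEX ft_content         ON my_table (content);           -- b)
    CREATE FULLTEXT INDEX ft_keyword_content ON my_table (keyword, content);  -- c)
    ```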

    Then, your query:

    SELECT id, keyword, content,
      MATCH (keyword) AGAINST ('watermelon') AS rel1,
      MATCH (content) AGAINST ('watermelon') AS rel2
    FROM my_table  -- placeholder name; TABLE itself is a reserved word
    WHERE MATCH (keyword,content) AGAINST ('watermelon')
    ORDER BY (rel1*1.5)+(rel2) DESC
    

    The point is that rel1 gives you the relevance of your query just in the keyword column (because you created the index only on that column). rel2 does the same, but for the content column. You can now add these two relevance scores together applying any weighting you like.

    However, you aren't using either of these two indexes for the actual search. For that, you use your third index, which is on both columns.

    The index on (keyword,content) controls your recall, that is, which rows are returned.

    The two separate indexes (one on keyword only, one on content only) control your relevance. And you can apply your own weighting criteria here.
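    One way to check that the search itself goes through the combined index is to look at the query plan. This is only a sketch, with my_table again standing in for the real table name.

    ```sql
    -- my_table is a placeholder table name.
    EXPLAIN
    SELECT id
    FROM my_table
    WHERE MATCH (keyword, content) AGAINST ('watermelon');
    ```

    If the combined index is being used, the type column of the plan should read fulltext and the key column should name that index.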

    Note that you can use any number of different indexes, or vary the indexes and weightings you use at query time based on other factors. For example, you might search only on keyword if the query contains a stop word, or decrease the weighting bias for keywords if the query contains more than 3 words.
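    A rough sketch of a query-time weighting, assuming the application has already decided on the weight (for instance, lowering it because the query is long). The variable name @kw_weight, the value 1.2, and the table name my_table are illustrative assumptions only.

    ```sql
    -- @kw_weight and my_table are hypothetical; 1.2 is an arbitrary example weight.
    SET @kw_weight = 1.2;  -- e.g. chosen by the application for a query of more than 3 words

    SELECT id, keyword, content,
      MATCH (keyword) AGAINST ('sweet seedless watermelon salad') AS rel1,
      MATCH (content) AGAINST ('sweet seedless watermelon salad') AS rel2
    FROM my_table
    WHERE MATCH (keyword, content) AGAINST ('sweet seedless watermelon salad')
    ORDER BY (rel1 * @kw_weight) + rel2 DESC;
    ```

    In application code the search string and the weight would normally be bound as parameters rather than written into the statement.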

    Each index uses disk space, so more indexes mean more disk usage and, in turn, a higher memory footprint for MySQL. Inserts will also take longer, as you have more indexes to update.

    You should benchmark performance for your situation. Be careful to turn off the MySQL query cache while benchmarking (on versions that still have one; it was removed in MySQL 8.0), or your results will be skewed. This isn't Google-grade efficient, but it is pretty easy and "out of the box", and it's almost certainly a lot better than using LIKE in the queries.

    I find it works really well.

  • 2020-11-28 03:12

    I did this a few years ago, but without the full text index. I don't have the code handy (former employer), but I remember the technique well.

    In a nutshell, I selected a "weight" from each column. For example:

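    -- LIKE '%foo%' stands in for whatever match test you use on each column.
    SELECT t.id,
           COALESCE(a.keyword_relevance, 0) + COALESCE(b.content_relevance, 0) AS relevance
    FROM table_name AS t
       LEFT JOIN
          (SELECT id, 1 AS keyword_relevance FROM table_name WHERE keyword LIKE '%foo%') AS a
       ON t.id = a.id
       LEFT JOIN
          (SELECT id, 0.75 AS content_relevance FROM table_name WHERE content LIKE '%foo%') AS b
       ON t.id = b.id
    WHERE a.id IS NOT NULL OR b.id IS NOT NULL  -- keep only rows that matched somewhere
    ORDER BY relevance DESC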
    

    Please forgive any shoddy SQL here; it's been a few years since I needed to write any, and I'm doing this off the top of my head...

    Hope this helps!

    J.Js

  • 2020-11-28 03:14

    If the metric is just that all the keyword matches are more "valuable" than all the content matches then you can just use a union with row counts. Something along these lines.

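    -- Needs MySQL 8.0+ for ROW_NUMBER(). The id column, the ORDER BY t.id,
    -- and LIKE '%foo%' are stand-ins for your own ordering and match test.
    SELECT *
    FROM (
       SELECT ROW_NUMBER() OVER (ORDER BY t.id) AS rn, t.*
       FROM thetable t
       WHERE t.keyword LIKE '%foo%'

       UNION ALL

       -- Offset the content-only rows so they sort after every keyword match.
       SELECT ROW_NUMBER() OVER (ORDER BY t.id)
              + (SELECT COUNT(*) FROM thetable WHERE keyword LIKE '%foo%') AS rn, t.*
       FROM thetable t
       WHERE t.content LIKE '%foo%' AND COALESCE(t.keyword, '') NOT LIKE '%foo%'
    ) AS ranked
    ORDER BY rn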
    

    For anything more complicated than that, where you want to apply an actual weight to every row, I don't know how to help.

  • 2020-11-28 03:18

    Well, that depends on what exactly you mean by:

    I want a row with foo in the keywords to have more relevance than a row with foo in the content.

    If you mean that a row with foo in the keywords should come before any row with foo in the content, then I would run two separate queries: one on the keywords and then (possibly lazily, only if it's requested) another on the content.
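    A sketch of the two queries, assuming a placeholder table my_table with a separate full-text index on each of keyword and content:

    ```sql
    -- Query 1: rows that match in the keywords, best matches first.
    SELECT id, keyword, content,
      MATCH (keyword) AGAINST ('foo') AS rel
    FROM my_table
    WHERE MATCH (keyword) AGAINST ('foo')
    ORDER BY rel DESC;

    -- Query 2 (run only when more results are requested):
    -- rows that match in the content but not in the keywords.
    SELECT id, keyword, content,
      MATCH (content) AGAINST ('foo') AS rel
    FROM my_table
    WHERE MATCH (content) AGAINST ('foo')
      AND MATCH (keyword) AGAINST ('foo') = 0
    ORDER BY rel DESC;
    ```

    Depending on the storage engine and server settings, a three-letter term such as foo may fall below the minimum indexed word length, so try this with a longer search term first.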

  • 2020-11-28 03:19

    In Boolean mode, MySQL supports the ">" and "<" operators, which increase or decrease a word's contribution to the relevance value that is assigned to a row.

    I wonder if something like this would work?

    SELECT *, 
    MATCH (Keywords) AGAINST ('>watermelon' IN BOOLEAN MODE) AS relStrong, 
    MATCH (Title,Keywords,Content) AGAINST ('<watermelon' IN BOOLEAN MODE) AS relWeak 
    FROM about_data  
    WHERE MATCH(Title, Keywords, Content) AGAINST ('watermelon' IN BOOLEAN MODE) 
    ORDER by (relStrong+relWeak) desc
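
    For comparison, the MySQL manual shows these operators inside a single search string, where they rank one word above or below the others. They act on words rather than on columns. A sketch adapted from the manual's example, reusing the about_data table from the query above:

    ```sql
    -- Rows must contain "apple" plus either "turnover" or "strudel";
    -- rows with "turnover" are ranked higher than rows with "strudel".
    SELECT *,
      MATCH (Title, Keywords, Content)
        AGAINST ('+apple +(>turnover <strudel)' IN BOOLEAN MODE) AS rel
    FROM about_data
    WHERE MATCH (Title, Keywords, Content)
        AGAINST ('+apple +(>turnover <strudel)' IN BOOLEAN MODE)
    ORDER BY rel DESC;
    ```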
    
  • 2020-11-28 03:26

    Actually, using a case statement to make a pair of flags might be a better solution:

    select 
    ...
    , case when keyword like concat('%', @input, '%') then 1 else 0 end as keywordmatch
    , case when content like concat('%', @input, '%') then 1 else 0 end as contentmatch
    -- or whatever check you use for the matching
    from 
       ... 
       and here the rest of your usual matching query
       ... 
    order by keywordmatch desc, contentmatch desc
    

    Again, this is only if all keyword matches rank higher than all the content-only matches. I also made the assumption that a match in both keyword and content is the highest rank.
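
    In MySQL a comparison evaluates to 1 or 0, so the same ordering can also be written without the case expressions. A minimal sketch, assuming a placeholder table my_table and a user variable @input holding the search term:

    ```sql
    -- my_table and @input are placeholders; the LIKE tests stand in for your own match check.
    SET @input = 'foo';

    SELECT id, keyword, content
    FROM my_table
    WHERE keyword LIKE CONCAT('%', @input, '%')
       OR content LIKE CONCAT('%', @input, '%')
    ORDER BY (keyword LIKE CONCAT('%', @input, '%')) DESC,
             (content LIKE CONCAT('%', @input, '%')) DESC;
    ```

    Rows matching in both columns sort first, then keyword-only matches, then content-only matches.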
