What is a good approach in MS SQL Server 2008 to join on a “best” match?

问题

In essence I want to pick the best match of a prefix from the "Rate" table based on the TelephoneNumber field in the "Call" table. Given the example data below, '0123456789' would best match the prefix '012' whilst '0100000000' would best match the prefix '01'.

I've included some DML with some more examples of correct matches in the SQL comments.

There will be circa 70,000 rows in the rate table and the call table will have around 20 million rows. But there will be a restriction on the Select from the Call table based on a dateTime column so actually the query will only need to run over 0.5 million call rows.

The prefix in the Rate table can be up to 16 characters long.

I have no idea how to approach this in SQL, I'm currently thinking of writing a C# SQLCLR function to do it. Has anyone done anything similar? I'd appreciate any advice you have.

Example Data

Call table:

Id  TelephoneNumber
1   0123456789
2   0100000000
3   0200000000
4   0780000000
5   0784000000
6   0987654321

Rate table:

Prefix Scale
       1
01   1.1
012 1.2
02   2
078    3
0784   3.1

DML

create table Rate
(
    Prefix nvarchar(16) not null,
    Scale float not null
)

create table [Call]
(
    Id bigint not null,
    TelephoneNumber nvarchar(16) not null
)

insert into Rate (Prefix, Scale) values ('', 1)
insert into Rate (Prefix, Scale) values ('01', 1.1)
insert into Rate (Prefix, Scale) values ('012', 1.2)
insert into Rate (Prefix, Scale) values ('02', 2)
insert into Rate (Prefix, Scale) values ('078', 3)
insert into Rate (Prefix, Scale) values ('0784', 3.1)

insert into [Call] (Id, TelephoneNumber) values (1, '0123456789') --match 1.2
insert into [Call] (Id, TelephoneNumber) values (2, '0100000000') --match 1.1
insert into [Call] (Id, TelephoneNumber) values (3, '0200000000') --match 2
insert into [Call] (Id, TelephoneNumber) values (4, '0780000000') --match 3
insert into [Call] (Id, TelephoneNumber) values (5, '0784000000') --match 3.1
insert into [Call] (Id, TelephoneNumber) values (6, '0987654321') --match 1

Note: The last one '0987654321' matches the blank string because there are no better matches.

回答1:

Since this is based on partial matching, a subselect would be the only viable option (unless, like LukeH assumes, every call is unique)

select
    c.Id,
    c.TelephoneNumber,
    (select top 1 
         Scale 

         from Rate r 

         where c.TelephoneNumber like r.Prefix + '%' order by Scale desc
    ) as Scale

from Call c

回答2:

SELECT t.Id, t.TelephoneNumber, t.Prefix, t.Scale
FROM
(
    SELECT *, ROW_NUMBER() OVER
              (
                  PARTITION BY c.TelephoneNumber
                  ORDER BY r.Scale DESC
              ) AS RowNumber
    FROM [call] AS c
        INNER JOIN [rate] AS r
            ON c.TelephoneNumber LIKE r.Prefix + '%'
) AS t
WHERE t.RowNumber = 1
ORDER BY t.Id

回答3:

Try this one:

select Prefix, min(c.TelephoneNumber)
from Rate r
left outer join Call c on c.TelephoneNumber like left(Prefix + '0000000000', 10) 
    or c.TelephoneNumber like Prefix + '%'
group by Prefix

回答4:

You can use a left join to try to find a "better" match, and then eliminate such matches in your where clause. e.g.:

select
 *
from
 Call c
  inner join
 Rate r
  on
   r.Prefix = SUBSTRING(c.TelephoneNumber,1,LEN(r.Prefix))
  left join
 Rate r_anti
  on
   r_anti.Prefix = SUBSTRING(c.TelephoneNumber,1,LEN(r_anti.Prefix)) and
   LEN(r_anti.Prefix) > LEN(r.Prefix)
where
 r_anti.Prefix is null

来源：https://stackoverflow.com/questions/3709323/what-is-a-good-approach-in-ms-sql-server-2008-to-join-on-a-best-match

标签

tsql

join

sqlclr