问题
So imagine that you have a table of Products (ID int, Name nvarchar(200))
, and two other tables, ProductsCategories (ProductID int, CategoryID int)
and InvoiceProducts (InvoiceID int, ProductID int)
.
I need to write a query to produce a set of products that match a given set of invoice ids and category ids such that the list of products match all the specified categories and all the specified invoices, without falling back to dynamic SQL. Imagine I need to find a list of products that are in both categories 1 and 2 and in invoices 3 and 4.
As a start, I've written a stored-procedure that accept the category ids and invoice ids as strings, and parse them into tables:
CREATE PROCEDURE dbo.SearchProducts (@categories varchar(max), @invoices varchar(max))
AS BEGIN
with catids as (select cast([value] as int) from dbo.split(@categories, ' ')),
invoiceids as (select cast([value] as int) from dbo.split(@invoices, ' '))
select * from products --- insert awesomeness here
END
The different solutions I've come up with look awful, and perform worse. The best thing I've found is to generate a view comprised of left joins of all the criteria, but that seems very expensive and doesn't solve the issue of matching all of the different keys specified.
Update: This is an example query I wrote that yields the expected results. Am I missing any optimization opportunities? Like magical unicorn matrix operations by ninjas?
with catids as (select distinct cast([value] as int) [value] from dbo.split(@categories, ' ')),
invoiceids as (select distinct cast([value] as int) [value] from dbo.split(@invoices, ' '))
select pc.ProductID from ProductsCategories pc (nolock)
inner join catids c on c.value = pc.CategoryID
group by pc.ProductID
having COUNT(*) = (select COUNT(*) from catids)
intersect
select ip.ProductID from InvoiceProducts ip (nolock)
inner join invoiceids i on i.value = ip.InvoiceID
group by ip.ProductID
having COUNT(*) = (select COUNT(*) from invoiceids)
回答1:
Provided that you have unique indices on both (ProductID, CategoryID)
and (ProductID, InvoiceID)
:
SELECT ProductID
FROM (
SELECT ProductID
FROM ProductInvoice
WHERE InvoiceID IN (1, 2)
UNION ALL
SELECT ProductID
FROM ProductCategory pc
WHERE CategoryID IN (3, 4)
) q
GROUP BY
ProductID
HAVING COUNT(*) = 4
or, if your values are passed in CSV
strings:
WITH catids(value) AS
(
SELECT DISTINCT CAST([value] AS INT)
FROM dbo.split(@categories, ' '))
),
(
SELECT DISTINCT CAST([value] AS INT)
FROM dbo.split(@invoices, ' '))
)
SELECT ProductID
FROM (
SELECT ProductID
FROM ProductInvoice
WHERE InvoiceID IN
(
SELECT value
FROM invoiceids
)
UNION ALL
SELECT ProductID
FROM ProductCategory pc
WHERE CategoryID IN
(
SELECT value
FROM catids
)
) q
GROUP BY
ProductID
HAVING COUNT(*) =
(
SELECT COUNT(*)
FROM catids
) +
(
SELECT COUNT(*)
FROM invoiceids
)
Note that in SQL Server 2008
you can pass table-valued parameters into the stored procedures.
回答2:
I'd start with something like this, utilizing your tabled ID values from the parameters. Temp tables can help with subquery speed.
select p.*
from
(
select pc.*
from catids c
inner join ProductsCategories pc
on pc.CategoryID = c.value
) catMatch
inner join
(
select pin.*
from invoiceids i
inner join ProductsInvoices pin
on pin.InvoiceID = i.value
) invMatch
on invMatch.ProductID = catMatch.ProductID
inner join Products p
on p.ID = invMatch.ProductID
回答3:
ProductCategories should have a clustered index on (CategoryId, ProductId) and InvoiceProducts should have one on (InvoiceId, ProductId) optimally. This will allow finding product ids given the CategoryId and InvoiceId by using the data in the clustered indexes only.
You could use a function to return a table of ints given a string. Google "CsvToInt" and click on the first link from SqlTeam to see the code.
Then you could:
SELECT *
FROM Products
WHERE ID IN (SELECT DISTINCT ProductId
FROM ProductCategories
WHERE CategoryId in dbo.CsvToInt(@categories)
) AND ID IN (SELECT DISTINCT ProductId
FROM InvoiceProducts
WHERE InvoiceId in dbo.CsvToInt(@invoices)
)
回答4:
How about a recursive CTE?
First add row numbers to the criteria tables, then some pseudo SQL if you will:
;WITH cte AS(
Base case: Select productid, criteria from products left join criteria where row_number = 1 if it matches criteria from both row 1s or one is null.
UNION ALL
Recursive case: Select n+1 criteria row from products left join criteria where row_number = cte.row_number + 1 AND matches criteria from both row_number + 1 or one or the other (but not both) is null
)
SELECT *
WHERE criteria = maximum id from criteria table.
This will give you a way of performing AND on multiple criteria, and should perform well.
Does this make any sense at all? I've done some pretty cool fast stuff with CTEs lately, and can elaborate if necessary.
Removed cte code because it was wrong, and not really worth fixing having a much better solution out there.
回答5:
Pass them as XML parameter, store them to a temp table and join.
来源:https://stackoverflow.com/questions/4019343/complicated-sql-query-finding-items-matching-multiple-different-foreign-keys