Hive does not support non equi joins: The common work around is to move the join condition to the where clause, which work fine when you want an inner join. but what about a
Hive 0.10 supports cross joins, so you could handle all your "theta join" (non-equijoin) conditions in the WHERE clause.
Why not use a WHERE clause that allows for NULL cases separately?
SELECT * FROM OrderLineItem li
LEFT OUTER JOIN ProductPrice p
ON p.ProductID=li.ProductID
WHERE ( StartDate IS NULL OR OrderDate BETWEEN startDate AND EndDate);
That should take care of it - if the left join matches it'll use the date logic, if it doesn't it'll keep the NULL values intact as a left join should.
Not sure if you can avoid using a double join:
SELECT *
FROM OrderLineItem li
LEFT OUTER JOIN (
SELECT p.*
FROM ProductPrice p
JOIN OrderLineItem li
ON p.ProductID=li.ProductID
WHERE OrderDate BETWEEN StartDate AND EndDate ) p
ON p.ProductId = li.ProductID
WHERE StartDate IS NULL OR
OrderDate BETWEEN StartDate AND EndDate;
This way if there is a match and StartDate is not null, there has to be a valid start/end date match.
Create a copy of the left table with added serial row numbers:
CREATE TABLE OrderLineItem_serial AS
SELECT ROW_NUMBER() OVER() AS serial, * FROM OrderLineItem;
Remark: This may work better for some tables formats (must be WITHOUT COMPRESSION):
CONCAT(INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE) AS serial
Do an inner join:
CREATE TABLE OrderLineItem_inner AS
SELECT * FROM OrderLineItem_serial li JOIN ProductPrice p
on p.ProductID = li.ProductID WHERE OrderDate BETWEEN startDate AND EndDate;
Left join by serial:
SELECT * FROM OrderLineItem_serial li
LEFT OUTER JOIN OrderLineItem_inner i on li.serial = i.serial;