Hive: workaround for non-equi left join

耶瑟儿~ 2020-12-18 09:57

Hive does not support non-equi joins. The common workaround is to move the join condition to the WHERE clause, which works fine when you want an inner join. But what about a

4 Answers
  • 2020-12-18 10:35

    Hive 0.10 supports cross joins, so you could handle all your "theta join" (non-equijoin) conditions in the WHERE clause.
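
    A minimal sketch of that idea, assuming the OrderLineItem and ProductPrice tables and the column names used in the other answers here. Note that filtering a cross join in WHERE gives inner-join semantics, so order lines without a matching price row are dropped:

    ```sql
    -- Sketch only: table and column names are borrowed from the other answers.
    -- CROSS JOIN pairs every order line with every price row; the WHERE clause
    -- keeps only the pairs that satisfy both the key match and the date range.
    SELECT li.*, p.*
    FROM OrderLineItem li
    CROSS JOIN ProductPrice p
    WHERE p.ProductID = li.ProductID
      AND li.OrderDate BETWEEN p.StartDate AND p.EndDate;
    ```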

  • 2020-12-18 10:42

    Why not use a WHERE clause that allows for NULL cases separately?

    SELECT *
    FROM OrderLineItem li
    LEFT OUTER JOIN ProductPrice p
    ON p.ProductID = li.ProductID
    WHERE (p.StartDate IS NULL OR li.OrderDate BETWEEN p.StartDate AND p.EndDate);
    

    That should take care of it: if the left join finds a match, the date logic applies; if it doesn't, the NULL values stay intact, as they should in a left join.

  • 2020-12-18 10:45

    Not sure if you can avoid using a double join:

    SELECT *
    FROM OrderLineItem li
    LEFT OUTER JOIN (
      SELECT p.*
      FROM ProductPrice p
      JOIN OrderLineItem li
      ON p.ProductID = li.ProductID
      WHERE li.OrderDate BETWEEN p.StartDate AND p.EndDate ) p
    ON p.ProductID = li.ProductID
    WHERE p.StartDate IS NULL OR
      li.OrderDate BETWEEN p.StartDate AND p.EndDate;
    

    This way if there is a match and StartDate is not null, there has to be a valid start/end date match.

  • 2020-12-18 10:52
    1. Create a copy of the left table with added serial row numbers:

      CREATE TABLE OrderLineItem_serial AS
      SELECT ROW_NUMBER() OVER() AS serial, * FROM OrderLineItem;
      

      Remark: for some table formats (the files must be WITHOUT COMPRESSION), this expression may work better as the serial:

      CONCAT(INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE) AS serial
      
    2. Do an inner join:

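      -- Select li.serial and p.* only: SELECT * would duplicate ProductID, which CTAS rejects.
      CREATE TABLE OrderLineItem_inner AS
      SELECT li.serial, p.*
      FROM OrderLineItem_serial li
      JOIN ProductPrice p ON p.ProductID = li.ProductID
      WHERE li.OrderDate BETWEEN p.StartDate AND p.EndDate;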
      
    3. Left join by serial:

      SELECT * FROM OrderLineItem_serial li
      LEFT OUTER JOIN OrderLineItem_inner i on li.serial = i.serial;
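
    If the intermediate OrderLineItem_inner table is not wanted, steps 2 and 3 could probably be folded into one query. This is only a sketch: it assumes OrderLineItem_serial from step 1 already exists as a real table (so the serial values are stable between its two uses), and match_serial is a name made up here for illustration:

    ```sql
    -- Sketch: inner join in a subquery, then left join back by serial.
    -- match_serial is a hypothetical alias; it keeps the subquery from
    -- exposing a column that clashes with li.serial.
    SELECT li.*, i.*
    FROM OrderLineItem_serial li
    LEFT OUTER JOIN (
      SELECT s.serial AS match_serial, p.*
      FROM OrderLineItem_serial s
      JOIN ProductPrice p ON p.ProductID = s.ProductID
      WHERE s.OrderDate BETWEEN p.StartDate AND p.EndDate
    ) i
    ON li.serial = i.match_serial;
    ```

    Order lines with no price row valid on their OrderDate come back once, with NULLs in the columns from i, as a left join should.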
      