What is semi-join in database?

前端 未结 3 481
梦毁少年i
梦毁少年i 2020-12-05 00:38

I am having trouble while trying to understand the concept of semi-join and how it is different from conventional join. I have tried some article already but not satisfied w

相关标签:
3条回答
  • 2020-12-05 00:52

    As I understand, a semi join is a left join or right join:

    What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?

    So the difference between a left (semi) join and a "conventional" join is that you only retrieve the data of the left table (where you have a match on your join condition). Whereas with a full (outer) join (I think thats what you mean by conventional join), you retrieve the data of both tables where your condition matches.

    0 讨论(0)
  • 2020-12-05 00:58

    Simple example. Let's select students with grades using left outer join:

    SELECT DISTINCT s.id
    FROM  students s
          LEFT JOIN grades g ON g.student_id = s.id
    WHERE g.student_id IS NOT NULL
    

    Now the same with left semi-join:

    SELECT s.id
    FROM  students s
    WHERE EXISTS (SELECT 1 FROM grades g
                  WHERE g.student_id = s.id)
    

    The latter is much more efficient.

    0 讨论(0)
  • 2020-12-05 01:07

    As far as I know SQL dialects that support SEMIJOIN/ANTISEMI are U-SQL/Cloudera Impala.

    SEMIJOIN:

    Semijoins are U-SQL’s way filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern.

    More info Semi Join and Anti Join Should Have Their Own Syntax in SQL:

    “Semi” means that we don’t really join the right hand side, we only check if a join would yield results for any given tuple.

    -- IN
    SELECT *
    FROM Employee
    WHERE DeptName IN (
      SELECT DeptName
      FROM Dept
    )
    
    -- EXISTS
    SELECT *
    FROM Employee
    WHERE EXISTS (
      SELECT 1
      FROM Dept
      WHERE Employee.DeptName = Dept.DeptName
    )
    

    EDIT:

    Another dialect that supports SEMI/ANTISEMI join is KQL:

    kind=leftsemi (or kind=rightsemi)

    Returns all the records from the left side that have matches from the right. The result table contains columns from the left side only.

    let t1 = datatable(key:long, value:string)  
    [1, "a",  
    2, "b",
    3, "c"];
    let t2 = datatable(key:long)
    [1,3];
    t1 | join kind=leftsemi (t2) on key
    

    demo

    Output:

    key  value
    1    a
    3    c
    
    0 讨论(0)
提交回复
热议问题