Why do these join differently based on size?

前端 未结 1 1635
孤街浪徒
孤街浪徒 2021-01-21 02:21

In Postgresql, if you unnest two arrays of the same size, they line up each value from one array with one from the other, but if the two arrays are not the same size, it joins e

1条回答
  •  走了就别回头了
    2021-01-21 02:30

    Support for set-returning functions in SELECT is a PostgreSQL extension, and an IMO very weird one. It's broadly considered deprecated and best avoided where possible.

    Avoid using SRF-in-SELECT where possible

    Now that LATERAL is supported in 9.3, one of the two main uses is gone. It used to be necessary to use a set-returning function in SELECT if you wanted to use the output of one SRF as the input to another; that is no longer needed with LATERAL.

    The other use will be replaced in 9.4, when WITH ORDINALITY is added, allowing you to preserve the output ordering of a set-returning function. That's currently the main remaining use: to do things like zip the output of two SRFs into a rowset of matched value pairs. WITH ORDINALITY is most anticipated for unnest, but works with any other SRF.

    Why the weird output?

    The logic that PostgreSQL is using here (for whatever IMO insane reason it was originally introduced in ancient history) is: whenever either function produces output, emit a row. If only one function has produced output, scan the other one's output again to get the rows required. If neither produces output, stop emitting rows.

    It's easier to see with generate_series.

    regress=> SELECT generate_series(1,2), generate_series(1,2);
     generate_series | generate_series 
    -----------------+-----------------
                   1 |               1
                   2 |               2
    (2 rows)
    
    regress=> SELECT generate_series(1,2), generate_series(1,3);
     generate_series | generate_series 
    -----------------+-----------------
                   1 |               1
                   2 |               2
                   1 |               3
                   2 |               1
                   1 |               2
                   2 |               3
    (6 rows)
    
    regress=> SELECT generate_series(1,2), generate_series(1,4);
     generate_series | generate_series 
    -----------------+-----------------
                   1 |               1
                   2 |               2
                   1 |               3
                   2 |               4
    (4 rows)
    

    In the majority of cases what you really want is a simple cross join of the two, which is a lot saner.

    regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,2) b;
     a | b 
    ---+---
     1 | 1
     1 | 2
     2 | 1
     2 | 2
    (4 rows)
    
    regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,3) b;
     a | b 
    ---+---
     1 | 1
     1 | 2
     1 | 3
     2 | 1
     2 | 2
     2 | 3
    (6 rows)
    
    regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,4) b;
     a | b 
    ---+---
     1 | 1
     1 | 2
     1 | 3
     1 | 4
     2 | 1
     2 | 2
     2 | 3
     2 | 4
    (8 rows)
    

    The main exception is currently for when you want to run multiple functions in lock-step, pairwise (like a zip), which you cannot currently do with joins.

    WITH ORDINALITY

    This will be improved in 9.4 with WITH ORDINALITY, a d while it'll be a bit less efficient than a multiple SRF scan in SELECT (unless optimizer improvements are added) it'll be a lot saner.

    Say you wanted to pair up 1..3 and 10..40 with nulls for excess elements. Using with ordinality that'd be (PostgreSQL 9.4 only):

    regress=# SELECT aval, bval 
               FROM generate_series(1,3) WITH ORDINALITY a(aval,apos) 
               RIGHT OUTER JOIN generate_series(1,4) WITH ORDINALITY b(bval, bpos) 
               ON (apos=bpos);
    
     aval | bval 
    ------+------
        1 |    1
        2 |    2
        3 |    3
          |    4
    (4 rows)
    

    wheras the srf-in-from would instead return:

    regress=# SELECT generate_series(1,3) aval, generate_series(1,4) bval;
     aval | bval 
    ------+------
        1 |    1
        2 |    2
        3 |    3
        1 |    4
        2 |    1
        3 |    2
        1 |    3
        2 |    4
        3 |    1
        1 |    2
        2 |    3
        3 |    4
    (12 rows)
    

    0 讨论(0)
提交回复
热议问题