Why do these join differently based on size?

前端未结

关注

 1  1637

孤街浪徒 2021-01-21 02:21

In Postgresql, if you unnest two arrays of the same size, they line up each value from one array with one from the other, but if the two arrays are not the same size, it joins e

1条回答

走了就别回头了 (楼主)

2021-01-21 02:30
Support for set-returning functions in SELECT is a PostgreSQL extension, and an IMO very weird one. It's broadly considered deprecated and best avoided where possible.

Avoid using SRF-in-SELECT where possible

Now that LATERAL is supported in 9.3, one of the two main uses is gone. It used to be necessary to use a set-returning function in SELECT if you wanted to use the output of one SRF as the input to another; that is no longer needed with LATERAL.

The other use will be replaced in 9.4, when WITH ORDINALITY is added, allowing you to preserve the output ordering of a set-returning function. That's currently the main remaining use: to do things like zip the output of two SRFs into a rowset of matched value pairs. WITH ORDINALITY is most anticipated for unnest, but works with any other SRF.

Why the weird output?

The logic that PostgreSQL is using here (for whatever IMO insane reason it was originally introduced in ancient history) is: whenever either function produces output, emit a row. If only one function has produced output, scan the other one's output again to get the rows required. If neither produces output, stop emitting rows.

It's easier to see with generate_series.
```
regress=> SELECT generate_series(1,2), generate_series(1,2);
 generate_series | generate_series 
-----------------+-----------------
               1 |               1
               2 |               2
(2 rows)

regress=> SELECT generate_series(1,2), generate_series(1,3);
 generate_series | generate_series 
-----------------+-----------------
               1 |               1
               2 |               2
               1 |               3
               2 |               1
               1 |               2
               2 |               3
(6 rows)

regress=> SELECT generate_series(1,2), generate_series(1,4);
 generate_series | generate_series 
-----------------+-----------------
               1 |               1
               2 |               2
               1 |               3
               2 |               4
(4 rows)
```
In the majority of cases what you really want is a simple cross join of the two, which is a lot saner.
```
regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,2) b;
 a | b 
---+---
 1 | 1
 1 | 2
 2 | 1
 2 | 2
(4 rows)

regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,3) b;
 a | b 
---+---
 1 | 1
 1 | 2
 1 | 3
 2 | 1
 2 | 2
 2 | 3
(6 rows)

regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,4) b;
 a | b 
---+---
 1 | 1
 1 | 2
 1 | 3
 1 | 4
 2 | 1
 2 | 2
 2 | 3
 2 | 4
(8 rows)
```
The main exception is currently for when you want to run multiple functions in lock-step, pairwise (like a zip), which you cannot currently do with joins.

WITH ORDINALITY

This will be improved in 9.4 with WITH ORDINALITY, a d while it'll be a bit less efficient than a multiple SRF scan in SELECT (unless optimizer improvements are added) it'll be a lot saner.

Say you wanted to pair up 1..3 and 10..40 with nulls for excess elements. Using with ordinality that'd be (PostgreSQL 9.4 only):
```
regress=# SELECT aval, bval 
           FROM generate_series(1,3) WITH ORDINALITY a(aval,apos) 
           RIGHT OUTER JOIN generate_series(1,4) WITH ORDINALITY b(bval, bpos) 
           ON (apos=bpos);

 aval | bval 
------+------
    1 |    1
    2 |    2
    3 |    3
      |    4
(4 rows)
```
wheras the srf-in-from would instead return:
```
regress=# SELECT generate_series(1,3) aval, generate_series(1,4) bval;
 aval | bval 
------+------
    1 |    1
    2 |    2
    3 |    3
    1 |    4
    2 |    1
    3 |    2
    1 |    3
    2 |    4
    3 |    1
    1 |    2
    2 |    3
    3 |    4
(12 rows)
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

Why do these join differently based on size?

Avoid using SRF-in-`SELECT` where possible

Why the weird output?

`WITH ORDINALITY`

Why do these join differently based on size?

Avoid using SRF-in-SELECT where possible

Why the weird output?

WITH ORDINALITY

Avoid using SRF-in-`SELECT` where possible

`WITH ORDINALITY`