Question
I have a relation in PostgreSQL named product which contains 2 fields: id and quantity, and I want to find the id of the products with the highest quantity. As far as I know, there are 2 ways of doing it:
SELECT id FROM product WHERE quantity >= ALL(SELECT quantity FROM product)
or
SELECT id FROM product WHERE quantity = (SELECT MAX(quantity) FROM product)
Is there any difference in their speed of execution?
Answer 1:
The first query returns no rows if quantity IS NULL in any row (as Gordon demonstrates).
The second query only comes up empty if quantity IS NULL in all rows, so it should be usable in most cases. (And it's faster.)
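The reason is SQL's three-valued logic, which can be sketched with standalone expressions. The inline VALUES lists below are made up for illustration and stand in for the quantity column:

```sql
-- >= ALL is true only if every comparison is true. A single NULL
-- comparison makes the result NULL (unless some comparison is false),
-- and WHERE discards rows whose condition is NULL.
SELECT 5 >= ALL (VALUES (1), (3))         AS all_known,   -- true
       5 >= ALL (VALUES (1), (NULL::int)) AS with_null,   -- NULL
       0 >= ALL (VALUES (1), (NULL::int)) AS false_wins;  -- false

-- MAX() skips NULL inputs and returns NULL only if no non-NULL input exists.
SELECT max(q) FROM (VALUES (1), (NULL::int), (5)) AS t(q);  -- 5
SELECT max(q) FROM (VALUES (NULL::int)) AS t(q);            -- NULL
```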
If you need a NULL-safe query in Postgres 12 or older, consider:
SELECT id, quantity
FROM product
WHERE quantity IS NOT DISTINCT FROM (SELECT MAX(quantity) FROM product);
Or, probably faster:
SELECT id, quantity
FROM  (
   SELECT *, rank() OVER (ORDER BY quantity DESC NULLS LAST) AS rnk
   FROM   product
   ) sub
WHERE  rnk = 1;
See:
- PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
Postgres 13 adds the standard SQL clause WITH TIES:
SELECT id
FROM product
ORDER BY quantity DESC NULLS LAST
FETCH FIRST 1 ROWS WITH TIES;
Works with any number of NULL values.
The manual:
SQL:2008 introduced a different syntax to achieve the same result, which PostgreSQL also supports. It is:
OFFSET start { ROW | ROWS }
FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
In this syntax, the start or count value is required by the standard to be a literal constant, a parameter, or a variable name; as a PostgreSQL extension, other expressions are allowed, but will generally need to be enclosed in parentheses to avoid ambiguity. If count is omitted in a FETCH clause, it defaults to 1. The WITH TIES option is used to return any additional rows that tie for the last place in the result set according to the ORDER BY clause; ORDER BY is mandatory in this case. ROW and ROWS as well as FIRST and NEXT are noise words that don't influence the effects of these clauses.
Notably, WITH TIES cannot be used with the (non-standard) short syntax LIMIT n.
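As a minimal sketch of the rules quoted from the manual (Postgres 13 or later assumed; the inline VALUES list and the column name q are made up for illustration), a parenthesized expression can serve as count, and WITH TIES pulls in the row that ties for last place:

```sql
-- Hypothetical inline data; q stands in for quantity.
SELECT q
FROM  (VALUES (3), (3), (2), (2), (1)) AS t(q)
ORDER  BY q DESC
FETCH  FIRST (1 + 2) ROWS WITH TIES;
-- Returns 4 rows: 3, 3, 2, 2 (the second 2 ties with the third row).

-- By contrast, "LIMIT 3 WITH TIES" is rejected with a syntax error.
```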
The WITH TIES query is the fastest possible solution, faster than either of your current queries. More important for performance: have an index on (quantity). Or a more specialized covering index to allow index-only scans (a bit faster, yet):
CREATE INDEX ON product (quantity DESC NULLS LAST) INCLUDE (id);
See:
- Do covering indexes in PostgreSQL help JOIN columns?
We need NULLS LAST to keep NULL values last in descending order. See:
- Sort by column ASC, but NULL values first?
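To check whether the covering index is actually used, something along these lines may help. It is a sketch that assumes Postgres 13 or later and that the index above has already been created; the plan you get depends on table size and statistics:

```sql
-- Index-only scans rely on the visibility map, which VACUUM maintains;
-- ANALYZE refreshes the planner statistics.
VACUUM ANALYZE product;

-- Look for "Index Only Scan" and a low "Heap Fetches" count in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM   product
ORDER  BY quantity DESC NULLS LAST
FETCH  FIRST 1 ROWS WITH TIES;
```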
Answer 2:
I tried your methods in postgres (test table distributed by id). The first method ran much slower for me. Here are my comparison results:
Method 1 above: 3.1 seconds
Method 2 above: 0.13 seconds
Method 1 was at least 10 times slower in repeated runs. I think your method 2 is the better option, as its sub-query likely runs much faster than the sub-query in the other option.
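A comparison like this can be reproduced roughly as follows. This is only a sketch: the table name product_bench, the row count, and the random data are assumptions rather than the setup used for the numbers above, so timings will differ:

```sql
-- Hypothetical test table with 100000 rows of random quantities.
CREATE TEMP TABLE product_bench AS
SELECT g AS id, (random() * 1000000)::int AS quantity
FROM   generate_series(1, 100000) AS g;

ANALYZE product_bench;

-- Method 1: >= ALL
EXPLAIN ANALYZE
SELECT id FROM product_bench
WHERE  quantity >= ALL (SELECT quantity FROM product_bench);

-- Method 2: = MAX()
EXPLAIN ANALYZE
SELECT id FROM product_bench
WHERE  quantity = (SELECT MAX(quantity) FROM product_bench);
```

Run each statement several times and compare the reported execution times, since the first run also pays for a cold cache.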
Answer 3:
Your queries are NOT equivalent. The first returns no rows at all if any of the quantity values are NULL. The second ignores NULL values.
Here is a db<>fiddle illustrating this.
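A minimal way to reproduce the difference locally, using a made-up temporary table product_demo so the real product table stays untouched:

```sql
-- Hypothetical sample data: two products tie for the maximum, one is NULL.
CREATE TEMP TABLE product_demo (id int PRIMARY KEY, quantity int);
INSERT INTO product_demo VALUES (1, 10), (2, 30), (3, 30), (4, NULL);

-- Query 1: returns no rows, because every comparison against the
-- NULL quantity is NULL, so >= ALL is never true.
SELECT id FROM product_demo
WHERE  quantity >= ALL (SELECT quantity FROM product_demo);

-- Query 2: returns ids 2 and 3, because MAX() ignores the NULL.
SELECT id FROM product_demo
WHERE  quantity = (SELECT MAX(quantity) FROM product_demo);
```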
Answer 4:
There is a third variant:
SELECT id FROM product
WHERE quantity = (SELECT quantity FROM product ORDER BY quantity DESC NULLS LAST LIMIT 1)
If the table has a btree index on (quantity DESC NULLS LAST), this variant will be very fast.
Source: https://stackoverflow.com/questions/63178114/greater-than-or-equal-to-all-and-equal-to-max-speed