I want to find the last value of y
over an ordered partition using a query like this:
SELECT
x,
LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y AS
TLDR: The query you want is:
SELECT
x,
LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM table
Possibly followed by GROUP BY
to collapse duplicate output rows from the analytic function.
And of course, it's simpler to just use MAX
over an unordered partition if that's all you need:
SELECT
x,
MAX(y) OVER (PARTITION BY x)
FROM table
Before answering this question, here's a little background on analytic functions (a.k.a. window functions). All of the below is standard SQL and not specific to BigQuery.
First, analytic functions are not aggregation functions. Whereas aggregation functions collapse multiple input rows into a single output row, analytic functions compute exactly one output row for every input row. So you need to make sure you're thinking about what the output is for every input row.
Second, analytic functions operate over a "window" of rows that is a subset of the "partition" to which the row belongs. The partition for an input row is determined by the PARTITION BY
clause, or you can omit it if you want the partition to be the entire set of input rows. The window is given by the ROWS
clause, but if you don't specify it (and users usually don't), it defaults to either the entire partition (when no ordering is applied) or the set of rows in the partition from the first row to the current row (when an ORDER BY
is present). Note that the window can differ for each input row in a partition!
Now, back to LAST_VALUE
. Although the default window described above is reasonable in many cases (e.g., computing cumulative sums), it works spectacularly poorly with LAST_VALUE
. The LAST_VALUE
function returns the value of the last row in the window, and by default the last row in the window is the current row.
So to fix the problem, you need to explicitly specify that the window for LAST_VALUE
is the entire partition, not just the rows up to the current row. You can do so as follows:
SELECT x, LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM table
To test this out, here's an example:
SELECT
x,
FIRST_VALUE(x) OVER (ORDER BY x ASC) first_asc,
FIRST_VALUE(x) OVER (ORDER BY x DESC) first_desc,
LAST_VALUE(x) OVER (ORDER BY x ASC) last_asc,
LAST_VALUE(x) OVER (ORDER BY x DESC) last_desc,
FROM
(SELECT 4 as x),
(SELECT 2 as x),
(SELECT 1 as x),
(SELECT 3 as x)
x,first_asc,first_desc,last_asc,last_desc
1,1,4,1,1
2,1,4,2,2
3,1,4,3,3
4,1,4,4,4
Note that LAST_VALUE
returns 1, 2, 3, 4 instead of just 4 because the window changes for each input row.
Now let's specify a window that is the entire partition:
SELECT
x,
FIRST_VALUE(x) OVER (ORDER BY x ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) first_asc,
FIRST_VALUE(x) OVER (ORDER BY x DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) first_desc,
LAST_VALUE(x) OVER (ORDER BY x ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_asc,
LAST_VALUE(x) OVER (ORDER BY x DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_desc,
FROM
(SELECT 4 as x),
(SELECT 2 as x),
(SELECT 1 as x),
(SELECT 3 as x)
x,first_asc,first_desc,last_asc,last_desc
1,1,4,4,1
2,1,4,4,1
3,1,4,4,1
4,1,4,4,1
Now we get 4 for LAST_VALUE
as expected.
Even that question title uses LAST_VALUE
- the question itself asks for Largest Value
!
I would just go with below:
SELECT x, MAX(y) OVER (PARTITION BY x) FROM table
And if no other fields from table involved - i would just do simple GROUP BY:
SELECT x, MAX(y) FROM table GROUP BY x
Of course, we shoudl remember that not always Largest value and MAX value are the same thing.
Other option you have is to change the query order by to desc
SELECT
x,
LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC)
FROM table
order by x desc
but the you will get the last value only for the first row