Why doesn't LAST_VALUE return the last value?

前端 未结 3 464
盖世英雄少女心
盖世英雄少女心 2021-02-01 10:29

I want to find the last value of y over an ordered partition using a query like this:

SELECT
  x,
  LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y AS         


        
相关标签:
3条回答
  • 2021-02-01 10:50

    TLDR: The query you want is:

    SELECT
      x,
      LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM table
    

    Possibly followed by GROUP BY to collapse duplicate output rows from the analytic function.

    And of course, it's simpler to just use MAX over an unordered partition if that's all you need:

    SELECT
      x,
      MAX(y) OVER (PARTITION BY x)
    FROM table
    

    Before answering this question, here's a little background on analytic functions (a.k.a. window functions). All of the below is standard SQL and not specific to BigQuery.

    First, analytic functions are not aggregation functions. Whereas aggregation functions collapse multiple input rows into a single output row, analytic functions compute exactly one output row for every input row. So you need to make sure you're thinking about what the output is for every input row.

    Second, analytic functions operate over a "window" of rows that is a subset of the "partition" to which the row belongs. The partition for an input row is determined by the PARTITION BY clause, or you can omit it if you want the partition to be the entire set of input rows. The window is given by the ROWS clause, but if you don't specify it (and users usually don't), it defaults to either the entire partition (when no ordering is applied) or the set of rows in the partition from the first row to the current row (when an ORDER BY is present). Note that the window can differ for each input row in a partition!

    Now, back to LAST_VALUE. Although the default window described above is reasonable in many cases (e.g., computing cumulative sums), it works spectacularly poorly with LAST_VALUE. The LAST_VALUE function returns the value of the last row in the window, and by default the last row in the window is the current row.

    So to fix the problem, you need to explicitly specify that the window for LAST_VALUE is the entire partition, not just the rows up to the current row. You can do so as follows:

    SELECT x, LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM table
    

    To test this out, here's an example:

    SELECT
      x,
      FIRST_VALUE(x) OVER (ORDER BY x ASC) first_asc,
      FIRST_VALUE(x) OVER (ORDER BY x DESC) first_desc,
      LAST_VALUE(x) OVER (ORDER BY x ASC) last_asc,
      LAST_VALUE(x) OVER (ORDER BY x DESC) last_desc,
    FROM
      (SELECT 4 as x),
      (SELECT 2 as x),
      (SELECT 1 as x),
      (SELECT 3 as x)
    
    x,first_asc,first_desc,last_asc,last_desc
    1,1,4,1,1
    2,1,4,2,2
    3,1,4,3,3
    4,1,4,4,4
    

    Note that LAST_VALUE returns 1, 2, 3, 4 instead of just 4 because the window changes for each input row.

    Now let's specify a window that is the entire partition:

    SELECT
      x,
      FIRST_VALUE(x) OVER (ORDER BY x ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) first_asc,
      FIRST_VALUE(x) OVER (ORDER BY x DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) first_desc,
      LAST_VALUE(x) OVER (ORDER BY x ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_asc,
      LAST_VALUE(x) OVER (ORDER BY x DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_desc,
    FROM
      (SELECT 4 as x),
      (SELECT 2 as x),
      (SELECT 1 as x),
      (SELECT 3 as x)
    
    x,first_asc,first_desc,last_asc,last_desc
    1,1,4,4,1
    2,1,4,4,1
    3,1,4,4,1
    4,1,4,4,1
    

    Now we get 4 for LAST_VALUE as expected.

    0 讨论(0)
  • 2021-02-01 10:51

    Even that question title uses LAST_VALUE - the question itself asks for Largest Value!
    I would just go with below:

    SELECT x, MAX(y) OVER (PARTITION BY x) FROM table  
    

    And if no other fields from table involved - i would just do simple GROUP BY:

    SELECT x, MAX(y) FROM table GROUP BY x 
    

    Of course, we shoudl remember that not always Largest value and MAX value are the same thing.

    0 讨论(0)
  • 2021-02-01 10:55

    Other option you have is to change the query order by to desc

    SELECT
      x,
      LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC)
    FROM table
    order by x desc
    

    but the you will get the last value only for the first row

    0 讨论(0)
提交回复
热议问题