Adding a “calculated column” to BigQuery query without repeating the calculations

Submitted by 夙愿已清 on 2021-01-20 13:44:08

Question


I want to reuse the values of calculated columns in a new third column. For example, this query works:

select
  countif(cond1) as A,
  countif(cond2) as B,
  countif(cond1)/countif(cond2) as prct_pass
  From 
  Where
  Group By

But when I try to use A,B instead of repeating the countif, it doesn't work because A and B are invalid:

select
  countif(cond1) as A,
  countif(cond2) as B,
  A/B as prct_pass
  From 
  Where
  Group By

Can I somehow make the more readable second version work? Is the first one inefficient?


Answer 1:


You can wrap the aggregation in a subquery (i.e. a nested SELECT) and refer to the aliases from the outer query, like this:

SELECT A, B, A/B as prct_pass 
FROM 
(
SELECT countif(cond1) as A, 
       countif(cond2) as B 
       FROM <yourtable>
)

The same amount of data will be processed by both queries. In the subquery version you only write 2 countif() calls; if that step takes a long time, doing 2 instead of 4 should in principle be more efficient.

Looking at an example using the BigQuery public datasets:
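The same idea can also be written with a WITH clause (a common table expression), which some people find easier to read than a nested SELECT. This is only a sketch: the name counts is a made-up label, and <yourtable>, cond1 and cond2 are the same placeholders used above.

```sql
-- Compute each count once in a named subquery, then reuse the aliases.
-- "counts" is a hypothetical name; <yourtable>, cond1 and cond2 are placeholders.
WITH counts AS (
  SELECT
    COUNTIF(cond1) AS A,
    COUNTIF(cond2) AS B
  FROM <yourtable>
)
SELECT
  A,
  B,
  A / B AS prct_pass
FROM counts
```

If the original query groups by some columns, the GROUP BY would stay inside the WITH block, and the grouping columns would need to be selected there too so the outer query can show them.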

SELECT 
countif(homeFinalRuns>3) as A,
countif(awayFinalRuns>3) as B,
countif(homeFinalRuns>3)/countif(awayFinalRuns>3) as division 
FROM `bigquery-public-data.baseball.games_post_wide`  

or

SELECT A, B, A/B as division FROM 
(
SELECT countif(homeFinalRuns>3) as A, 
       countif(awayFinalRuns>3) as B 
       FROM `bigquery-public-data.baseball.games_post_wide`  
)

we can see that doing it all in one query (without a subquery) is actually slightly faster. (I ran both queries 6 times with different values in the inequality; the single query was faster 5 times and slower once.)

In any case, the efficiency will depend on how taxing it is to compute the condition on your particular dataset.
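One detail that applies to every version above: if B turns out to be 0, A/B fails with a division-by-zero error. A possible variant of the subquery form, assuming a NULL result is acceptable in that case, uses BigQuery's SAFE_DIVIDE function on the same public table:

```sql
-- Same subquery pattern as above, but SAFE_DIVIDE returns NULL
-- instead of raising an error when B is 0.
SELECT
  A,
  B,
  SAFE_DIVIDE(A, B) AS division
FROM (
  SELECT
    COUNTIF(homeFinalRuns > 3) AS A,
    COUNTIF(awayFinalRuns > 3) AS B
  FROM `bigquery-public-data.baseball.games_post_wide`
)
```

Because the aliases are already available in the outer query, swapping A/B for SAFE_DIVIDE(A, B) is a one-line change, which is part of the readability benefit the question asks about.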



Source: https://stackoverflow.com/questions/62895643/adding-a-calculated-column-to-bigquery-query-without-repeating-the-calculation
