We have a table of bid prices and sizes from two buyers. A bid price p with size s means that the buyer is willing to buy s units of the product at price p. The table has four columns:
- bid prices offered by the two buyers, pA and pB.
- bid sizes, sA and sB.
Our job is to add a new best-size column (bS) to the table that holds the size at the best price. If the two buyers offer the same price, bS equals sA + sB; otherwise, bS is the bid size of the buyer offering the higher price.
An example table with the desired output is below.
A simple solution to the problem:
SELECT pA, pB, sA, sB,
  CASE
    WHEN pA = pB THEN sA + sB
    WHEN pA > pB THEN sA
    ELSE sB
  END AS bS
FROM t
Now let us generalize the problem to four buyers. A standard SQL solution is
WITH t_ext AS (
  SELECT *, GREATEST(pA, pB, pC, pD) AS bestPrice
  FROM `t`
)
SELECT *, (sA * CAST(pA = bestPrice AS INT64) +
           sB * CAST(pB = bestPrice AS INT64) +
           sC * CAST(pC = bestPrice AS INT64) +
           sD * CAST(pD = bestPrice AS INT64))
  AS bS
FROM t_ext
Question 1)
Is there a simplified query that
- uses function SUM instead of adding four items manually
- avoids repeated casting?
Question 2)
Is there a way in the Google BigQuery ecosystem to reuse this query for another table whose columns are named, e.g., priceA, priceB instead of pA, pB?
By the way, I wrote a blog post about this problem that focuses on solutions in Python and Q, and I am wondering what the best solution in standard SQL looks like.
The query below is for BigQuery Standard SQL and is generic enough not to depend on the number of buyers or on the names of the price and size fields. The only expectation is that all prices come first, followed by all the respective sizes, as in your example. I also assume all numbers are integers (as in the example in the question), but this can be adjusted to handle FLOATs.
#standardSQL
WITH t_ext AS (
  SELECT * EXCEPT(arr),
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < 4) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= 4) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < 4) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(T), r':(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes),
  (SELECT SUM(size)
   FROM UNNEST(prices) price WITH OFFSET
   JOIN UNNEST(sizes) size WITH OFFSET
   USING(OFFSET)
   WHERE price = bestPrice
  ) AS bS
FROM t_ext
The only thing you need to change in the query above is the number of buyers, which appears in the expressions below (in each of them, 4 can be replaced with ARRAY_LENGTH(arr) / 2):
WHERE OFFSET < 4
WHERE OFFSET >= 4
WHERE OFFSET < 4
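To avoid hardcoding the buyer count, the split point can be derived from the array length. The sketch below is a variation of the query above, not a tested drop-in: the sample row, the column names priceA..sizeC, the CTE names sample and t_arr, and the offset alias pos are all illustrative assumptions. It uses DIV(ARRAY_LENGTH(arr), 2) so the comparison stays in INT64, and adds ORDER BY pos so the array order does not depend on the subquery's output order.

```sql
#standardSQL
-- Hypothetical 3-buyer sample with different column names (assumption for illustration).
WITH sample AS (
  SELECT 5 AS priceA, 7 AS priceB, 7 AS priceC,
         10 AS sizeA, 20 AS sizeB, 30 AS sizeC
), t_arr AS (
  -- Pull every integer value out of the row's JSON form, in column order.
  SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+)') AS arr
  FROM sample t
), t_ext AS (
  SELECT * EXCEPT(arr),
    -- First half of arr holds the prices, second half the sizes.
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET pos
          WHERE pos < DIV(ARRAY_LENGTH(arr), 2) ORDER BY pos) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET pos
          WHERE pos >= DIV(ARRAY_LENGTH(arr), 2) ORDER BY pos) AS sizes
  FROM t_arr
)
SELECT * EXCEPT(prices, sizes),
  (SELECT MAX(price) FROM UNNEST(prices) price) AS bestPrice,
  -- Pair each price with the size at the same position and sum the sizes at the best price.
  (SELECT SUM(size)
   FROM UNNEST(prices) price WITH OFFSET pos
   JOIN UNNEST(sizes) size WITH OFFSET pos USING(pos)
   WHERE price = (SELECT MAX(p) FROM UNNEST(prices) p)) AS bS
FROM t_ext
```

For this sample row the intent is bestPrice = 7 and bS = 20 + 30 = 50. Because nothing in the query names a price or size column, the same text should work for any table that lists all prices first and then all sizes.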
For example, with the dummy data below (4 buyers):
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 pA, 2 pB, 3 pC, 4 pD, 1 sA, 1 sB, 1 sC, 5 sD UNION ALL
  SELECT 1, 4, 2, 4, 1, 6, 1, 5 UNION ALL
  SELECT 4, 4, 2, 1, 7, 1, 1, 1
), t_ext AS (
  SELECT * EXCEPT(arr),
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < 4) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= 4) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < 4) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(T), r':(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes),
  (SELECT SUM(size)
   FROM UNNEST(prices) price WITH OFFSET
   JOIN UNNEST(sizes) size WITH OFFSET
   USING(OFFSET)
   WHERE price = bestPrice
  ) AS bS
FROM t_ext
the result is
Row  pA  pB  pC  pD  sA  sB  sC  sD  bestPrice  bS
1    1   2   3   4   1   1   1   5   4          5
2    1   4   2   4   1   6   1   5   4          11
3    4   4   2   1   7   1   1   1   4          8
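If the column names are known and only the manual addition and repeated casting from Question 1 are the concern, a more direct variant may be enough. The following is a minimal sketch and not part of the approach above: it packs each (price, size) pair into a STRUCT, unnests the array, and lets SUM with a WHERE filter do the work. The field names price and size, the alias b, and the inline sample row (row 2 of the dummy data) are assumptions made for illustration.

```sql
#standardSQL
-- Inline sample row (row 2 of the dummy data above), used here only for illustration.
WITH t AS (
  SELECT 1 AS pA, 4 AS pB, 2 AS pC, 4 AS pD,
         1 AS sA, 6 AS sB, 1 AS sC, 5 AS sD
), t_ext AS (
  SELECT *, GREATEST(pA, pB, pC, pD) AS bestPrice
  FROM t
)
SELECT *,
  -- One STRUCT per buyer; SUM only over the buyers bidding at the best price.
  (SELECT SUM(b.size)
   FROM UNNEST([
     STRUCT(pA AS price, sA AS size),
     STRUCT(pB AS price, sB AS size),
     STRUCT(pC AS price, sC AS size),
     STRUCT(pD AS price, sD AS size)
   ]) AS b
   WHERE b.price = bestPrice) AS bS
FROM t_ext
```

For this row the query is expected to give bestPrice = 4 and bS = 11, matching row 2 of the result table. The trade-off is that the column pairs are spelled out once, so this variant does not address the renaming concern in Question 2.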
Source: https://stackoverflow.com/questions/56741959/weighted-sum-of-a-column-vector-and-a-derived-bit-vector