问题
I'm looking for the equivalent databricks code for the query. I added some sample code and the expected as well, but in particular I'm looking for the equivalent code in Databricks for the query. For the moment I'm stuck on the CROSS APPLY STRING SPLIT part.
Sample SQL data:
CREATE TABLE FactTurnover
(
ID INT,
SalesPriceExcl NUMERIC (9,4),
Discount VARCHAR(100)
)
INSERT INTO FactTurnover
VALUES
(1, 100, '10'),
(2, 39.5877, '58, 12'),
(3, 100, '50, 10, 15'),
(4, 100, 'B')
Query:
;WITH CTE AS
(
SELECT Id, SalesPriceExcl,
CASE WHEN value = 'B' THEN 0
ELSE CAST(value as int) END AS Discount
From FactTurnover
CROSS APPLY STRING_SPLIT(Discount, ',')
)
SELECT Id,
Min(SalesPriceExcl) AS SalesPriceExcludingDiscount,
EXP(SUM(LOG((100 - Discount) / 100.0))) As TotalDiscount,
Cast(EXP(SUM(LOG((100 - Discount) / 100.0))) *
MIN(SalesPriceExcl) As Numeric(9,2))
PriceAfterDiscount
FROM CTE
GROUP BY ID
Expected Results:
| Id | SalesPriceExcludingDiscount | TotalDiscount | PriceAfterDiscount |
|----|-----------------------------|---------------------|--------------------|
| 1 | 100 | 0.9 | 90 |
| 2 | 39.5877 | 0.36960000000000004 | 14.63 |
| 3 | 100 | 0.38250000000000006 | 38.25 |
| 4 | 100 | 1 | 100 |
回答1:
Use SPLIT
to convert the comma-separated string to an array then use LATERAL VIEW
and EXPLODE
to do operations on the elements of that array. The roughly equivalent syntax (including CTEs) is:
%sql
--SELECT * FROM FactTurnover;
WITH cte AS
(
SELECT *
FROM
(
SELECT Id, SalesPriceExcl, SPLIT ( Discount, ',' ) AS discountArray
FROM FactTurnover
) x
LATERAL VIEW EXPLODE ( discountArray ) x AS xdiscount
)
SELECT
Id,
MIN(SalesPriceExcl) AS SalesPriceExcludingDiscount,
EXP ( SUM( LOG( ( 100 - xdiscount ) / 100.00 ) ) ) AS TotalDiscount
FROM cte
GROUP BY Id
ORDER BY Id
If you are feeling brave, you could also do this using higher order functions. I've included two examples below. I would say these are harder to debug and you should probably try them performance-wise, it depends what you're comfortable with:
%sql
-- Convert Discount text column to array with SPLIT function and filter out value 'B' from the array
;WITH filterB AS (
SELECT *, FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ) discountArray
FROM FactTurnover
), cte1 AS (
-- Do initial calcs on array
SELECT
Id,
TRANSFORM ( discountArray, discountArray -> LOG( ( 100 - discountArray ) / 100.00 ) ) discountArray2
FROM filterB
)
SELECT
Id,
EXP( AGGREGATE ( discountArray2, CAST( 0 AS DOUBLE ), ( x, y ) -> x + y ) ) AS x
FROM cte1;
-- all in one example
SELECT
Id,
EXP( AGGREGATE( TRANSFORM( FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ), y -> LOG( ( 100 - y ) / 100.00 ) ), CAST( 0 AS DOUBLE ), ( z, a ) -> z + a ) )
AS final
FROM FactTurnover
ORDER BY Id
来源:https://stackoverflow.com/questions/57994948/databricks-equivalent-code-for-sql-query