user-defined-functions

Spark dataframe to numpy array via udf or without collecting to driver

Submitted by 旧街凉风 on 2020-04-30 09:48:46

Question: The real-life df is a massive DataFrame that cannot be loaded into driver memory. Can this be done using a regular or a pandas UDF?

# Code to generate a sample dataframe
from pyspark.sql import functions as F
from pyspark.sql.types import *
import pandas as pd
import numpy as np

sample = [['123', [[0,1,0,0,0,1,1,1,1,1,1,0,1,0,0,0,1,1,1,1,1,1],
                   [0,1,0,0,0,1,1,1,1,1,1,0,1,0,0,0,1,1,1,1,1,1]]],
          ['345', [[1,0,0,0,0,1,1,1,0,1,1,0,1,0,0,0,1,1,1,1,1,1],
                   [0,1,0,0,0,1,1,1,1,1,1,0,1,0,0,0,1,1,1,1,1,1]]],
          ['425',
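A minimal sketch of one way to keep the numpy work off the driver, assuming Spark 3.x (type-hinted pandas UDFs), an active SparkSession named spark, the sample list above completed, and illustrative column names id and features; the element-wise mean is only a stand-in for whatever numpy logic is actually needed:

from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, DoubleType
import numpy as np
import pandas as pd

# Scalar pandas UDF: each Arrow batch arrives as a pandas Series on an
# executor, so rows are converted to numpy there and never collected.
@F.pandas_udf(ArrayType(DoubleType()))
def mean_of_arrays(col: pd.Series) -> pd.Series:
    return col.apply(lambda rows: np.asarray(rows, dtype=float).mean(axis=0).tolist())

df = spark.createDataFrame(sample, ['id', 'features'])
df.withColumn('mean_vec', mean_of_arrays('features')).show(truncate=False)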

User defined function with while loop in SQL Server

Submitted by 我的未来我决定 on 2020-04-17 22:50:16

Question: I am asked to create a user-defined function in SQL Server that returns the following pattern (for example, if the input = 5):

*****
 ****
  ***
   **
    *

Here is my code:

alter function udf_star (@input int)
returns varchar(200)
as
begin
    declare @star int
    set @star = @input
    declare @space int
    set @space = 0
    while @star > 0
    begin
        declare @string varchar(200)
        set @string = replicate(' ', @space) + replicate('*', @star)
        set @star = @star - 1
        set @space = @space + 1
    end
    return @string
end

When I
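For reference, the logic the function appears to be after, written as a short Python sketch (the newline separator and leading spaces are assumptions read off the expected output and the replicate calls above); the essential point is that each row must be appended to an accumulator, whereas the T-SQL above overwrites @string on every pass and so returns only the last row:

def star_pattern(n: int) -> str:
    rows = []
    for i in range(n):
        # i leading spaces, then n - i stars: mirrors @space and @star
        rows.append(' ' * i + '*' * (n - i))
    return '\n'.join(rows)

print(star_pattern(5))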

Azure Cosmos DB UDF for date time is seriously slowing down query

Submitted by 那年仲夏 on 2020-04-17 21:46:43

Question: Our dates are stored as "2/22/2008 12:00:00 AM". We need to filter results so that we get documents between two times. If we compare two queries, one using a UDF and one without, the one with the UDF is orders of magnitude slower.

With the UDF:

SELECT DISTINCT c.eh, c.wcm, w AS wt
FROM c
JOIN w IN c.wt
WHERE (udf.toValue(w.ced) BETWEEN udf.toValue('03/02/2023') AND udf.toValue('09/02/2023'))
    AND w.ty = 'FW'
OFFSET 0 LIMIT 10

And without:

SELECT DISTINCT c.eh, c.wcm, w AS wt
FROM c
JOIN w IN c.wt
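The slowdown is expected: a UDF inside the filter must be evaluated per document, so the range can no longer be served from the index. A common fix is to store dates as sortable ISO 8601 strings so w.ced can be compared directly; a sketch of the one-time conversion, assuming the M/D/YYYY h:mm:ss AM format shown above:

from datetime import datetime

def to_iso(us_date: str) -> str:
    # "2/22/2008 12:00:00 AM" -> "2008-02-22T00:00:00"
    return datetime.strptime(us_date, '%m/%d/%Y %I:%M:%S %p').isoformat()

print(to_iso('2/22/2008 12:00:00 AM'))

After the conversion, the WHERE clause can use a plain, index-served comparison such as w.ced BETWEEN '2023-03-02T00:00:00' AND '2023-09-02T00:00:00' (reading the literals above as month-first) with no UDF calls at all.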

For BigQuery JS UDF, is there any simpler way to load a wasm file into a user defined function?

Submitted by 耗尽温柔 on 2020-04-17 21:37:35

Question: As illustrated here, dumping the wasm byte code and copy-pasting it into the JavaScript seems difficult.

Source: https://stackoverflow.com/questions/60094498/for-bigquery-js-udf-is-there-any-simpler-way-to-load-a-wasm-file-into-a-user-de
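One pragmatic workaround is to generate the inlined bytes with a small script rather than copy them by hand; a sketch, assuming the compiled module is named rb62.wasm and the emitted .js file is then hosted on GCS and referenced through the UDF's library option (file names are illustrative):

import base64

with open('rb62.wasm', 'rb') as f:
    b64 = base64.b64encode(f.read()).decode('ascii')

# Emit a JS snippet that rebuilds the byte array at runtime. Whether atob()
# exists in BigQuery's JS sandbox is an assumption; if it does not, emit a
# plain numeric array literal (larger, but dependency-free) instead.
with open('rb62_inline.js', 'w') as f:
    f.write('const wasmBytes = Uint8Array.from(atob("%s"), c => c.charCodeAt(0));\n' % b64)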

How to extract floats from vector columns in PySpark?

Submitted by 风格不统一 on 2020-03-28 06:40:25

Question: My Spark DataFrame has data in the following format (example data omitted). printSchema() shows that each column is of type vector. I tried to get the values out of [ and ] using the code below (for a single column, col1):

from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

firstelement = udf(lambda v: float(v[0]), FloatType())
df.select(firstelement('col1')).show()

However, how can I apply it to all columns of df?

Answer 1: 1. Extract first element of a single vector column: To get the first
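For the "all columns" part of the question, one minimal approach is a list comprehension over df.columns, assuming every column really is a vector; re-aliasing keeps the original column names:

from pyspark.sql import functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

firstelement = udf(lambda v: float(v[0]), FloatType())

# One select over all columns; alias each result back to its source name.
df.select([firstelement(F.col(c)).alias(c) for c in df.columns]).show()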

Can I use `TextEncoder` in Bigquery JS UDF?

Submitted by 被刻印的时光 ゝ on 2020-03-25 19:25:10

Question: I am trying to use some Rust wasm code in BigQuery as a UDF, and to pass a JavaScript string to the Rust code, TextEncoder and TextDecoder would be needed to do that conveniently, as described here: Passing a JavaScript string to a Rust function compiled to WebAssembly. But when I try out some of my code on BigQuery, I encounter an error saying TextEncoder is not defined. You can try it out as well with a query like this: https://github.com/liufuyang/rb62-wasm/blob/master/try-3-old.sql

How to subtract a column of days from a column of dates in Pyspark?

Submitted by 时光总嘲笑我的痴心妄想 on 2020-03-18 10:54:09

Question: Given the following PySpark DataFrame

df = sqlContext.createDataFrame([('2015-01-15', 10),
                                 ('2015-02-15', 5)],
                                ('date_col', 'days_col'))

how can the days column be subtracted from the date column? In this example, the resulting column should be ['2015-01-05', '2015-02-10']. I looked into pyspark.sql.functions.date_sub(), but it requires a date column and a single day, i.e. date_sub(df['date_col'], 10). Ideally, I'd prefer to do date_sub(df['date_col'], df['days_col']). I also tried
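One approach that avoids a Python UDF entirely is the SQL expression form, where date_sub can take a column reference for the day count; a sketch against the df defined above:

from pyspark.sql import functions as F

# expr() parses the SQL form, which accepts columns for both arguments;
# the string date_col is implicitly cast to a date.
result = df.select(F.expr('date_sub(date_col, days_col)').alias('new_date'))
result.show()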