user-defined-functions

SQL Function to return value from multiple columns

风格不统一 submitted on 2020-07-23 06:50:07
Question: I've been developing a few stored procedures and have been repeating a portion of code that derives a column based on a few other columns. So instead of copying this piece of code from one stored procedure to another, I'm thinking of having a function that takes the input columns and produces the output column. Basically, the function goes as:

    SELECT columnA, columnB, columnC, myFunction(columnA, columnB) AS columnD
    FROM myTable

As we can see, this function will take column A and column B as inputs and derive column D from them.
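A minimal sketch of such a function in T-SQL; the name, parameter types, and derivation rule below are placeholders, since the excerpt does not show the actual logic:

    CREATE FUNCTION dbo.myFunction (@columnA INT, @columnB INT)
    RETURNS INT
    AS
    BEGIN
        -- Placeholder derivation; substitute the real business rule here.
        RETURN @columnA * 100 + @columnB;
    END;

Once the logic is centralized this way, each stored procedure can select dbo.myFunction(columnA, columnB) AS columnD instead of repeating the derivation. One caveat: scalar UDFs are invoked once per row, which can be slow on large tables; an inline table-valued function is a common faster alternative.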

How to create a udf in PySpark which returns an array of strings?

不打扰是莪最后的温柔 submitted on 2020-07-17 07:24:18
Question: I have a UDF that returns a list of strings; this should not be too hard. I pass in the data type when executing the UDF, since it returns an array of strings: ArrayType(StringType). Now, somehow this is not working. The dataframe I'm operating on is df_subsets_concat and looks like this:

    df_subsets_concat.show(3, False)
    +----------------------+
    |col1                  |
    +----------------------+
    |oculunt               |
    |predistposed          |
    |incredulous           |
    +----------------------+
    only showing top 3 rows

and the code is from…
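For reference, a hedged sketch of a UDF of this shape that does work on Spark 2.x; the splitting logic is a placeholder, since the excerpt cuts off before the poster's actual code:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    spark = SparkSession.builder.getOrCreate()

    # Note the parentheses: the element type must be an instance, StringType(),
    # not the class StringType.
    @udf(returnType=ArrayType(StringType()))
    def to_string_array(s):
        # Placeholder logic: break the word into two-character chunks.
        return [s[i:i + 2] for i in range(0, len(s), 2)]

    df = spark.createDataFrame(
        [("oculunt",), ("predistposed",), ("incredulous",)], ["col1"])
    df.withColumn("col2", to_string_array("col1")).show(3, False)

A frequent cause of failures with this pattern is writing ArrayType(StringType) rather than ArrayType(StringType()); the inner type must be instantiated.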

Using Scala classes as UDF with pyspark

試著忘記壹切 submitted on 2020-07-16 00:45:13
Question: I'm trying to offload some computations from Python to Scala when using Apache Spark. I would like to use the class interface from Java to be able to use a persistent variable, like so (this is a nonsensical MWE based on my more complex use case):

    package mwe

    import org.apache.spark.sql.api.java.UDF1

    class SomeFun extends UDF1[Int, Int] {
      private var prop: Int = 0

      override def call(input: Int): Int = {
        if (prop == 0) {
          prop = input
        }
        prop + input
      }
    }

Now I'm attempting to use this class from PySpark…
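Assuming the class is compiled into a JAR that is on the Spark classpath (e.g. passed via --jars), one way to invoke it from PySpark is to register it by class name as a Java UDF; a hedged sketch:

    # Assumes mwe.SomeFun is already on the driver and executor classpath.
    from pyspark.sql.types import IntegerType

    spark.udf.registerJavaFunction("some_fun", "mwe.SomeFun", IntegerType())
    spark.sql("SELECT some_fun(42) AS result").show()

A caveat relevant to the "persistent variable" goal: the UDF object is instantiated separately on each executor, so mutable state like prop is per-executor, not shared across the cluster.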

R - dplyr merge in user-defined function [duplicate]

元气小坏坏 submitted on 2020-06-28 04:41:41
Question: This question already has answers here: join datasets using a quosure as the by argument (2 answers). Closed 2 years ago.

I am trying to create a merge function as follows:

    merge_tables <- function(inputdata1, inputdata2, byvar1, byvar2) {
      byvar1 <- enquo(byvar1)
      byvar2 <- enquo(byvar2)
      outputdata <- inputdata1 %>%
        full_join(inputdata2,
                  by = c(rlang::quo_text(byvar1) = rlang::quo_text(byvar2)))
      return(outputdata)
    }

I am getting an error when I run it using the following data…
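The code as posted fails because R does not allow a function call on the left-hand side of = inside c(); the usual fix (and the approach in the linked duplicate) is to build the named by vector with setNames(). A sketch, assuming dplyr and rlang are loaded:

    library(dplyr)
    library(rlang)

    merge_tables <- function(inputdata1, inputdata2, byvar1, byvar2) {
      byvar1 <- enquo(byvar1)
      byvar2 <- enquo(byvar2)
      inputdata1 %>%
        full_join(inputdata2,
                  # setNames() constructs the c("left_col" = "right_col")
                  # pairing that dplyr joins expect.
                  by = setNames(quo_text(byvar2), quo_text(byvar1)))
    }

    # Hypothetical usage: merge_tables(df1, df2, id_in_df1, id_in_df2)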

Pyspark UDF AttributeError: 'NoneType' object has no attribute '_jvm'

情到浓时终转凉″ submitted on 2020-06-27 17:01:05
Question: I have a UDF:

    @staticmethod
    @F.udf("array<int>")
    def create_users_array(val):
        """ Takes column of ints, returns column of arrays containing ints. """
        return [val for _ in range(val)]

I call it like so:

    df.withColumn("myArray", create_users_array(df["myNumber"]))

I pass it a dataframe column of integers, and it returns an array of that integer, e.g. 4 --> [4, 4, 4, 4]. It was working until we upgraded from Python 2.7 and upgraded our EMR version (which I believe uses PySpark 2.3). Anyone…
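One thing worth checking, offered as a hypothesis rather than a confirmed diagnosis: this error usually means some pyspark call ran while no SparkContext was active, so the internal sc._jvm handle was still None. Because F.udf here receives a DDL type string ("array<int>"), PySpark must parse that string through the JVM, and decorating a method at class-definition time can trigger that before the session is up. A sketch of a variant that sidesteps string parsing by passing a DataType object and creating the session first:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import ArrayType, IntegerType

    spark = SparkSession.builder.getOrCreate()  # session exists before the udf

    @F.udf(ArrayType(IntegerType()))
    def create_users_array(val):
        # Takes an int n, returns a list of n copies of n, e.g. 4 -> [4, 4, 4, 4].
        return [val for _ in range(val)]

    df = spark.createDataFrame([(4,)], ["myNumber"])
    df.withColumn("myArray", create_users_array(df["myNumber"])).show()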

BigQuery JavaScript UDF process - per row or per processing node?

北城余情 submitted on 2020-06-27 05:21:05
Question: I'm thinking of using BigQuery's JavaScript UDFs as a critical component in a new data architecture. They would be used to logically process each row loaded into the main table, and also to process each row during periodic and ad-hoc aggregation queries. Using a SQL UDF for the same purpose seems to be infeasible, because each row represents a complex object, and implementing the business logic in SQL, including things such as parsing complex text fields, gets ugly very fast. I just read the…
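For context, the standard-SQL shape of such a row-processing JavaScript UDF; the struct fields, parsing logic, and table/column names below are placeholders, not the poster's actual schema:

    CREATE TEMP FUNCTION parse_row(raw STRING)
    RETURNS STRUCT<kind STRING, value FLOAT64>
    LANGUAGE js AS """
      // Placeholder: split a "kind:value" text field into a struct.
      var parts = raw.split(':');
      return { kind: parts[0], value: parseFloat(parts[1]) };
    """;

    SELECT parse_row(payload).kind AS kind,
           AVG(parse_row(payload).value) AS avg_value
    FROM my_dataset.main_table
    GROUP BY kind;

On the title question: the function is applied once per input row, but rows are partitioned across workers, so the JavaScript runtime is instantiated per worker rather than once per query; state kept in JavaScript globals therefore cannot be assumed to span all rows.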

Pandas scalar UDF failing, IllegalArgumentException

落爺英雄遲暮 submitted on 2020-06-16 07:58:16
Question: First off, I apologize if my issue is simple. I did spend a lot of time researching it. I am trying to set up a scalar Pandas UDF in a PySpark script as described here. Here is my code:

    from pyspark import SparkContext
    from pyspark.sql import functions as F
    from pyspark.sql.types import *
    from pyspark.sql import SQLContext

    sc.install_pypi_package("pandas")
    import pandas as pd
    sc.install_pypi_package("PyArrow")

    df = spark.createDataFrame(
        [("a", 1, 0), ("a", -1, 42), ("b", 3, -1), ("b", 10, -2…
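For reference, a minimal scalar pandas UDF in the Spark 2.3/2.4 style that guides of this kind describe; the column names here are placeholders:

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.getOrCreate()

    @pandas_udf(LongType(), PandasUDFType.SCALAR)
    def plus_one(v):
        # Receives a whole batch of rows at once as a pandas Series.
        return v + 1

    df = spark.createDataFrame([("a", 1, 0), ("b", 3, -1)], ["id", "x", "y"])
    df.withColumn("x_plus_one", plus_one("x")).show()

A frequent cause of IllegalArgumentException with pandas UDFs on Spark 2.3/2.4 is a PyArrow version mismatch: Arrow 0.15 changed its binary IPC format, so either install pyarrow below 0.15 or set ARROW_PRE_0_15_IPC_FORMAT=1 in the executors' environment.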

Convert a hexadecimal varbinary to its string representation?

落爺英雄遲暮 submitted on 2020-06-16 07:42:56
Question: I have some base64-encoded strings in a SQL Server database, for example:

    DECLARE @x VARBINARY(64);
    SET @x = 0x4b78374c6a3733514f723444444d35793665362f6c513d3d

When it's CAST or CONVERTed to a VARCHAR, I get:

    +˽Ð:¾Îréî¿•

I'm looking for SQL Server to return a varchar with the hexadecimal representation of the varbinary, e.g.:

    4b78374c6a3733514f723444444d35793665362f6c513d3d

Is there a built-in CAST/CONVERT function that does this, or does it have to be added as a user-defined function?
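There is a built-in route on SQL Server 2008 and later: CONVERT with a binary style code. A short sketch using the value from the question:

    DECLARE @x VARBINARY(64);
    SET @x = 0x4b78374c6a3733514f723444444d35793665362f6c513d3d;

    -- Style 1 keeps the 0x prefix; style 2 drops it.
    SELECT CONVERT(VARCHAR(MAX), @x, 1) AS with_prefix,
           CONVERT(VARCHAR(MAX), @x, 2) AS without_prefix;

The second column returns the hex digits in uppercase (4B78374C...), so wrap it in LOWER() if the lowercase form shown in the question is required.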