user-defined-functions

SQL Function to return value from multiple columns

风格不统一 submitted on 2020-07-23 06:50:07
Question: I've been developing a few stored procedures and have been repeating a portion of code that derives a column based on a few other columns. So instead of copying this piece of code from one stored procedure to another, I'm thinking of having a function that takes the input columns and produces the output column. Basically, the function goes as:

    SELECT columnA, columnB, columnC, myFunction(columnA, columnB) AS columnD
    FROM myTable

As we can see, this function will take column A and column B as inputs and derive column D from them.
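A minimal sketch of such a function in T-SQL; the name, parameter types, and derivation rule below are placeholders, since the excerpt does not show the actual logic:

    CREATE FUNCTION dbo.myFunction (@columnA INT, @columnB INT)
    RETURNS INT
    AS
    BEGIN
        -- Placeholder derivation; substitute the real business rule here.
        RETURN @columnA * 100 + @columnB;
    END;

Once the logic is centralized this way, each stored procedure can select dbo.myFunction(columnA, columnB) AS columnD instead of repeating the derivation. One caveat: scalar UDFs are invoked once per row, which can be slow on large tables; an inline table-valued function is a common faster alternative.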

How to create a udf in PySpark which returns an array of strings?

不打扰是莪最后的温柔 submitted on 2020-07-17 07:24:18
Question: I have a UDF that returns a list of strings; this should not be too hard. I pass in the data type when executing the UDF, since it returns an array of strings: ArrayType(StringType). Now, somehow this is not working. The dataframe I'm operating on is df_subsets_concat and looks like this:

    df_subsets_concat.show(3, False)
    +----------------------+
    |col1                  |
    +----------------------+
    |oculunt               |
    |predistposed          |
    |incredulous           |
    +----------------------+
    only showing top 3 rows

and the code is from…
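For reference, a hedged sketch of a UDF of this shape that does work on Spark 2.x; the splitting logic is a placeholder, since the excerpt cuts off before the poster's actual code:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    spark = SparkSession.builder.getOrCreate()

    # Note the parentheses: the element type must be an instance, StringType(),
    # not the class StringType.
    @udf(returnType=ArrayType(StringType()))
    def to_string_array(s):
        # Placeholder logic: break the word into two-character chunks.
        return [s[i:i + 2] for i in range(0, len(s), 2)]

    df = spark.createDataFrame(
        [("oculunt",), ("predistposed",), ("incredulous",)], ["col1"])
    df.withColumn("col2", to_string_array("col1")).show(3, False)

A frequent cause of failures with this pattern is writing ArrayType(StringType) rather than ArrayType(StringType()); the inner type must be instantiated.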

Using Scala classes as UDF with pyspark

試著忘記壹切 submitted on 2020-07-16 00:45:13
Question: I'm trying to offload some computations from Python to Scala when using Apache Spark. I would like to use the class interface from Java to be able to use a persistent variable, like so (this is a nonsensical MWE based on my more complex use case):

    package mwe

    import org.apache.spark.sql.api.java.UDF1

    class SomeFun extends UDF1[Int, Int] {
      private var prop: Int = 0

      override def call(input: Int): Int = {
        if (prop == 0) {
          prop = input
        }
        prop + input
      }
    }

Now I'm attempting to use this class from PySpark…
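Assuming the class is compiled into a JAR that is on the Spark classpath (e.g. passed via --jars), one way to invoke it from PySpark is to register it by class name as a Java UDF; a hedged sketch:

    # Assumes mwe.SomeFun is already on the driver and executor classpath.
    from pyspark.sql.types import IntegerType

    spark.udf.registerJavaFunction("some_fun", "mwe.SomeFun", IntegerType())
    spark.sql("SELECT some_fun(42) AS result").show()

A caveat relevant to the "persistent variable" goal: the UDF object is instantiated separately on each executor, so mutable state like prop is per-executor, not shared across the cluster.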

R - dplyr merge in user-defined function [duplicate]

元气小坏坏 submitted on 2020-06-28 04:41:41
Question: This question already has answers here: join datasets using a quosure as the by argument (2 answers). Closed 2 years ago.

I am trying to create a merge function as follows:

    merge_tables <- function(inputdata1, inputdata2, byvar1, byvar2) {
      byvar1 <- enquo(byvar1)
      byvar2 <- enquo(byvar2)
      outputdata <- inputdata1 %>%
        full_join(inputdata2,
                  by = c(rlang::quo_text(byvar1) = rlang::quo_text(byvar2)))
      return(outputdata)
    }

I am getting an error when I run it using the following data…
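The code as posted fails because R does not allow a function call on the left-hand side of = inside c(); the usual fix (and the approach in the linked duplicate) is to build the named by vector with setNames(). A sketch, assuming dplyr and rlang are loaded:

    library(dplyr)
    library(rlang)

    merge_tables <- function(inputdata1, inputdata2, byvar1, byvar2) {
      byvar1 <- enquo(byvar1)
      byvar2 <- enquo(byvar2)
      inputdata1 %>%
        full_join(inputdata2,
                  # setNames() constructs the c("left_col" = "right_col")
                  # pairing that dplyr joins expect.
                  by = setNames(quo_text(byvar2), quo_text(byvar1)))
    }

    # Hypothetical usage: merge_tables(df1, df2, id_in_df1, id_in_df2)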

Pyspark UDF AttributeError: 'NoneType' object has no attribute '_jvm'

情到浓时终转凉″ submitted on 2020-06-27 17:01:05
Question: I have a UDF:

    @staticmethod
    @F.udf("array<int>")
    def create_users_array(val):
        """ Takes column of ints, returns column of arrays containing ints. """
        return [val for _ in range(val)]

I call it like so:

    df.withColumn("myArray", create_users_array(df["myNumber"]))

I pass it a dataframe column of integers, and it returns an array of that integer, e.g. 4 --> [4, 4, 4, 4]. It was working until we upgraded from Python 2.7 and upgraded our EMR version (which I believe uses PySpark 2.3). Anyone…
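One thing worth checking, offered as a hypothesis rather than a confirmed diagnosis: this error usually means some pyspark call ran while no SparkContext was active, so the internal sc._jvm handle was still None. Because F.udf here receives a DDL type string ("array<int>"), PySpark must parse that string through the JVM, and decorating a method at class-definition time can trigger that before the session is up. A sketch of a variant that sidesteps string parsing by passing a DataType object and creating the session first:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import ArrayType, IntegerType

    spark = SparkSession.builder.getOrCreate()  # session exists before the udf

    @F.udf(ArrayType(IntegerType()))
    def create_users_array(val):
        # Takes an int n, returns a list of n copies of n, e.g. 4 -> [4, 4, 4, 4].
        return [val for _ in range(val)]

    df = spark.createDataFrame([(4,)], ["myNumber"])
    df.withColumn("myArray", create_users_array(df["myNumber"])).show()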

BigQuery JavaScript UDF process - per row or per processing node?

北城余情 submitted on 2020-06-27 05:21:05
Question: I'm thinking of using BigQuery's JavaScript UDFs as a critical component in a new data architecture. They would be used to logically process each row loaded into the main table, and also to process each row during periodic and ad-hoc aggregation queries. Using a SQL UDF for the same purpose seems to be infeasible, because each row represents a complex object, and implementing the business logic in SQL, including things such as parsing complex text fields, gets ugly very fast. I just read the…
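For context, the standard-SQL shape of such a row-processing JavaScript UDF; the struct fields, parsing logic, and table/column names below are placeholders, not the poster's actual schema:

    CREATE TEMP FUNCTION parse_row(raw STRING)
    RETURNS STRUCT<kind STRING, value FLOAT64>
    LANGUAGE js AS """
      // Placeholder: split a "kind:value" text field into a struct.
      var parts = raw.split(':');
      return { kind: parts[0], value: parseFloat(parts[1]) };
    """;

    SELECT parse_row(payload).kind AS kind,
           AVG(parse_row(payload).value) AS avg_value
    FROM my_dataset.main_table
    GROUP BY kind;

On the title question: the function is applied once per input row, but rows are partitioned across workers, so the JavaScript runtime is instantiated per worker rather than once per query; state kept in JavaScript globals therefore cannot be assumed to span all rows.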

Pandas scalar UDF failing, IllegalArgumentException

落爺英雄遲暮 submitted on 2020-06-16 07:58:16
Question: First off, I apologize if my issue is simple. I did spend a lot of time researching it. I am trying to set up a scalar Pandas UDF in a PySpark script as described here. Here is my code:

    from pyspark import SparkContext
    from pyspark.sql import functions as F
    from pyspark.sql.types import *
    from pyspark.sql import SQLContext

    sc.install_pypi_package("pandas")
    import pandas as pd
    sc.install_pypi_package("PyArrow")

    df = spark.createDataFrame(
        [("a", 1, 0), ("a", -1, 42), ("b", 3, -1), ("b", 10, -2…
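For reference, a minimal scalar pandas UDF in the Spark 2.3/2.4 style that guides of this kind describe; the column names here are placeholders:

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.getOrCreate()

    @pandas_udf(LongType(), PandasUDFType.SCALAR)
    def plus_one(v):
        # Receives a whole batch of rows at once as a pandas Series.
        return v + 1

    df = spark.createDataFrame([("a", 1, 0), ("b", 3, -1)], ["id", "x", "y"])
    df.withColumn("x_plus_one", plus_one("x")).show()

A frequent cause of IllegalArgumentException with pandas UDFs on Spark 2.3/2.4 is a PyArrow version mismatch: Arrow 0.15 changed its binary IPC format, so either install pyarrow below 0.15 or set ARROW_PRE_0_15_IPC_FORMAT=1 in the executors' environment.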

Convert a hexadecimal varbinary to its string representation?

落爺英雄遲暮 submitted on 2020-06-16 07:42:56
Question: I have some base64-encoded strings in a SQL Server database, for example:

    DECLARE @x VARBINARY(64);
    SET @x = 0x4b78374c6a3733514f723444444d35793665362f6c513d3d

When it's CAST or CONVERTed to a VARCHAR, I get:

    +˽Ð:¾Îréî¿•

I'm looking for SQL Server to return a varchar with the hexadecimal representation of the varbinary, e.g.:

    4b78374c6a3733514f723444444d35793665362f6c513d3d

Is there a built-in CAST/CONVERT function that does this, or does it have to be added as a user-defined function?
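There is a built-in route on SQL Server 2008 and later: CONVERT with a binary style code. A short sketch using the value from the question:

    DECLARE @x VARBINARY(64);
    SET @x = 0x4b78374c6a3733514f723444444d35793665362f6c513d3d;

    -- Style 1 keeps the 0x prefix; style 2 drops it.
    SELECT CONVERT(VARCHAR(MAX), @x, 1) AS with_prefix,
           CONVERT(VARCHAR(MAX), @x, 2) AS without_prefix;

The second column returns the hex digits in uppercase (4B78374C...), so wrap it in LOWER() if the lowercase form shown in the question is required.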