Functions from Python packages for udf() of Spark dataframe

说谎 2020-12-30 12:17

For a Spark DataFrame in PySpark, we can use pyspark.sql.functions.udf to create a user-defined function (UDF).

I wonder if I can use any

1 answer
  • 2020-12-30 12:38

    Assuming you want to add a column named new to your DataFrame df, with each value drawn by calling numpy.random.normal, you could do:

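    import numpy
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    # numpy.random.normal returns numpy.float64, which Spark cannot serialize
    # as DoubleType, so convert the result to a built-in float first.
    normal_udf = udf(lambda: float(numpy.random.normal()), DoubleType())

    # Call the UDF with no arguments to produce one draw per row.
    df_with_new_column = df.withColumn('new', normal_udf())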
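
More generally, a function from another package can usually be used in a UDF as long as it can be pickled and its return value is converted to a built-in Python type matching the declared Spark type. The sketch below is one way to do that, not an official pattern: returning_builtin_float and make_double_udf are hypothetical helper names, and the standard library's random.gauss stands in for a package function so the check at the bottom runs without Spark or numpy installed.

```python
import random
from typing import Any, Callable


def returning_builtin_float(func: Callable[..., Any]) -> Callable[..., float]:
    """Wrap func so its result is converted to a built-in float.

    Hypothetical helper: Spark's DoubleType expects Python floats, so values
    such as numpy.float64 need this conversion before a UDF returns them.
    """
    def wrapper(*args: Any, **kwargs: Any) -> float:
        return float(func(*args, **kwargs))
    return wrapper


def make_double_udf(func: Callable[..., Any]):
    """Build a Spark UDF with return type DoubleType from a scalar function.

    Hypothetical helper: pyspark is imported lazily so that the wrapper
    above stays usable (and testable) without a Spark installation.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType
    return udf(returning_builtin_float(func), DoubleType())


# Plain-Python check of the wrapper, with random.gauss standing in for a
# package function such as numpy.random.normal.
sample = returning_builtin_float(random.gauss)(0.0, 1.0)
print(type(sample).__name__)  # float
```

With a DataFrame df and numpy available, this would be used as df.withColumn('new', make_double_udf(numpy.random.normal)()). Because the result is random, it may also be worth calling .asNondeterministic() on the UDF (available since Spark 2.3) so the optimizer does not assume repeated evaluations return the same value.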
    