Calculate UDF once
问题 I want to have a UUID column in a pyspark dataframe that is calculated only once, so that I can select the column in a different dataframe and have the UUIDs be the same. However, the UDF for the UUID column is recalculated when I select the column. Here's what I'm trying to do: >>> uuid_udf = udf(lambda: str(uuid.uuid4()), StringType()) >>> a = spark.createDataFrame([[1, 2]], ['col1', 'col2']) >>> a = a.withColumn('id', uuid_udf()) >>> a.collect() [Row(col1=1, col2=2, id='5ac8f818-e2d8-4c50