How to do a mathematical operation on two columns of a DataFrame using PySpark

Submitted by 拈花ヽ惹草 on 2020-01-01 05:40:32

Question


I have a DataFrame with three columns, "x", "y" and "z":

x        y         z
bn      12452     221
mb      14521     330
pl      12563     160
lo      22516     142

I need to create another column, "m", derived from this formula:

m = z / (y + z)

So the new DataFrame should look something like this:

x        y         z        m
bn      12452     221      .01743
mb      14521     330      .02222
pl      12563     160      .01257
lo      22516     142      .00626

Answer 1:


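# Assumes `sqlContext` is already defined, as in the pyspark shell.
df = sqlContext.createDataFrame([('bn', 12452, 221), ('mb', 14521, 330)], ['x', 'y', 'z'])
# Parenthesize the denominator so that m = z / (y + z).
df = df.withColumn('m', df['z'] / (df['y'] + df['z']))
df.head(2)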
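
The parentheses around the denominator matter. PySpark column expressions use Python's own operators, so `/` binds tighter than `+`, and `z / y + z` would compute (z / y) + z rather than z / (y + z). The following is a minimal plain-Python sketch, not part of the original answer, that checks the formula against the question's sample rows without needing a Spark session; the helper name `ratio` is hypothetical.

```python
# Sample rows from the question: (x, y, z)
rows = [
    ("bn", 12452, 221),
    ("mb", 14521, 330),
    ("pl", 12563, 160),
    ("lo", 22516, 142),
]


def ratio(y, z):
    """Return z / (y + z), the intended formula for column m (hypothetical helper)."""
    return z / (y + z)


# Compute m for every row, keyed by the value of column x
m_values = {x: ratio(y, z) for x, y, z in rows}

# Without parentheses, operator precedence yields (z / y) + z instead
wrong = 221 / 12452 + 221

for x, m in m_values.items():
    print(x, m)
print("unparenthesized result for the first row:", wrong)
```

The computed values agree with the expected "m" column in the question to within about 1e-5, while the unparenthesized expression gives a number larger than z itself.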


Source: https://stackoverflow.com/questions/40728017/how-to-do-mathematical-operation-with-two-column-in-dataframe-using-pyspark
