Checking whether a column has proper decimal number


Question


I have a dataframe (input_dataframe) that looks like this:

id        test_column
1           0.25
2           1.1
3           12
4           test
5           1.3334
6           .11

I want to add a column result that contains 1 if test_column holds a decimal value and 0 for any other value. The data type of test_column is string. Below is the expected output:

id        test_column      result
1           0.25              1
2           1.1               1
3           12                0
4           test              0
5           1.3334            1
6           .11               1

Can we achieve this using PySpark?
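For reference, the sample data can be rebuilt as a small string-typed DataFrame like this (a minimal sketch, assuming an active SparkSession named spark):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# test_column is deliberately kept as a string, matching the question
input_dataframe = spark.createDataFrame(
    [(1, "0.25"), (2, "1.1"), (3, "12"), (4, "test"), (5, "1.3334"), (6, ".11")],
    ["id", "test_column"],
)
input_dataframe.show()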


Answer 1:


You can parse the value with decimal.Decimal().
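As a quick illustration (plain Python, no Spark needed) of how decimal.Decimal() distinguishes the cases in the sample data:

import decimal

print(decimal.Decimal("0.25"))       # Decimal('0.25') -> has a fractional part
print(decimal.Decimal("12") == 12)   # True -> integer-valued, so result should be 0
try:
    decimal.Decimal("test")          # non-numeric strings raise InvalidOperation
except decimal.InvalidOperation:
    print("not a number")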

Here we wrap the check in a UDF and then apply it with df.withColumn:

import decimal

from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

def is_valid_decimal(s):
    try:
        # 1 if the value parses as a number with a fractional part, 0 if it is integer-valued
        return 0 if decimal.Decimal(s) == int(decimal.Decimal(s)) else 1
    except (decimal.InvalidOperation, TypeError, ValueError):
        # non-numeric strings such as "test" (and NULLs) end up here
        return 0

# wrap the function in a UDF so it can be applied to DataFrame columns
is_valid_decimal_udf = udf(is_valid_decimal, IntegerType())

# add the result column
df = df.withColumn("result", is_valid_decimal_udf(col("test_column")))
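With the sample data from the question, df.show() should then produce the expected result column (output sketched from the question's expected table, not re-run here):

df.show()

# +---+-----------+------+
# | id|test_column|result|
# +---+-----------+------+
# |  1|       0.25|     1|
# |  2|        1.1|     1|
# |  3|         12|     0|
# |  4|       test|     0|
# |  5|     1.3334|     1|
# |  6|        .11|     1|
# +---+-----------+------+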


Source: https://stackoverflow.com/questions/46598685/checking-whether-a-column-has-proper-decimal-number
