how to check if a string column in pyspark dataframe is all numeric

拈花ヽ惹草 提交于 2019-12-07 21:07:49

问题


I have a PySpark Dataframe with a column of strings. How can I check which rows in it are Numeric. I could not find any function in PySpark's official documentation -

values = [('25q36',),('75647',),('13864',),('8758K',),('07645',)]
df = sqlContext.createDataFrame(values,['ID',])
df.show()
+-----+
|   ID|
+-----+
|25q36|
|75647|
|13864|
|8758K|
|07645|
+-----+

In Python, there is a function .isDigit() which returns True or False if the string contains just numbers or not.

Expected DataFrame -

+-----+-------+
|   ID| Value |
+-----+-------+
|25q36| False |
|75647| True  |
|13864| True  |
|8758K| False |
|07645| True  |
+-----+-------+

I will like to avoid creating a UDF.


回答1:


A simple cast would do the job :

from pyspark.sql import functions as F

my_df.select(
  "ID",
  F.col("ID").cast("int").isNotNull().alias("Value ")
).show()

+-----+------+
|   ID|Value |
+-----+------+
|25q36| false|
|75647|  true|
|13864|  true|
|8758K| false|
|07645|  true|
+-----+------+



回答2:


If you want you can also build a custom udf for this purpose:

from pyspark.sql.types import BooleanType
from pyspark.sql import functions as F

def is_digit(val):
    if val:
        return val.isdigit()
    else:
        return False

is_digit_udf = udf(is_digit, BooleanType())

df = df.withColumn('Value', F.when(is_digit_udf(F.col('ID')), F.lit(True)).otherwise(F.lit(False)))



回答3:


Try this, is Scala language

spark.udf.register("IsNumeric", (inpColumn: Int) => BigInt(inpColumn).isInstanceOf[BigInt])
spark.sql(s""" select "ABCD", IsNumeric(1234) as IsNumeric_1  """).show(false)


来源:https://stackoverflow.com/questions/53743795/how-to-check-if-a-string-column-in-pyspark-dataframe-is-all-numeric

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!