Question
I have a PySpark DataFrame with a column of strings. How can I check which rows in it are numeric? I could not find any function for this in PySpark's official documentation.
values = [('25q36',),('75647',),('13864',),('8758K',),('07645',)]
df = sqlContext.createDataFrame(values,['ID',])
df.show()
+-----+
|   ID|
+-----+
|25q36|
|75647|
|13864|
|8758K|
|07645|
+-----+
In Python, the string method .isdigit() returns True if the string contains only digits and False otherwise.
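For reference, this is how that built-in behaves on the sample values in plain Python:
>>> "75647".isdigit()
True
>>> "25q36".isdigit()
False
>>> "07645".isdigit()
True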
Expected DataFrame -
+-----+-------+
|   ID| Value |
+-----+-------+
|25q36| False |
|75647| True  |
|13864| True  |
|8758K| False |
|07645| True  |
+-----+-------+
I would like to avoid creating a UDF.
Answer 1:
A simple cast would do the job:
from pyspark.sql import functions as F

# Casting to int yields null for non-numeric strings, so isNotNull() flags the numeric rows
my_df.select(
    "ID",
    F.col("ID").cast("int").isNotNull().alias("Value")
).show()
+-----+-----+
|   ID|Value|
+-----+-----+
|25q36|false|
|75647| true|
|13864| true|
|8758K|false|
|07645| true|
+-----+-----+
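One caveat with the cast approach: digit strings longer than the int range overflow the cast and come back as null, so very long IDs would be reported as non-numeric. If that matters, a regex check with rlike is one UDF-free alternative (the pattern below is an assumption; adjust it if signs, decimals, or empty strings should count as numeric):

from pyspark.sql import functions as F

# True only when the whole string consists of digits
df.select(
    "ID",
    F.col("ID").rlike("^[0-9]+$").alias("Value")
).show()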
Answer 2:
If you want, you can also build a custom UDF for this purpose:
from pyspark.sql.types import BooleanType
from pyspark.sql import functions as F

def is_digit(val):
    # Treat null values as non-numeric
    if val:
        return val.isdigit()
    else:
        return False

is_digit_udf = F.udf(is_digit, BooleanType())

df = df.withColumn('Value', F.when(is_digit_udf(F.col('ID')), F.lit(True)).otherwise(F.lit(False)))
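Since is_digit already returns a boolean (and False for null input), the when/otherwise wrapper is optional; the shorter form below should give the same column:

df = df.withColumn('Value', is_digit_udf(F.col('ID')))
df.show()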
Answer 3:
Try this; it is in Scala:
spark.udf.register("IsNumeric", (inpColumn: Int) => BigInt(inpColumn).isInstanceOf[BigInt])
spark.sql(s""" select "ABCD", IsNumeric(1234) as IsNumeric_1 """).show(false)
Source: https://stackoverflow.com/questions/53743795/how-to-check-if-a-string-column-in-pyspark-dataframe-is-all-numeric