I want to be able to create a new column out of an existing column (of type string) and cast it to a type dynamically:
resultDF = resultDF.withColumn(newColumn, resultDF(oldColumn).cast(Helper.cast(datatype)))
but IntegralType is not in the supported DataTypes.
The supported DataTypes are:
StringType //Gets the StringType object.
BinaryType //Gets the BinaryType object.
BooleanType //Gets the BooleanType object.
DateType //Gets the DateType object.
TimestampType //Gets the TimestampType object.
CalendarIntervalType //Gets the CalendarIntervalType object.
DoubleType //Gets the DoubleType object.
FloatType //Gets the FloatType object.
ByteType //Gets the ByteType object.
IntegerType //Gets the IntegerType object.
LongType //Gets the LongType object.
ShortType //Gets the ShortType object.
NullType //Gets the NullType object.
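As a minimal sketch (assuming a DataFrame df with a string column foo; names are just for illustration), any of these singletons can be passed straight to Column.cast:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, IntegerType}

// cast the string column "foo" to two of the supported primitive types
val casted = df
  .withColumn("foo_int", col("foo").cast(IntegerType))
  .withColumn("foo_double", col("foo").cast(DoubleType))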
In addition to these, you can create ArrayType, MapType, DecimalType and StructType too:
public static ArrayType createArrayType(DataType elementType) //Creates an ArrayType by specifying the data type of elements ({@code elementType}).
public static ArrayType createArrayType(DataType elementType, boolean containsNull) //Creates an ArrayType by specifying the data type of elements ({@code elementType}) and whether the array contains null values ({@code containsNull}).
public static DecimalType createDecimalType(int precision, int scale) //Creates a DecimalType by specifying the precision and scale.
public static DecimalType createDecimalType() //Creates a DecimalType with default precision and scale, which are 10 and 0.
public static MapType createMapType(DataType keyType, DataType valueType) //Creates a MapType by specifying the data type of keys ({@code keyType}) and values ({@code valueType}).
public static MapType createMapType(DataType keyType, DataType valueType, boolean valueContainsNull) //Creates a MapType by specifying the data type of keys ({@code keyType}), the data type of values ({@code valueType}), and whether values contain any null value ({@code valueContainsNull}).
public static StructType createStructType(List<StructField> fields) //Creates a StructType with the given list of StructFields ({@code fields}).
public static StructType createStructType(StructField[] fields) //Creates a StructType with the given StructField array ({@code fields}).
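As a quick sketch of how these factory methods are used (variable names are just for illustration), they return ordinary DataType instances that can be passed around like the singletons above:
import org.apache.spark.sql.types.DataTypes

// build the complex types via the factory methods
val price  = DataTypes.createDecimalType(10, 2)                                // decimal(10,2)
val tags   = DataTypes.createArrayType(DataTypes.StringType)                   // array<string>
val counts = DataTypes.createMapType(DataTypes.StringType, DataTypes.LongType) // map<string,bigint>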
So the correct Helper object should be:
import org.apache.spark.sql.types._

object Helper {
  def cast(datatype: String): DataType = {
    datatype match {
      case "int"    => IntegerType
      case "string" => StringType
      // add further cases (e.g. "double" => DoubleType) as needed
    }
  }
}
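A sketch of how this Helper plugs back into the withColumn call from the question (newColumn, oldColumn and datatype are placeholder names, and resultDF is the DataFrame from the question):
import org.apache.spark.sql.functions.col

// dynamically cast the existing string column to the type named in `datatype`
val oldColumn = "age"
val newColumn = "age_int"
val datatype  = "int"
val casted = resultDF.withColumn(newColumn, col(oldColumn).cast(Helper.cast(datatype)))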
Why not use string descriptions?
scala> col("foo").cast("int")
res2: org.apache.spark.sql.Column = CAST(foo AS INT)
scala> col("foo").cast("string")
res3: org.apache.spark.sql.Column = CAST(foo AS STRING)
Otherwise use DataType, which will cover all primitive types and basic collections.
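For instance, a sketch of the DataType variant for a collection type (the printed Column representation may differ slightly between Spark versions):
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, IntegerType}

// the DataType overload of cast accepts complex types directly
col("foo").cast(ArrayType(IntegerType))   // CAST(foo AS ARRAY<INT>)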