Casting a column dynamically in a Spark DataFrame

半阙折子戏 2021-01-24 07:10

I want to be able to create a new column out of an existing column (of type string) and cast it to a type dynamically.

resultDF = resultDF.withColumn(newColumn         
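
(The snippet above is truncated; presumably the goal looks roughly like the sketch below, where resultDF, newColumn, existingColumn, and newType are hypothetical values only known at runtime.)

// Hypothetical shape of the goal; `resolveType` is the missing piece the question asks about.
def resolveType(name: String): org.apache.spark.sql.types.DataType = ???

val updated = resultDF.withColumn(newColumn, resultDF(existingColumn).cast(resolveType(newType)))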


        
2 Answers
  • 2021-01-24 07:19

    IntegralType is not one of the supported DataTypes.

    The supported DataTypes are:

    StringType  //Gets the StringType object.
    BinaryType  //Gets the BinaryType object.
    BooleanType //Gets the BooleanType object.
    DateType  //Gets the DateType object.
    TimestampType //Gets the TimestampType object.
    CalendarIntervalType  //Gets the CalendarIntervalType object.
    DoubleType  //Gets the DoubleType object.
    FloatType //Gets the FloatType object.
    ByteType  //Gets the ByteType object.
    IntegerType //Gets the IntegerType object.
    LongType  //Gets the LongType object.
    ShortType //Gets the ShortType object.
    NullType  //Gets the NullType object.
    

    In addition to these, you can also create ArrayType, MapType, DecimalType, and StructType through the static factory methods on DataTypes:

    public static ArrayType createArrayType(DataType elementType)     //Creates an ArrayType by specifying the data type of elements ({@code elementType}).
    public static ArrayType createArrayType(DataType elementType, boolean containsNull)     //Creates an ArrayType by specifying the data type of elements ({@code elementType}) and whether the array contains null values ({@code containsNull}).
    public static DecimalType createDecimalType(int precision, int scale)     //Creates a DecimalType by specifying the precision and scale.
    public static DecimalType createDecimalType()     //Creates a DecimalType with default precision and scale, which are 10 and 0.
    public static MapType createMapType(DataType keyType, DataType valueType)     //Creates a MapType by specifying the data type of keys ({@code keyType}) and values ({@code valueType}).
    public static MapType createMapType(DataType keyType, DataType valueType, boolean valueContainsNull)     //Creates a MapType by specifying the data type of keys ({@code keyType}), the data type of values ({@code valueType}), and whether values contain any null value ({@code valueContainsNull}).
    public static StructType createStructType(List<StructField> fields)     //Creates a StructType with the given list of StructFields ({@code fields}).
    public static StructType createStructType(StructField[] fields)     //Creates a StructType with the given StructField array ({@code fields}).
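
    For example, these factories can be combined to build nested types at runtime (a small sketch; the value names are made up):

    import org.apache.spark.sql.types._

    // an array of non-null DECIMAL(10, 2) values
    val decimals: ArrayType = DataTypes.createArrayType(DataTypes.createDecimalType(10, 2), false)
    // a map from strings to (nullable) integers
    val lookup: MapType = DataTypes.createMapType(StringType, IntegerType)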
    

    So the correct Helper object should be:

    import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

    object Helper {
      def cast(datatype: String): DataType = datatype match {
        case "int"    => IntegerType
        case "string" => StringType
      }
    }
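
    For instance (a minimal sketch; the DataFrame and column names here are assumptions, not from the question):

    import org.apache.spark.sql.functions.col

    // cast the string column "age" using a type name supplied at runtime
    val casted = resultDF.withColumn("ageAsInt", col("age").cast(Helper.cast("int")))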
    
  • 2021-01-24 07:33

    Why not use string descriptions?

    scala> col("foo").cast("int")
    res2: org.apache.spark.sql.Column = CAST(foo AS INT)
    
    scala> col("foo").cast("string")
    res3: org.apache.spark.sql.Column = CAST(foo AS STRING)
    

    Otherwise use DataType, which will cover all primitive types and basic collections.
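
    Either way, the type name can come straight from a runtime value (a minimal sketch; df, the column, and typeName are assumptions):

    import org.apache.spark.sql.functions.col

    val typeName = "int"  // e.g. read from configuration at runtime
    val result = df.withColumn("fooAsInt", col("foo").cast(typeName))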
