I'm working with datasets of different sizes, each with a dynamic number of columns. For my application, I have a requirement to know the total length in characters of every row.
A clean solution using a Spark DataFrame UDF. I used it to get the length in bytes, which suits my case better:
    static UDF1<String, Integer> BytesSize = new UDF1<String, Integer>() {
        @Override
        public Integer call(final String line) throws Exception {
            // Length of the concatenated row in bytes (platform default charset)
            return line.getBytes().length;
        }
    };
    private void saveIt() {
        sparkSession.udf().register("BytesSize", BytesSize, DataTypes.IntegerType);
        dfToWrite
            .withColumn("fullLineBytesSize", callUDF("BytesSize", functions.concat_ws(",", columns)))
            .write()
            .partitionBy(hivePartitionColumn)
            .option("header", "true")
            .mode(SaveMode.Append)
            .format(storageFormat)
            .save(pathTowrite);
    }
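One caveat worth noting: `getBytes()` with no argument uses the JVM's default charset, so the reported byte length can differ between environments, and for non-ASCII text the byte count is not the same as the character count. A minimal standalone sketch (class and variable names here are illustrative, not part of the Spark job) showing the difference with an explicit charset:

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharLength {
    public static void main(String[] args) {
        String line = "héllo";

        // Character count: 5 code units
        System.out.println(line.length());

        // Byte count in UTF-8: 6, because 'é' encodes as two bytes
        System.out.println(line.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

If reproducible byte sizes matter, passing `StandardCharsets.UTF_8` inside the UDF's `call` method pins the encoding regardless of where the job runs.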