Spark Dataset: efficiently get the character length of an entire row

Asked by 余生分开走 on 2021-01-15 23:02

I'm working with Datasets of different sizes, each with a dynamic number of columns. For my application, I need to know the character length of each entire row.

1 Answer
  •  孤城傲影
    2021-01-15 23:27

    A nice solution with a Spark DataFrame UDF; I used it to get the byte length instead, which is better for my case:

    // UDF that returns the UTF-8 byte length of a row string.
    // Specifying the charset explicitly avoids depending on the JVM's
    // platform-default encoding.
    static UDF1<String, Integer> BytesSize = new UDF1<String, Integer>() {
        public Integer call(final String line) throws Exception {
            return line.getBytes(StandardCharsets.UTF_8).length;
        }
    };

    private void saveIt() {
        sparkSession.udf().register("BytesSize", BytesSize, DataTypes.IntegerType);
        dfToWrite.withColumn("fullLineBytesSize",
                        callUDF("BytesSize", functions.concat_ws(",", columns)))
                .write().partitionBy(hivePartitionColumn)
                .option("header", "true")
                .mode(SaveMode.Append).format(storageFormat).save(pathTowrite);
    }
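
    To illustrate what this UDF computes, here is a plain-Java sketch with no Spark dependency (the column values are made up for illustration). It joins column values with "," the way concat_ws(",", columns) does, then contrasts the character count with the UTF-8 byte count:

    ```java
    import java.nio.charset.StandardCharsets;

    public class RowSizeDemo {
        // Join column values with "," as concat_ws(",", columns) would
        static String joinRow(String... cols) {
            return String.join(",", cols);
        }

        // UTF-8 byte length, matching what the BytesSize UDF computes
        static int bytesSize(String line) {
            return line.getBytes(StandardCharsets.UTF_8).length;
        }

        public static void main(String[] args) {
            String row = joinRow("abc", "déf", "42");
            System.out.println(row.length());   // character count: 10
            System.out.println(bytesSize(row)); // UTF-8 byte count: 11 ("é" is 2 bytes)
        }
    }
    ```

    Note that if only the character count is needed (as in the original question), Spark's built-in length function on the concatenated column, functions.length(functions.concat_ws(",", columns)), avoids the UDF entirely.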
    
