In Spark, iterate through each column and find the max length

清歌不尽 2021-01-15 20:28

I am new to Spark and Scala and have the following situation. I have a table "TEST_TABLE" on a cluster (it can be a Hive table), and I am converting it to a DataFrame as:

3 Answers
  •  时光说笑
    2021-01-15 21:08

    Here is one more way to do it, which also produces a vertical report with one row per column name.

    scala> val df = Seq(("abc","abcd","abcdef"),("a","BCBDFG","qddfde"),("MN","1234B678","sd")).toDF("COL1","COL2","COL3")
    df: org.apache.spark.sql.DataFrame = [COL1: string, COL2: string ... 1 more field]
    
    scala> df.show(false)
    +----+--------+------+
    |COL1|COL2    |COL3  |
    +----+--------+------+
    |abc |abcd    |abcdef|
    |a   |BCBDFG  |qddfde|
    |MN  |1234B678|sd    |
    +----+--------+------+
    
    scala> val columns = df.columns
    columns: Array[String] = Array(COL1, COL2, COL3)
    
    scala> val df2 = columns.foldLeft(df) { (acc,x) => acc.withColumn(x,length(col(x))) }
    df2: org.apache.spark.sql.DataFrame = [COL1: int, COL2: int ... 1 more field]
    
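    scala> val df3 = df2.select( columns.map(x => max(col(x))):_* )
    df3: org.apache.spark.sql.DataFrame = [max(COL1): int, max(COL2): int ... 1 more field]
    
    scala> df3.show(false)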
    +---------+---------+---------+
    |max(COL1)|max(COL2)|max(COL3)|
    +---------+---------+---------+
    |3        |8        |6        |
    +---------+---------+---------+
    
    
    scala> df3.flatMap( r => { (0 until r.length).map( i => (columns(i),r.getInt(i)) ) } ).show(false)
    +----+---+
    |_1  |_2 |
    +----+---+
    |COL1|3  |
    |COL2|8  |
    |COL3|6  |
    +----+---+
    
    
    scala>
    

    To collect the results into a Scala collection, such as a Map:

    scala> val result = df3.flatMap( r => { (0 until r.length).map( i => (columns(i),r.getInt(i)) ) } ).as[(String,Int)].collect.toMap
    result: scala.collection.immutable.Map[String,Int] = Map(COL1 -> 3, COL2 -> 8, COL3 -> 6)
    
    scala> result
    res47: scala.collection.immutable.Map[String,Int] = Map(COL1 -> 3, COL2 -> 8, COL3 -> 6)
    
    scala>
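
The same numbers can also be computed without the intermediate foldLeft step, by nesting length inside max so that one select handles every column. This is only a minimal sketch, not the answer's original method: it assumes string columns, a non-empty table with at least one non-null value per column, and a local SparkSession created purely for illustration (in spark-shell, `spark` already exists). The names `aggs`, `row` and `maxLengths` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length, max}

// Assumption: a local session for illustration; getOrCreate() reuses an existing one.
val spark = SparkSession.builder().master("local[*]").appName("max-col-length").getOrCreate()
import spark.implicits._

// Same sample data as in the transcript above.
val df = Seq(("abc","abcd","abcdef"),("a","BCBDFG","qddfde"),("MN","1234B678","sd")).toDF("COL1","COL2","COL3")

// One aggregate expression per column: max(length(col)), aliased back to the column name.
val aggs = df.columns.map(c => max(length(col(c))).as(c))

// A global aggregation returns exactly one row holding all the maxima.
val row = df.select(aggs: _*).head()

// Pair each column name with its value in that row.
// Assumption: no column is entirely null, otherwise getInt would fail.
val maxLengths: Map[String, Int] =
  df.columns.zipWithIndex.map { case (c, i) => c -> row.getInt(i) }.toMap

println(maxLengths) // Map(COL1 -> 3, COL2 -> 8, COL3 -> 6)
```

Because the single result row is read on the driver, the flatMap over the Dataset is not needed in this variant.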
    
