Get the last element from Apache Spark SQL split() Function

后端 未结 4 937
太阳男子
太阳男子 2021-02-06 13:54

I want to get the last element from the Array that return from Spark SQL split() function.

split(4:3-2:3-5:4-6:4-5:2,\'-\')

I know it can get b

相关标签:
4条回答
  • 2021-02-06 14:08

    You could use an UDF to do that, as follow:

    val df = sc.parallelize(Seq((1L,"one-last1"), (2L,"two-last2"), (3L,"three-last3"))).toDF("key","Value")
    +---+-----------+
    |key|Value      |
    +---+-----------+
    |1  |one-last1  |
    |2  |two-last2  |
    |3  |three-last3|
    +---+-----------+
    
    val get_last = udf((xs: Seq[String]) => Try(xs.last).toOption)
    
    val with_just_last = df.withColumn("Last" , get_last(split(col("Value"), "-")))
    +---+-----------+--------+
    |key|Value      |Last    |
    +---+-----------+--------+
    |1  |one-last1  |last1   |
    |2  |two-last2  |last2   |
    |3  |three-last3|last3   |
    +---+-----------+--------+
    

    Remember that the split function from SparkSQL can be applied to a column of the DataFrame.

    0 讨论(0)
  • 2021-02-06 14:12

    You can also use SparkSql Reverse() function on a column after Split(). For example:

    SELECT reverse(split(MY_COLUMN,'-'))[0] FROM MY_TABLE

    Here [0] gives you the first element of the reversed array, which is the last element of the initial array.

    0 讨论(0)
  • 2021-02-06 14:20

    Please check substring_index it should work exactly as you want:

    substring_index(lit("1-2-3-4"), "-", -1) // 4
    
    0 讨论(0)
  • 2021-02-06 14:20

    use split(MY_COLUMN,'-').getItem(0) if you are using Java

    0 讨论(0)
提交回复
热议问题