How to find common elements among two array columns?

执笔经年 2021-01-25 08:49

I have two comma-separated string columns (sourceAuthors and targetAuthors).

val df = Seq(
  ("Author1,Author2,Author3","Author2,Aut


        
3 Answers
  •  挽巷 (original poster)
    2021-01-25 09:24

    Based on SCouto's answer, here is the complete solution that worked for me:

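      import org.apache.spark.sql.{DataFrame, SparkSession}
      import org.apache.spark.sql.expressions.UserDefinedFunction
      import org.apache.spark.sql.functions.udf

      // Counts the entries shared by two comma-separated strings.
      def myUDF: UserDefinedFunction = udf(
        (s1: String, s2: String) => {
          val splitted1 = s1.split(",")
          val splitted2 = s2.split(",")
          splitted1.intersect(splitted2).length
        })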
    
      val spark = SparkSession.builder().master("local").getOrCreate()
    
      import spark.implicits._
    
      val df = Seq(("Author1,Author2,Author3","Author2,Author3,Author1")).toDF("source","target")
    
      df.show(false)
    
    +-----------------------+-----------------------+
    |source                 |target                 |
    +-----------------------+-----------------------+
    |Author1,Author2,Author3|Author2,Author3,Author1|
    +-----------------------+-----------------------+
    
      val newDF: DataFrame = df.withColumn("nCommonAuthors", myUDF($"source", $"target"))
    
      newDF.show(false)
    
    +-----------------------+-----------------------+--------------+
    |source                 |target                 |nCommonAuthors|
    +-----------------------+-----------------------+--------------+
    |Author1,Author2,Author3|Author2,Author3,Author1|3             |
    +-----------------------+-----------------------+--------------+
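
The function passed to `udf` above is plain Scala, so its behavior can be checked without a Spark session. The following is a minimal sketch and not part of the original answer: `CommonAuthors`, `tokens` and `commonCount` are hypothetical names, and it assumes you want null inputs treated as empty, whitespace around names ignored, and each author counted once. The UDF above does none of these; in particular, `s1.split(",")` would throw on a null column value.

```scala
// Hypothetical helper (not from the answer above): counts the distinct
// entries shared by two comma-separated strings.
object CommonAuthors {

  // Split on commas, trim whitespace, drop empty entries, de-duplicate.
  // A null input is treated as an empty list (an assumption of this sketch).
  private def tokens(s: String): Set[String] =
    Option(s)
      .map(_.split(",").toSeq)
      .getOrElse(Seq.empty[String])
      .map(_.trim)
      .filter(_.nonEmpty)
      .toSet

  // Size of the set intersection of the two token sets.
  def commonCount(s1: String, s2: String): Int =
    tokens(s1).intersect(tokens(s2)).size

  def main(args: Array[String]): Unit = {
    println(commonCount("Author1,Author2,Author3", "Author2,Author3,Author1")) // 3
    println(commonCount("Author1, Author2", "Author2,Author4"))                // 1
    println(commonCount(null, "Author1"))                                      // 0
  }
}
```

If this behavior suits your data, the function could presumably be wrapped as `udf(CommonAuthors.commonCount _)` and used in `withColumn` the same way as `myUDF`. On Spark 2.4 or later, the built-in `split`, `array_intersect` and `size` functions may also be worth considering as an alternative to a UDF.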
    
