Spark 1.6 SQL or Dataframe or Windows

臣服心动 2021-01-26 01:02

I have a data dump of work orders, shown below. I need to identify the orders that have both the 'In Progress' and 'Finished' statuses.

Also, need to display d

1 answer
  • 2021-01-26 01:15

    You can use groupBy with collect_list to collect the list of statuses per Work_Req_Id, then filter the groups with a UDF that checks for the wanted statuses. The filtered result is then joined back to the original DataFrame to recover the full rows.

    Window functions aren't proposed here because Spark 1.6 doesn't seem to support collect_list/collect_set in window operations.

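    // udf and collect_list come from functions._; toDF and $"..." need
    // sqlContext.implicits._ (already in scope in spark-shell).
    // In Spark 1.6, collect_list is an alias for the Hive UDAF, so this
    // likely needs a HiveContext rather than a plain SQLContext.
    import org.apache.spark.sql.functions._

    val df = Seq(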
      ("R1", "John", "3/4/15", "In Progress"),
      ("R1", "George", "3/5/15", "In Progress"),
      ("R2", "Peter", "3/6/15", "In Progress"),
      ("R3", "Alaxender", "3/7/15", "Finished"),
      ("R3", "Alaxender", "3/8/15", "In Progress"),
      ("R4", "Patrick", "3/9/15", "Finished"),
      ("R4", "Patrick", "3/10/15", "Not Valid"),
      ("R5", "Peter", "3/11/15", "Finished"),
      ("R6", "", "3/12/15", "Not Valid"),
      ("R7", "George", "3/13/15", "Not Valid"),
      ("R7", "George", "3/14/15", "In Progress"),
      ("R8", "John", "3/15/15", "Finished"),
      ("R8", "John", "3/16/15", "Failed"),
      ("R9", "Alaxender", "3/17/15", "Finished"),
      ("R9", "John", "3/18/15", "Removed"),
      ("R10", "Patrick", "3/19/15", "In Progress"),
      ("R10", "Patrick", "3/20/15", "Finished"),
      ("R10", "Patrick", "3/21/15", "Hold")
    ).toDF("Work_Req_Id", "Assigned_To", "Date", "Status")
    
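    // True when a group's statuses contain "In Progress" together with
    // either "Finished" or "Not Valid".
    val wanted = udf(
      (statuses: Seq[String]) => statuses.contains("In Progress") &&
        (statuses.contains("Finished") || statuses.contains("Not Valid"))
    )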
    
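    // Collect the statuses per work order, keep the wanted groups,
    // and drop the helper column so only Work_Req_Id remains.
    val df2 = df.groupBy($"Work_Req_Id").agg(collect_list($"Status").as("Statuses")).
      where( wanted($"Statuses") ).
      drop($"Statuses")
    
    // Join back to the original rows; row order in the output is not guaranteed.
    df.join(df2, Seq("Work_Req_Id")).show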
    
    // +-----------+-----------+-------+-----------+
    // |Work_Req_Id|Assigned_To|   Date|     Status|
    // +-----------+-----------+-------+-----------+
    // |         R3|  Alaxender| 3/7/15|   Finished|
    // |         R3|  Alaxender| 3/8/15|In Progress|
    // |         R7|     George|3/13/15|  Not Valid|
    // |         R7|     George|3/14/15|In Progress|
    // |        R10|    Patrick|3/19/15|In Progress|
    // |        R10|    Patrick|3/20/15|   Finished|
    // |        R10|    Patrick|3/21/15|       Hold|
    // +-----------+-----------+-------+-----------+
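
For comparison, here is a minimal sketch of a variant that avoids both collect_list and the UDF by using conditional aggregation. It is not the approach from the answer above, just an alternative under the same filter rule ("In Progress" together with "Finished" or "Not Valid"). It assumes a Spark 1.6 spark-shell session in which `df` from the answer is already defined and sqlContext.implicits._ is in scope. The names `flags`, `wantedIds`, `has_in_progress` and `has_closed` are illustrative only.

```scala
// Assumes `df` from the answer above and sqlContext.implicits._ in scope
// (needed for the $"..." column syntax).
import org.apache.spark.sql.functions.{max, when}

// One row per Work_Req_Id, with 0/1 flags instead of a collected status list.
val flags = df.groupBy($"Work_Req_Id").agg(
  max(when($"Status" === "In Progress", 1).otherwise(0)).as("has_in_progress"),
  max(when($"Status" === "Finished" || $"Status" === "Not Valid", 1).otherwise(0)).as("has_closed")
)

// Keep only the ids for which both flags are set.
val wantedIds = flags.
  where($"has_in_progress" === 1 && $"has_closed" === 1).
  select($"Work_Req_Id")

// Join back to recover every row of the matching work orders.
val result = df.join(wantedIds, Seq("Work_Req_Id"))
result.show
```

With the sample data this should select the same work orders (R3, R7 and R10) as the collect_list version. Since it relies only on `when` and `max`, it should not depend on Hive support, though that is worth verifying on your own cluster.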
    