Question
I am working on a fairly simple use case. I have two DataFrames, vertices and edges. The vertices -
>>> g.vertices.show(20,False)
+------------------------+
|id                      |
+------------------------+
|Router_UPDATE_INSERT    |
|Seq_Unique_Key          |
|Target_New_Insert       |
|Target_Existing_Update  |
|Target_Existing_Insert  |
|SAMPLE_CUSTOMER         |
|SAMPLE_CUSTOMER_MASTER  |
|Sorter_SAMPLE_CUSTOMER  |
|Sorter_CUSTOMER_MASTER  |
|Join_Source_Target      |
|Exp_DetectChanges       |
|Filter_Unchanged_Records|
+------------------------+
Details of edges -
>>> g.edges.show(20,False)
+------------------------+----------+------------------------+----------+
|src                     |From Type |dst                     |To Type   |
+------------------------+----------+------------------------+----------+
|Sorter_SAMPLE_CUSTOMER  |sorter    |Join_Source_Target      |joiner    |
|Sorter_CUSTOMER_MASTER  |sorter    |Join_Source_Target      |joiner    |
|Join_Source_Target      |joiner    |Exp_DetectChanges       |expression|
|SAMPLE_CUSTOMER         |source    |Sorter_SAMPLE_CUSTOMER  |sorter    |
|Router_UPDATE_INSERT    |router    |Target_Existing_Update  |target    |
|Seq_Unique_Key          |sequencetx|Target_Existing_Insert  |target    |
|Filter_Unchanged_Records|filter    |Router_UPDATE_INSERT    |router    |
|Exp_DetectChanges       |expression|Filter_Unchanged_Records|filter    |
|Seq_Unique_Key          |sequencetx|Target_New_Insert       |target    |
|Router_UPDATE_INSERT    |router    |Seq_Unique_Key          |sequencetx|
|SAMPLE_CUSTOMER_MASTER  |source    |Sorter_CUSTOMER_MASTER  |sorter    |
+------------------------+----------+------------------------+----------+
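The graph is built from these two DataFrames -
>>> from graphframes import GraphFrame
>>> g = GraphFrame(vertices, edges)
With BFS I can find two separate lineages. First lineage -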
>>> filteredPaths = g.bfs(
... fromExpr = "id = 'SAMPLE_CUSTOMER_MASTER'",
... toExpr = "id = 'Router_UPDATE_INSERT'",
... edgeFilter = "src != 'joiner1'",
... maxPathLength = 10)
Second lineage -
>>> filteredPaths = g.bfs(
... fromExpr = "id = 'SAMPLE_CUSTOMER'",
... toExpr = "id = 'Router_UPDATE_INSERT'",
... edgeFilter = "src != 'joiner1'",
... maxPathLength = 10)
The two sources are merged and then split again later. All I need is the distinct vertices, with their order preserved -
SAMPLE_CUSTOMER
Sorter_SAMPLE_CUSTOMER
SAMPLE_CUSTOMER_MASTER
Sorter_CUSTOMER_MASTER
Join_Source_Target
Exp_DetectChanges
Filter_Unchanged_Records
Router_UPDATE_INSERT
Seq_Unique_Key
Target_New_Insert
Target_Existing_Insert
Target_Existing_Update
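One way to read this requirement is as a topological ordering of the graph: each vertex listed once, and never before a vertex that feeds into it. The following is only a minimal sketch of that idea in plain Python using Kahn's algorithm. It assumes the graph is small enough to bring the edges to the driver, for example with `[(r["src"], r["dst"]) for r in g.edges.select("src", "dst").collect()]`; here the edge list is typed in by hand from the table above. The function name `topological_order` is hypothetical and not part of GraphFrames.

```python
from collections import defaultdict, deque


def topological_order(edges):
    """Return every vertex once, each src before its dst (Kahn's algorithm)."""
    successors = defaultdict(list)
    indegree = {}
    for src, dst in edges:
        successors[src].append(dst)
        indegree.setdefault(src, 0)
        indegree[dst] = indegree.get(dst, 0) + 1

    # Start from the vertices that have no incoming edge (the sources).
    ready = deque(v for v, d in indegree.items() if d == 0)
    order = []
    while ready:
        vertex = ready.popleft()
        order.append(vertex)
        for nxt in successors[vertex]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:  # all inputs of nxt are already listed
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle; no lineage order exists")
    return order


# (src, dst) pairs copied from g.edges above.
edges = [
    ("Sorter_SAMPLE_CUSTOMER", "Join_Source_Target"),
    ("Sorter_CUSTOMER_MASTER", "Join_Source_Target"),
    ("Join_Source_Target", "Exp_DetectChanges"),
    ("SAMPLE_CUSTOMER", "Sorter_SAMPLE_CUSTOMER"),
    ("Router_UPDATE_INSERT", "Target_Existing_Update"),
    ("Seq_Unique_Key", "Target_Existing_Insert"),
    ("Filter_Unchanged_Records", "Router_UPDATE_INSERT"),
    ("Exp_DetectChanges", "Filter_Unchanged_Records"),
    ("Seq_Unique_Key", "Target_New_Insert"),
    ("Router_UPDATE_INSERT", "Seq_Unique_Key"),
    ("SAMPLE_CUSTOMER_MASTER", "Sorter_CUSTOMER_MASTER"),
]

lineage = topological_order(edges)
print("\n".join(lineage))
```

A graph like this one has several valid orders, so the output can differ from the listing above wherever two vertices are independent of each other (for example the two source branches, or the three targets). If one specific order among those is required, the tie-breaking rule would have to be defined separately.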
Source: https://stackoverflow.com/questions/62455277/spark-graphframe-find-hierarchy