Spark GraphFrames: find hierarchy

Submitted by 本小妞迷上赌 on 2021-01-29 10:34:20

Question

I am trying to implement a fairly simple use case. I have two DataFrames. The vertices:

>>> g.vertices.show(20,False)
+------------------------+
|id                      |
+------------------------+
|Router_UPDATE_INSERT    |
|Seq_Unique_Key          |
|Target_New_Insert       |
|Target_Existing_Update  |
|Target_Existing_Insert  |
|SAMPLE_CUSTOMER         |
|SAMPLE_CUSTOMER_MASTER  |
|Sorter_SAMPLE_CUSTOMER  |
|Sorter_CUSTOMER_MASTER  |
|Join_Source_Target      |
|Exp_DetectChanges       |
|Filter_Unchanged_Records|
+------------------------+

The edges:

>>> g.edges.show(20,False)
+------------------------+----------+------------------------+----------+
|src                     |From Type |dst                     |To Type   |
+------------------------+----------+------------------------+----------+
|Sorter_SAMPLE_CUSTOMER  |sorter    |Join_Source_Target      |joiner    |
|Sorter_CUSTOMER_MASTER  |sorter    |Join_Source_Target      |joiner    |
|Join_Source_Target      |joiner    |Exp_DetectChanges       |expression|
|SAMPLE_CUSTOMER         |source    |Sorter_SAMPLE_CUSTOMER  |sorter    |
|Router_UPDATE_INSERT    |router    |Target_Existing_Update  |target    |
|Seq_Unique_Key          |sequencetx|Target_Existing_Insert  |target    |
|Filter_Unchanged_Records|filter    |Router_UPDATE_INSERT    |router    |
|Exp_DetectChanges       |expression|Filter_Unchanged_Records|filter    |
|Seq_Unique_Key          |sequencetx|Target_New_Insert       |target    |
|Router_UPDATE_INSERT    |router    |Seq_Unique_Key          |sequencetx|
|SAMPLE_CUSTOMER_MASTER  |source    |Sorter_CUSTOMER_MASTER  |sorter    |
+------------------------+----------+------------------------+----------+

from graphframes import GraphFrame
g = GraphFrame(vertices, edges)

Now I can find two different lineages with BFS. The first lineage:

>>> filteredPaths = g.bfs(
...   fromExpr = "id = 'SAMPLE_CUSTOMER_MASTER'",
...   toExpr = "id = 'Router_UPDATE_INSERT'",
...   edgeFilter = "src != 'joiner1'",
...   maxPathLength = 10)

The second lineage:

>>> filteredPaths = g.bfs(
...   fromExpr = "id = 'SAMPLE_CUSTOMER'",
...   toExpr = "id = 'Router_UPDATE_INSERT'",
...   edgeFilter = "src != 'joiner1'",
...   maxPathLength = 10)

The two sources are merged and later split again. All I need is the distinct vertices, with their order maintained:

SAMPLE_CUSTOMER
Sorter_SAMPLE_CUSTOMER
SAMPLE_CUSTOMER_MASTER
Sorter_CUSTOMER_MASTER
Join_Source_Target
Exp_DetectChanges
Filter_Unchanged_Records
Router_UPDATE_INSERT
Seq_Unique_Key
Target_New_Insert
Target_Existing_Insert
Target_Existing_Update
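
An ordering like the one above, in which every vertex appears once and after all of its upstream vertices, is a topological order of the graph. The following is a minimal sketch of one way to get it, assuming the edge list is small enough to collect to the driver and that the graph has no cycles. It uses Kahn's algorithm in plain Python rather than anything built into GraphFrames; the names EDGES and topological_order are hypothetical helpers, and the pairs are copied from the edges table shown earlier.

```python
from collections import defaultdict, deque

# (src, dst) pairs copied from the edges table in the question.
# With a live GraphFrame they could instead be collected with:
#   [(r["src"], r["dst"]) for r in g.edges.select("src", "dst").collect()]
EDGES = [
    ("Sorter_SAMPLE_CUSTOMER", "Join_Source_Target"),
    ("Sorter_CUSTOMER_MASTER", "Join_Source_Target"),
    ("Join_Source_Target", "Exp_DetectChanges"),
    ("SAMPLE_CUSTOMER", "Sorter_SAMPLE_CUSTOMER"),
    ("Router_UPDATE_INSERT", "Target_Existing_Update"),
    ("Seq_Unique_Key", "Target_Existing_Insert"),
    ("Filter_Unchanged_Records", "Router_UPDATE_INSERT"),
    ("Exp_DetectChanges", "Filter_Unchanged_Records"),
    ("Seq_Unique_Key", "Target_New_Insert"),
    ("Router_UPDATE_INSERT", "Seq_Unique_Key"),
    ("SAMPLE_CUSTOMER_MASTER", "Sorter_CUSTOMER_MASTER"),
]


def topological_order(edges):
    """Return each vertex once, with every src placed before its dst."""
    children = defaultdict(list)
    indegree = {}
    for src, dst in edges:
        children[src].append(dst)
        indegree.setdefault(src, 0)
        indegree[dst] = indegree.get(dst, 0) + 1

    # Start with the vertices that have no incoming edge (the sources).
    # Sorting only makes the starting order deterministic.
    ready = deque(sorted(v for v, d in indegree.items() if d == 0))
    order = []
    while ready:
        vertex = ready.popleft()
        order.append(vertex)
        for child in children[vertex]:
            indegree[child] -= 1
            # A child becomes ready once all of its parents are emitted.
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order


lineage = topological_order(EDGES)
print("\n".join(lineage))
```

The order among independent branches (for example the two source chains, or the three targets) depends on how ties are broken, so the result may not match the listing above line for line, but every vertex still comes after everything that feeds it.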

Source: https://stackoverflow.com/questions/62455277/spark-graphframe-find-hierarchy
