Plotting the graph in networkx from the numpy array

Submitted by 痴心易碎 on 2021-01-29 13:14:07

Question


I have a pandas DataFrame with information about people's locations over time. It has more than 300 million rows.

Here is a sample in which each Name is assigned a unique index with groupby, and the rows are sorted by "Name" and "Year":

import pandas as pd
inp = [{'Name': 'John', 'Year':2018, 'Address':'Beverly hills'}, {'Name': 'John','Year':2018, 'Address':'Beverly hills'}, {'Name': 'John', 'Year':2019, 'Address':'Beverly hills'}, {'Name': 'John', 'Year':2019, 'Address':'Orange county'}, {'Name': 'John', 'Year':2019, 'Address':'NewYork'}, {'Name': 'Steve', 'Year':2018, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2019, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2019, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2020, 'Address':'California'}, {'Name': 'Steve', 'Year':2020, 'Address':'Canada'}, {'Name': 'John', 'Year':2020, 'Address':'Canada'}, {'Name': 'John', 'Year':2021, 'Address':'Canada'}, {'Name': 'John', 'Year':2021, 'Address':'Beverly hills'}, {'Name': 'Steve', 'Year':2021, 'Address':'California'}, {'Name': 'Steve', 'Year':2022, 'Address':'California'}, {'Name': 'Steve', 'Year':2018, 'Address':'NewYork'}, {'Name': 'Steve', 'Year':2018, 'Address':'California'}, {'Name': 'Steve', 'Year':2022, 'Address':'NewYork'}]
df = pd.DataFrame(inp)
df['Name_Grouped_Index'] = df.groupby(['Name']).ngroup()
df = df.sort_values(['Name', 'Year'], ascending=[False, True])
print(df)

     Name  Year        Address  Name_Grouped_Index
5   Steve  2018         Canada                     1
15  Steve  2018        NewYork                     1
16  Steve  2018     California                     1
6   Steve  2019         Canada                     1
7   Steve  2019         Canada                     1
8   Steve  2020     California                     1
9   Steve  2020         Canada                     1
13  Steve  2021     California                     1
14  Steve  2022     California                     1
17  Steve  2022        NewYork                     1
0    John  2018  Beverly hills                     0
1    John  2018  Beverly hills                     0
2    John  2019  Beverly hills                     0
3    John  2019  Orange county                     0
4    John  2019        NewYork                     0
10   John  2020         Canada                     0
11   John  2021         Canada                     0
12   John  2021  Beverly hills                     0

Thanks to @MarcusRenshaw, I can now build the network graph matrix (adjacency matrix), which shows the total number of moves between addresses: for example, how many times people moved from "Canada" to "California". The solution for that can be found HERE.

Here is the NumPy array that I get as the "Network Matrix" from the solution above, preceded by its row/column labels:

['Canada', 'NewYork', 'California', 'Beverly hills', 'Orange county']
[[2 1 2 1 0]
 [1 0 1 0 0]
 [2 1 1 0 0]
 [0 0 0 2 1]
 [0 1 0 0 0]]
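
For reference, here is a minimal sketch of how a matrix like this can be derived from the sample. It is not the linked solution, only one assumed approach: within each Name, every pair of consecutive rows in the sorted frame is counted as one move, with rows as origins and columns as destinations. The names `rows`, `labels`, `moves`, `counts` and `Next_Address` are made up for this illustration. On the sample data it gives the same labels and matrix as shown above.

```python
import pandas as pd

# Same sample rows as in the question, in the same original order.
rows = [
    ("John", 2018, "Beverly hills"), ("John", 2018, "Beverly hills"),
    ("John", 2019, "Beverly hills"), ("John", 2019, "Orange county"),
    ("John", 2019, "NewYork"), ("Steve", 2018, "Canada"),
    ("Steve", 2019, "Canada"), ("Steve", 2019, "Canada"),
    ("Steve", 2020, "California"), ("Steve", 2020, "Canada"),
    ("John", 2020, "Canada"), ("John", 2021, "Canada"),
    ("John", 2021, "Beverly hills"), ("Steve", 2021, "California"),
    ("Steve", 2022, "California"), ("Steve", 2018, "NewYork"),
    ("Steve", 2018, "California"), ("Steve", 2022, "NewYork"),
]
df = pd.DataFrame(rows, columns=["Name", "Year", "Address"])
df = df.sort_values(["Name", "Year"], ascending=[False, True])

# Labels in order of first appearance in the sorted frame.
labels = df["Address"].unique().tolist()

# Assumption: a "move" is the step from one row to the next row
# of the same person in the sorted frame.
df["Next_Address"] = df.groupby("Name")["Address"].shift(-1)
moves = df.dropna(subset=["Next_Address"])

# Rows = origin address, columns = destination address.
counts = pd.crosstab(moves["Address"], moves["Next_Address"]).reindex(
    index=labels, columns=labels, fill_value=0
)
matrix = counts.to_numpy()

print(labels)
print(matrix)
```

Read this way, `matrix[0, 2]` is the number of moves from "Canada" to "California", and the diagonal holds the "stayed at the same address" pairs such as "Canada-Canada".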

I want to plot the Network Matrix NumPy array as a graph with the following characteristics:

  • It is a directed graph, with arrows showing the direction between nodes.
  • A node can have an edge to itself, because I have pairs like "Canada-Canada" that are important to show.
  • Node size represents the number of incoming edges (links): the more incoming links, the bigger the node.
  • Edge (link) thickness represents how many times the change between two nodes (locations) occurred: a thicker edge means a higher volume of location changes between those nodes.
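
A rough sketch of what such a plot could look like with networkx and matplotlib is given here as one possible approach, not a definitive answer. It assumes the rows of the matrix are origins and the columns are destinations. The helper names `build_graph` and `draw_graph`, the size and width scaling constants, and the circular layout are arbitrary choices made for illustration. Drawing self-loops relies on a reasonably recent networkx release, which is an assumption about the installed version.

```python
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

labels = ["Canada", "NewYork", "California", "Beverly hills", "Orange county"]
matrix = np.array([
    [2, 1, 2, 1, 0],
    [1, 0, 1, 0, 0],
    [2, 1, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 1, 0, 0, 0],
])


def build_graph(labels, matrix):
    """Build a directed graph; matrix[i, j] is taken as moves from i to j (assumption)."""
    G = nx.DiGraph()
    G.add_nodes_from(labels)
    for i, src in enumerate(labels):
        for j, dst in enumerate(labels):
            if matrix[i, j] > 0:
                # Self-loops (i == j) are kept, e.g. "Canada-Canada".
                G.add_edge(src, dst, weight=int(matrix[i, j]))
    return G


def draw_graph(G, ax=None):
    """Draw G with node size ~ incoming edges and edge width ~ move count."""
    if ax is None:
        ax = plt.gca()
    pos = nx.circular_layout(G)

    # Number of incoming edges per node; use G.in_degree(weight="weight")
    # instead to size nodes by the total number of incoming moves.
    in_deg = dict(G.in_degree())
    node_sizes = [500 + 500 * in_deg[n] for n in G.nodes()]
    widths = [G[u][v]["weight"] for u, v in G.edges()]

    nx.draw_networkx_nodes(G, pos, node_size=node_sizes,
                           node_color="lightblue", ax=ax)
    nx.draw_networkx_labels(G, pos, font_size=9, ax=ax)
    # Curved edges keep A->B and B->A from being drawn on top of each other.
    nx.draw_networkx_edges(G, pos, width=widths, arrows=True, arrowsize=15,
                           node_size=node_sizes,
                           connectionstyle="arc3,rad=0.15", ax=ax)
    ax.set_axis_off()


G = build_graph(labels, matrix)
```

Calling `draw_graph(G)` followed by `plt.show()` should display the figure. With 300+ million source rows the graph itself stays small, since it has only one node per distinct address, so the drawing cost depends on the number of addresses rather than on the number of rows.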

Source: https://stackoverflow.com/questions/61325124/plotting-the-graph-in-networkx-from-the-numpy-array
