Plotting the graph in networkx from the numpy array

Question

I have a pandas DataFrame that records people's locations over time. It has more than 300 million rows.

Here is a sample in which each Name is assigned a unique index via groupby, and the rows are sorted by "Name" and "Year":

import pandas as pd
inp = [{'Name': 'John', 'Year':2018, 'Address':'Beverly hills'}, {'Name': 'John','Year':2018, 'Address':'Beverly hills'}, {'Name': 'John', 'Year':2019, 'Address':'Beverly hills'}, {'Name': 'John', 'Year':2019, 'Address':'Orange county'}, {'Name': 'John', 'Year':2019, 'Address':'NewYork'}, {'Name': 'Steve', 'Year':2018, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2019, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2019, 'Address':'Canada'}, {'Name': 'Steve', 'Year':2020, 'Address':'California'}, {'Name': 'Steve', 'Year':2020, 'Address':'Canada'}, {'Name': 'John', 'Year':2020, 'Address':'Canada'}, {'Name': 'John', 'Year':2021, 'Address':'Canada'}, {'Name': 'John', 'Year':2021, 'Address':'Beverly hills'}, {'Name': 'Steve', 'Year':2021, 'Address':'California'}, {'Name': 'Steve', 'Year':2022, 'Address':'California'}, {'Name': 'Steve', 'Year':2018, 'Address':'NewYork'}, {'Name': 'Steve', 'Year':2018, 'Address':'California'}, {'Name': 'Steve', 'Year':2022, 'Address':'NewYork'}]
df = pd.DataFrame(inp)
df['Name_Grouped_Index'] = df.groupby(['Name']).ngroup()
df = df.sort_values(['Name', 'Year'], ascending=[False, True])
print(df)

     Name  Year        Address  Name_Grouped_Index
5   Steve  2018         Canada                     1
15  Steve  2018        NewYork                     1
16  Steve  2018     California                     1
6   Steve  2019         Canada                     1
7   Steve  2019         Canada                     1
8   Steve  2020     California                     1
9   Steve  2020         Canada                     1
13  Steve  2021     California                     1
14  Steve  2022     California                     1
17  Steve  2022        NewYork                     1
0    John  2018  Beverly hills                     0
1    John  2018  Beverly hills                     0
2    John  2019  Beverly hills                     0
3    John  2019  Orange county                     0
4    John  2019        NewYork                     0
10   John  2020         Canada                     0
11   John  2021         Canada                     0
12   John  2021  Beverly hills                     0

Thanks to @MarcusRenshaw, I can now build the network matrix (an adjacency matrix) that counts the moves between addresses: for example, how many times people moved from "Canada" to "California". The solution can be found in the linked answer.

Here is the NumPy array that I get as the "Network Matrix" from that solution, preceded by its list of labels:

['Canada', 'NewYork', 'California', 'Beverly hills', 'Orange county']
[[2 1 2 1 0]
 [1 0 1 0 0]
 [2 1 1 0 0]
 [0 0 0 2 1]
 [0 1 0 0 0]]
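
To make the matrix concrete, here is a minimal sketch that loads it into a networkx directed graph. It assumes that rows are origins and columns are destinations, so `matrix[i, j]` counts moves from `labels[i]` to `labels[j]`; that reading is consistent with the sample data, but it is an assumption about the linked solution. The variable names `labels`, `matrix` and `G` are only illustrative.

```python
import numpy as np
import networkx as nx

labels = ['Canada', 'NewYork', 'California', 'Beverly hills', 'Orange county']
matrix = np.array([
    [2, 1, 2, 1, 0],
    [1, 0, 1, 0, 0],
    [2, 1, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 1, 0, 0, 0],
])

# Assumption: matrix[i, j] is the number of moves from labels[i] to labels[j].
# Non-zero entries become directed edges carrying a "weight" attribute;
# diagonal entries become self-loops such as Canada -> Canada.
G = nx.from_numpy_array(matrix, create_using=nx.DiGraph)

# Replace the integer node ids 0..4 with the address names.
G = nx.relabel_nodes(G, dict(enumerate(labels)))

for origin, destination, count in G.edges(data="weight"):
    print(f"{origin} -> {destination}: {count}")
```

Zero entries produce no edge, so the graph holds only the moves that actually occurred.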

I want to plot this Network Matrix NumPy array with the following characteristics:

A directed graph, with arrows showing the direction between nodes.
A node can have an edge to itself: pairs like "Canada-Canada" occur and are important to show.
Node size represents the number of incoming edges (links): the more incoming links, the bigger the node.
Edge thickness represents how many times the move between two nodes (locations) occurred: a thicker edge means more location changes between those nodes.
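
One possible way to get the four characteristics listed above is sketched here. It is not a definitive solution, and it rests on several assumptions: rows of the matrix are origins and columns are destinations; "number of incoming links" means the weighted in-degree (the column sum); and the helper name `draw_moves`, the scaling factor of 300, the circular layout and the colours are arbitrary illustrative choices. Whether self-loops are rendered depends on the installed networkx version; recent releases draw them in `draw_networkx_edges`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

labels = ['Canada', 'NewYork', 'California', 'Beverly hills', 'Orange county']
matrix = np.array([
    [2, 1, 2, 1, 0],
    [1, 0, 1, 0, 0],
    [2, 1, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 1, 0, 0, 0],
])


def draw_moves(matrix, labels):
    """Hypothetical helper: draw the move counts as a weighted directed graph."""
    # Assumption: matrix[i, j] counts moves from labels[i] to labels[j].
    G = nx.from_numpy_array(matrix, create_using=nx.DiGraph)
    G = nx.relabel_nodes(G, dict(enumerate(labels)))

    # Weighted in-degree: total number of moves arriving at each node.
    # Use dict(G.in_degree()) instead to count distinct incoming edges.
    in_strength = dict(G.in_degree(weight="weight"))
    node_sizes = [300 * in_strength[node] for node in G.nodes]

    # One line width per edge, proportional to the number of moves.
    edge_widths = [G[u][v]["weight"] for u, v in G.edges]

    pos = nx.circular_layout(G)
    fig, ax = plt.subplots(figsize=(8, 8))
    nx.draw_networkx_nodes(G, pos, node_size=node_sizes,
                           node_color="lightblue", ax=ax)
    nx.draw_networkx_labels(G, pos, font_size=9, ax=ax)
    # Curved arrows keep A -> B and B -> A from overlapping; passing
    # node_size lets the arrow heads stop at the node border.
    nx.draw_networkx_edges(G, pos, width=edge_widths, arrows=True,
                           arrowsize=20, node_size=node_sizes,
                           connectionstyle="arc3,rad=0.15", ax=ax)
    ax.set_axis_off()
    return G, in_strength, edge_widths, fig


G, in_strength, edge_widths, fig = draw_moves(matrix, labels)
```

The returned figure can be written to disk with `fig.savefig(...)` or shown interactively once the `Agg` line is removed. With 300+ million source rows, only the aggregated matrix needs to reach this step, so the plot itself stays small.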

Source: https://stackoverflow.com/questions/61325124/plotting-the-graph-in-networkx-from-the-numpy-array

Tags

numpy

plot

graph

networkx

adjacency-matrix