NetworkX - How to get shortest path lengths between nodes using node ids instead of labels

I'm new to using the NetworkX library with Python.

Let's say that I import a Pajek-formatted file:

import networkx as nx
G = nx.read_pajek("network.net")  # hypothetical file name; use your own .net file

The contents of my file are (In Pajek, nodes are called "Vertices"):

*Vertices 6
123 Author1
456 Author2
789 Author3
111 Author4
222 Author5
333 Author6
*Edges
123 333
333 789
789 222
222 111
111 456

Now I want to calculate the shortest path lengths between all pairs of nodes in my network. Per the library documentation, I'm using this function:

path = nx.all_pairs_shortest_path_length(G)

Returns: lengths – Dictionary of shortest path lengths keyed by source and target.

The output I'm getting:

print path
{u'Author4': {u'Author4': 0, u'Author5': 1, u'Author6': 3, u'Author1': 4, u'Author2': 1, u'Author3': 2}, u'Author5': {u'Author4': 1, u'Author5': 0, u'Author6': 2, u'Author1': 3, u'Author2': 2, u'Author3': 1}, u'Author6': {u'Author4': 3, u'Author5': 2, u'Author6': 0, u'Author1': 1, u'Author2': 4, u'Author3': 1}, u'Author1': {u'Author4': 4, u'Author5': 3, u'Author6': 1, u'Author1': 0, u'Author2': 5, u'Author3': 2}, u'Author2': {u'Author4': 1, u'Author5': 2, u'Author6': 4, u'Author1': 5, u'Author2': 0, u'Author3': 3}, u'Author3': {u'Author4': 2, u'Author5': 1, u'Author6': 1, u'Author1': 2, u'Author2': 3, u'Author3': 0}}

As you can see, it's hard to read and hard to reuse later.

Ideally, I'd like output in a format similar to this:

source_node_id, target_node_id, path_length
123, 456, 5
123, 789, 2
123, 111, 4

In short, I need output that uses (or at least includes) the node ids instead of only the node labels, with every pair of nodes on its own line alongside its shortest path length.

Is this possible in NetworkX?

How about something like this?

import networkx as nx
# first get all the lengths
path_lengths = nx.all_pairs_shortest_path_length(G)

# now iterate over all pairs of nodes
for src in G.nodes():
    # look up the id as desired
    id_src = G.node[src].get('id')
    for dest in G.nodes():
        if src != dest: # ignore self-self paths
            id_dest = G.node[dest].get('id')
            l = path_lengths.get(src).get(dest)
            print "{}, {}, {}".format(id_src, id_dest, l)

This yields output such as:

111, 222, 1
111, 333, 3
111, 123, 4
111, 456, 1
111, 789, 2
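
The G.node[src].get('id') lookup works because the Pajek reader keys each node by its label and keeps the vertex number in an 'id' node attribute. Here is a minimal sketch that checks this, assuming Python 3, a recent NetworkX release (where G.nodes[...] replaces G.node[...]), and an edge list introduced by an *Edges header. The name label_to_id is only illustrative.

```python
import networkx as nx

# Pajek text mirroring the file in the question. The "*Edges" header is an
# assumption about how the edge list is introduced.
PAJEK_TEXT = """*Vertices 6
123 Author1
456 Author2
789 Author3
111 Author4
222 Author5
333 Author6
*Edges
123 333
333 789
789 222
222 111
111 456"""

# parse_pajek returns a multigraph; nx.Graph() collapses it to a simple graph.
G = nx.Graph(nx.parse_pajek(PAJEK_TEXT))

# Nodes are keyed by label; the vertex number is stored as a string
# under the "id" node attribute.
label_to_id = {label: data["id"] for label, data in G.nodes(data=True)}

print(sorted(G.nodes()))
print(label_to_id["Author1"])
```

Note that the ids come back as strings, so convert them with int() if you need numeric sorting.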

If you need to do further processing (e.g. sorting) then store the l values rather than just printing them.
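
As a rough sketch of that idea, the rows can be collected as tuples, sorted, and written out in the CSV layout asked for in the question. The rows below are copied from the sample output above, and rows_sorted is just an illustrative name.

```python
import csv
import sys

# (source_id, target_id, path_length) rows copied from the sample output.
rows = [
    ("111", "222", 1),
    ("111", "333", 3),
    ("111", "123", 4),
    ("111", "456", 1),
    ("111", "789", 2),
]

# Sort by path length first, then by target id to break ties.
rows_sorted = sorted(rows, key=lambda row: (row[2], row[1]))

# Write a header plus one line per pair.
writer = csv.writer(sys.stdout, lineterminator="\n")
writer.writerow(["source_node_id", "target_node_id", "path_length"])
writer.writerows(rows_sorted)
```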

(You could loop through pairs more cleanly with something like itertools.combinations(G.nodes(), 2), but the method above is a bit more explicit in case you aren't familiar with itertools.)
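
For reference, here is a sketch of the itertools.combinations variant, written for Python 3 and NetworkX 2.0 or later, where all_pairs_shortest_path_length returns an iterator instead of a dict. The graph is a hand-built stand-in for the Pajek file, with labels as node names and the vertex number in an 'id' attribute.

```python
from itertools import combinations

import networkx as nx

# Stand-in graph: nodes keyed by label, Pajek vertex number in "id".
G = nx.Graph()
for node_id, label in [("123", "Author1"), ("456", "Author2"),
                       ("789", "Author3"), ("111", "Author4"),
                       ("222", "Author5"), ("333", "Author6")]:
    G.add_node(label, id=node_id)
G.add_edges_from([("Author1", "Author6"), ("Author6", "Author3"),
                  ("Author3", "Author5"), ("Author5", "Author4"),
                  ("Author4", "Author2")])

# The call yields (source, lengths) pairs, so dict() rebuilds the
# nested mapping used in the answer above.
path_lengths = dict(nx.all_pairs_shortest_path_length(G))

rows = []
for src, dest in combinations(G.nodes, 2):
    # Unreachable targets are absent from path_lengths[src], hence .get().
    length = path_lengths[src].get(dest)
    rows.append((G.nodes[src]["id"], G.nodes[dest]["id"], length))

for row in rows:
    print("{}, {}, {}".format(*row))
```

Unlike the nested loop, combinations visits each unordered pair once (15 rows here instead of 30). That is enough for an undirected graph, but a directed graph would need itertools.permutations instead.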


In the end, I only needed the shortest path lengths for a subset of node pairs (my actual network is huge, with 600K nodes and 6M edges). So I wrote a script that reads source/target node pairs from a CSV file, stores them in a NumPy array, passes each pair to nx.shortest_path_length, and finally saves the results to a CSV file.

The code is below; I'm posting it in case it's useful to someone out there:

print "Importing libraries..."

import networkx as nx
import csv
import numpy as np

#Import network in Pajek format .net
G = nx.read_pajek("network.net")  # hypothetical file name; use your own .net file

print "Finished importing Network Pajek file"

#Simplify graph into networkx format
G = nx.Graph(G)

print "Finished converting to Networkx format"

#Network info
print "Nodes found: ",G.number_of_nodes()
print "Edges found: ",G.number_of_edges()

#Reading file and storing to array
with open('paired_nodes.csv','rb') as csvfile:
    reader = csv.reader(csvfile, delimiter = ',', quoting=csv.QUOTE_MINIMAL)#, quotechar = '"')
    data = [data for data in reader]
paired_nodes = np.asarray(data)

print "Finished reading paired nodes file"

#Add extra column in array to store shortest path value
paired_nodes = np.append(paired_nodes, np.zeros([len(paired_nodes), 1]), 1)

print "Just appended new column to paired nodes array"

#Get shortest path for every pair of nodes

for index in range(len(paired_nodes)):
    try:
        shortest = nx.shortest_path_length(G, paired_nodes[index,0], paired_nodes[index,1])
        #print shortest
        paired_nodes[index,2] = shortest
    except nx.NetworkXNoPath:
        #print '99999'  #Value to print when no path is found
        paired_nodes[index,2] = 99999

print "Finished calculating shortest path for paired nodes"

#Store results to csv file
f = open('shortest_path_results.csv','w')

for item in paired_nodes:
    #Write one "source,target,length" line per pair
    f.write(','.join(map(str, item)) + '\n')

f.close()
print "Done writing file with results, bye!"
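
For readers on Python 3, the same workflow can be sketched without NumPy. This is an illustration under stated assumptions, not a drop-in replacement: the helper pair_lengths and the tiny graph are hypothetical, in-memory buffers stand in for paired_nodes.csv and shortest_path_results.csv, and the CSV is assumed to hold node names exactly as they appear in G.

```python
import csv
import io

import networkx as nx

NO_PATH = 99999  # same sentinel the script above uses when no path exists


def pair_lengths(G, pairs):
    """Yield (source, target, length) per pair, or NO_PATH if disconnected."""
    for source, target in pairs:
        try:
            length = nx.shortest_path_length(G, source, target)
        except nx.NetworkXNoPath:
            length = NO_PATH
        yield source, target, length


# Tiny stand-in graph with two separate components.
G = nx.Graph([("a", "b"), ("b", "c"), ("x", "y")])

# In-memory stand-ins for the input and output CSV files.
pairs_file = io.StringIO("a,c\na,x\n")
results_file = io.StringIO()

results = list(pair_lengths(G, csv.reader(pairs_file)))
csv.writer(results_file, lineterminator="\n").writerows(results)

print(results_file.getvalue())
```

Computing one pair at a time keeps memory use low, which matters on a graph with hundreds of thousands of nodes, where the all-pairs table would be far too large to hold.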

