I can't quite wrap my brain around the joining process of Kademlia DHTs. I've seen a few tutorials and presentations online, but they all seem to say things the same way and a
I'm assuming you've read the Kademlia paper. Here's an excerpt from my article An Introduction to Kademlia DHT & How It Works
Some background information:
When you have a Kademlia network running, there should always be at least one node that every other node knows about, so that new nodes have someone to contact in order to join the network; let's call this the Bootstrap node BN.
K is a Kademlia constant that determines the size of the buckets in a node's routing table as well as the number of nodes a piece of data should be stored on.
Joining Process:
A new node NN is created with a NodeId (assigned by some method) and an IP address (the IP of the computer it's hosted on).
NN sends a LookupRequest(NN.NodeId) to BN. A LookupRequest basically asks the receiving node for the K-Closest nodes it knows to a given NodeId. In this case, BN will return the K-Closest nodes it knows to NN.
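To make "K-Closest" concrete, here is a minimal sketch of how a node could answer a LookupRequest, assuming NodeIds are plain integers and using Kademlia's XOR distance metric. The names `xor_distance`, `k_closest` and the sample IDs are made up for illustration, and K is kept tiny so the result is easy to check by hand (the paper gives 20 as an example value).

```python
K = 3  # illustrative value only; the Kademlia paper gives k = 20 as an example


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def k_closest(known_ids, target_id, k=K):
    """Return up to k of the known IDs, nearest to target_id first."""
    return sorted(known_ids, key=lambda n: xor_distance(n, target_id))[:k]


# Hypothetical 4-bit IDs that BN already has in its routing table.
bn_known = [0b0001, 0b0100, 0b0111, 0b1010, 0b1100]
nn_id = 0b0110

# BN's reply to LookupRequest(NN.NodeId): the K IDs with the smallest XOR distance.
print(k_closest(bn_known, nn_id))  # [7, 4, 1]
```

A real node would search its buckets rather than sort a flat list, but the reply is the same: the K known IDs with the smallest XOR distance to the requested NodeId.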
BN will now add NN to its routing table, so NN is now in the network.
NN receives the list of K-Closest nodes to itself from BN. NN adds BN to its routing table.
NN now pings these K nodes received from BN, and the ones that reply are added to its routing table in the necessary buckets based on distance. Being pinged also tells these nodes that NN exists, so they add NN to their routing tables.
NN is now connected to the network and is known by nodes on the network.
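The steps up to this point can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Node` class, `handle_lookup`, `handle_ping`, `join` and the `network` dictionary are hypothetical names, the routing table is flattened into a single set instead of K-Buckets, and "sending a message" is just a method call. A node that is missing from `network` stands in for one that does not reply to a ping.

```python
K = 3  # illustrative bucket/reply size


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.known = set()  # flat stand-in for a routing table (assumption)

    def handle_lookup(self, sender, target_id):
        """Reply with the K closest known IDs, then remember the sender."""
        closest = sorted(self.known, key=lambda n: n ^ target_id)[:K]
        self.known.add(sender.node_id)
        return closest

    def handle_ping(self, sender):
        """Being pinged is how a node learns that the sender exists."""
        self.known.add(sender.node_id)
        return True


def join(nn, bn, network):
    """Run the join steps for new node nn, using bn as the bootstrap node."""
    closest = bn.handle_lookup(nn, nn.node_id)  # NN asks BN for nodes near itself
    nn.known.add(bn.node_id)                    # NN adds BN to its table
    for node_id in closest:
        peer = network.get(node_id)             # None models an unreachable node
        if peer is not None and peer.handle_ping(nn):
            nn.known.add(node_id)               # only nodes that reply are kept


# Node 12 is known to BN but is not reachable, so it never answers NN's ping.
network = {i: Node(i) for i in (1, 4, 7, 10)}
bn = network[1]
bn.known = {4, 7, 10, 12}
nn = Node(6)
join(nn, bn, network)
print(sorted(nn.known))  # [1, 4, 7]
```

After `join`, BN and the nodes that answered the ping (4 and 7) all have NN in their tables, while node 10 was never contacted and does not know about NN yet.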
NN now loops through each of its K-Buckets:
foreach(K-Buckets as KB)
    1. NN generates a random NodeId `RNID` // A NodeId that will be in KB
    2. NN sends LookupRequest(RNID) to the K-Closest nodes it knows to RNID.
    3. The response will be the K nodes closest to RNID.
    4. NN adds the returned nodes to its routing table, which fills KB.
NN does this for each of its buckets to fill these buckets.
After this operation, NN has a better idea of the nodes on the network at different distances away from itself.
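The least obvious part of the loop above is generating a random NodeId that is guaranteed to land in a particular bucket. Here is one way it could be done, assuming integer IDs, XOR distance, and the convention that bucket i holds nodes whose distance d from our own ID satisfies 2^i <= d < 2^(i+1). `ID_BITS`, `bucket_index` and `random_id_in_bucket` are hypothetical names, and 8-bit IDs are used only to keep the numbers small.

```python
import random

ID_BITS = 8  # illustrative; real NodeIds are much longer (160 bits in the paper)


def bucket_index(own_id, other_id):
    """Bucket that other_id falls into: the position of the highest differing bit."""
    return (own_id ^ other_id).bit_length() - 1


def random_id_in_bucket(own_id, index, rng=random):
    """Pick an ID whose XOR distance d from own_id has 2**index <= d < 2**(index + 1)."""
    distance = rng.randrange(1 << index, 1 << (index + 1))
    return own_id ^ distance


own_id = 0b01100110
for index in range(ID_BITS):
    rnid = random_id_in_bucket(own_id, index)
    # Each generated RNID maps back to the bucket it was generated for.
    assert bucket_index(own_id, rnid) == index
```

Because XOR is its own inverse, choosing the distance first and XOR-ing it with our own ID gives an ID at exactly that distance, so each LookupRequest(RNID) targets the region of the ID space that the bucket covers.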
Note: This step is not mandatory; however, I did it in My Implementation of Kademlia so that each node has better knowledge of the network when it joins.
I wrote a full introduction to Kademlia at An Introduction to Kademlia DHT & How It Works
My guess is that it uses some super nodes and geospatial information to compute a minimum spanning tree. It could also compute a Voronoi diagram, or its dual, the Delaunay triangulation, from the super nodes and use it to run a proximity search. Here is an example: http://www.mathworks.de/de/help/matlab/math/spatial-searching.html.