Peer Routing Table
A peer in a P2P network needs to be able to communicate with the other peers in the network. In order to do so, it must first be able to find the other peers in the network. To find the other peers in the network, each peer has a routing table. The routing table contains references to a number of peers in the network.
In a P2P network with millions of peers, each peer cannot hold a complete table of all other peers in the network. Such a table would take up a lot of resources on each peer, and would also be almost impossible to keep up to date, as peers join and leave the network.
Instead a peer routing table contains references to a subset of peers in the network. Exactly how the routing table is organized depends on the P2P network algorithm. In this text I will only show you how Chord and Kademlia organizes their routing tables, as I find these systems to be more a little more elegant than Pastry and Tapestry.
Actually there is very little difference between these systems, and their routing tables. The overall principles are very similar. The differences are more in the interpretation of the GUID, and their so called "GUID distance functions".
Exponential GUID Distance Referencing
As mentioned above each peer only references a subset of the total peers in its routing table. A reference to a peer consists of the peers GUID and IP address and port number, so the peer can eventually be contacted. More formally, the elements in a peer routing table are data elements of this type:
(GUID, IP Address, TCP Port)
A peer chooses which other peers to reference based on the distance between its own GUID, and the GUID of the peer that is being referenced.
The routing table has the same number of cells as the number of bits in the GUID. Thus, if the GUID is 64 bit, then there are 64 cells in the routing table. Thus, a 64 bit "GUID space" has space for 264 GUID's (peers) total, and each peer needs a 64 cell routing table.
Each cell in the routing table contains a reference to a GUID that is 2N distance away, where N runs from 0 to the number of bits in the GUID. The IP address and port number of the peer owning that GUID is also stored, so the peer with a given GUID can eventually be contacted.
For instance, if distance was defined as substracting the numerical value of one GUID from the other, and the GUID's where 4 bit (= max 16 peers), then the peer with GUID 0 would have the following routing table:
|A Peer Routing Table for Peer with GUID 0 (4 bit GUID size)|
|Cell Index||Referenced GUID||GUID Distance|
In case the peer having the exact distance of 2N is not online in the network, the closest GUID above the distance of 2N is kept instead in the N'th cell. For instance, if the peer with GUID 8 was not online, then the cell with index 3, matching a distance of 23 = 8, would instead contain a reference to peer 9. If peer 9 was not online, then to the peer with GUID 10 etc.
The above routing table is illustrated below, here:
|A peer routing table with references to other nodes in the P2P network.|
Imagine that each peer in the network has a similar routing table. For instance, this diagram shows both peer 0 and peer 9's routing tables:
|Two peer routing tables.|
You might wonder how the distance between two GUID's is calculated. The distance between two GUID's is a numerical calculation. It is not based on geographical location. Thus, two peers with a logical GUID distance of 1 could be located greographically on different continents, for instance New York and Tokyo.
Chord and Kademlia uses two different distance functions. I'll explain both below.
Chord's Distance Function
Chord calculates the distance between two peers by subtracting the GUID's numerically from each other. The GUID space is defined to "wrap", meaning that in a 4-bit GUID space (values 0 to 15), the distance from 15 to 0 is defined as 1. The distance from 15 to 1 is 2 etc. The distance from 0 to 15 is 15, however. As you can see the distance from A to B is not the same as the distance from B to A.
The Chord distance function can be defined a bit more formally like this:
distance(A, B) = B - A (for B >= A) = B - A + 2N (for B < A)
This can also be calculated as
distance(A,B) = ( B - A + 2N) mod 2N
distance(0, 11) = ( 11 - 0 + 16 ) mod 16 = (27) mod 16 = 11 distance(11, 0) = ( 0 - 11 + 16 ) mod 16 = (5) mod 16 = 5
As you can see,
distance(A, B) is not equal to
Kademlia's Distance Function
Kademlia uses XOR as distance function. Thus, the two GUID's are simply XOR'ed together, and the result is the distance between the GUID's.
Here is a more formal definition:
distance(A, B) = A XOR B
0000 (0) XOR 1111 (15) = 1111 (15) 0110 (6) XOR 1010 (10) = 1100 (12)
XOR has the nice characteristic, that no two different numbers XOR'ed to a given number A, can give the same result. In other words, no two different numbers XOR'ed to e.g. 6 can give the same result. All numbers XOR'ed to 6 will give a different result. Thus, XOR is just as useful as subtraction of GUID's when it comes to calculating unique distances between GUID's.
XOR also has another interesting characteristic:
A XOR B = B XOR A
That means, that the distance from A to B is the same as the distance from B to A. This is not true for subtraction. A - B is not equal to B - A.
Comparing Chord and Kademlia Routing Tables
Just out of interest I will show you how the complete routing tables of peers in a 4-bit GUID space P2P network would look, using either Chord or Kademlia to calculate distances.
Each horizontal row in the tables below show first the GUID of the referencing peer in bold, and then the contents of its routing table.
The diagrams below illustrate the routing tables of peers with GUID's 0, 4, 8 and 12 using both Chord and Kademlia. Some of the reference arrows are layered ontop of each other, because the peers point to each other. If you look for the arrowheads you should be able to see whether a reference is one way, or both ways.
|Peer routing tables in a Chord P2P network.|
|Peer routing tables in a Kademlia P2P network.|
Notice the reference symmetry in the Kademlia diagram. If a peer A points to peer B, then B also points to peer A. This is not always so with Chord. This symmetry makes it possible to depict the the reference between two peers as a simple line, instead of an arrow with heads on, because there will always exist a reference in both directions.
The reference symmetry is more obvious in the following two diagrams which shows routing table references from peers with GUID's 0, 1, 2, and 3 - the first quarter of the ring. The Kademlia references are now shown just as lines, not arrows. The Chord references still have arrows, as reference direction still matters for Choord routing tables.
|Peer routing tables in a Chord P2P network.|
|Peer routing tables in a Kademlia P2P network.|
Kademlia's distance function has a few advantages over Choord's:
- Kademlia's XOR distance is easier to calculate.
- Kademlia's routing tables makes routing table management a bit easier.
Kademlias distance function (XOR) is easier to calculate than Chords distance function (subtraction, addition and modulo). This is especially true if the GUID cannot fit into a single instance of a data type in your programming language (a single long, int, short, byte etc).
For instance, if a GUID is 128 bit long, in Java you would
have to use 2 long variables to store it. Now you would have to subtract 2 x 2 longs from
each other, move underflow from the larger of the two longs to the smaller, and when adding
2N to the result, you would actually need an extra long to keep the result
until you perform the
mod operation on it.
It is much easier to just use a long (or byte) array for the GUID, and just XOR each pair of longs (or bytes) fromt the two arrays with each other.
Routing Table Management
When a peer joins a P2P network it needs to build it's own routing table as one of the first actions. Second, it need to notify all the peers in the network that ought to reference it, so they can update their routing tables too.
In a Kademlia P2P network the peers the joining peer need to reference, and the peers it need to notify of it joining, are the same.
In a Choord P2P network the peers the joining peer need to reference, and the peers it need to notify of it joining, are not the same. Both sets of peers will have to be located and notified separately. Thus, joining a Kademlia network is easier than joining a Choord network.
The same is true if a peer decides to leave a P2P network. A leaving peer should notify all peers that has a reference to it. Thus, these peers can remove the leaving peer from their routing table, and start searching for replacements.
In a Kademlia P2P network the leaving peer's routing table contains a full list of all the peers that hold a reference to it (if the routing table is 100% up-to-date). Thus, the leaving peer can just iterate it's own routing table and send a "leave" message to all these peers.
In a Choord P2P network the leaving peer's routing table does not contain a full list of all the peers that have a reference to itself. Instead, the leaving will have to locate all these peers one by one, and send them a "leave" message. This one-by-on lookup process is much slower than Kademlias direct message process.
I will get into more detail about routing table management in later texts.