# Glossary

## Algo: Dijkstra's algorithm

Breadth-first search treats all edges as having the same length. This is rarely true in applications where shortest paths are to be found. For instance, suppose you are driving from San Francisco to Las Vegas, and want to find the quickest route. The figure below shows the major highways you might conceivably use. Picking the right combination of them is a shortest-path problem in which the length of each edge (each stretch of highway) is important. We annotate every edge $e \in E$ with a length $l_e$. If $e = (u, v)$, we will sometimes also write $l(u, v)$ or $l_{uv}$. These $l_e$’s do not have to correspond to physical lengths. They could denote time (driving time between cities) or money (cost of taking a bus), or any other quantity that we would like to conserve.

Breadth-first search finds shortest paths in any graph whose edges have unit length. Can we adapt it to a more general graph $G = (V, E)$ whose edge lengths $l_e$ are positive integers?

#### A more convenient graph

Here is a simple trick for converting $G$ into something BFS can handle: break $G$’s long edges into unit-length pieces, by introducing “dummy” nodes. The figure below shows an example of this transformation. To construct the new graph $G'$,

For any edge $e = (u, v)$ of $E$, replace it by $l_e$ edges of length $1$, by adding $l_e − 1$ dummy nodes between $u$ and $v$.
Graph $G'$ contains all the vertices $V$ that interest us, and the distances between them are exactly the same as in $G$. Most importantly, the edges of $G'$ all have unit length. Therefore, we can compute distances in $G$ by running BFS on $G'$. #### Alarm clocks

If efficiency were not an issue, we could stop here. But when $G$ has very long edges, the $G'$ it engenders is thickly populated with dummy nodes, and the BFS spends most of its time diligently computing distances to these nodes that we don’t care about at all.

To see this more concretely, consider the graphs $G$ and $G'$ of the figure below, and imagine that the BFS, started at node $s$ of $G$, advances by one unit of distance per minute. For the first $99$ minutes it tediously progresses along $S − A$ and $S − B$, an endless desert of dummy nodes. Is there some way we can snooze through these boring phases and have an alarm wake us up whenever something interesting is happening—specifically, whenever one of the real nodes (from the original graph $G$) is reached? We do this by setting two alarms at the outset, one for node $A$, set to go off at time $T = 100$, and one for $B$, at time $T = 200$. These are estimated times of arrival, based upon the edges currently being traversed. We doze off and awake at $T = 100$ to find $A$ has been discovered. At this point, the estimated time of arrival for B is adjusted to $T = 150$ and we change its alarm accordingly.

More generally, at any given moment the breadth-first search is advancing along certain edges of $G$, and there is an alarm for every endpoint node toward which it is moving, set to go off at the estimated time of arrival at that node. Some of these might be overestimates because BFS may later find shortcuts, as a result of future arrivals elsewhere. In the preceding example, a quicker route to $B$ was revealed upon arrival at $A$. However, nothing interesting can possibly happen before an alarm goes off. The sounding of the next alarm must therefore signal the arrival of the wavefront to a real node $u \in V$ by BFS. At that point, BFS might also start advancing along some new edges out of $u$, and alarms need to be set for their endpoints.

The following “alarm clock algorithm” faithfully simulates the execution of BFS on $G'$.

• Set an alarm clock for node $s$ at time $0$.
• Repeat until there are no more alarms:
Say the next alarm goes off at time $T$, for node $u$. Then:
• The distance from $s$ to $u$ is $T$.
• For each neighbor $v$ of $u$ in $G$:
• If there is no alarm yet for $v$, set one for time $T + l(u, v)$.
• If $v$’s alarm is set for later than $T + l(u, v)$, then reset it to this earlier time.

#### Dijkstra’s algorithm

The alarm clock algorithm computes distances in any graph with positive integral edge lengths. It is almost ready for use, except that we need to somehow implement the system of alarms. The right data structure for this job is a priority queue (usually implemented via a heap), which maintains a set of elements (nodes) with associated numeric key values (alarm times) and supports the following operations:

• Insert. Add a new element to the set.
• Decrease-key. Accommodate the decrease in key value of a particular element. (The name decrease-key is standard but is a little misleading: the priority queue typically does not itself change key values. What this procedure really does is to notify the queue that a certain key value has been decreased.)
• Delete-min. Return the element with the smallest key, and remove it from the set.
• Make-queue. Build a priority queue out of the given elements, with the given key values. (In many implementations, this is significantly faster than inserting the elements one by one.)

The first two let us set alarms, and the third tells us which alarm is next to go off. Putting this all together, we get Dijkstra’s algorithm. In the code, $dist(u)$ refers to the current alarm clock setting for node $u$. A value of $\infty$ means the alarm hasn’t so far been set. There is also a special array, $prev$, that holds one crucial piece of information for each node $u$: the identity of the node immediately before it on the shortest path from $s$ to $u$. By following these back-pointers, we can easily reconstruct shortest paths, and so this array is a compact summary of all the paths found. A full example of the algorithm’s operation, along with the final shortest-path tree, is shown below. In summary, we can think of Dijkstra’s algorithm as just BFS, except it uses a priority queue instead of a regular queue, so as to prioritize nodes in a way that takes edge lengths into account. This viewpoint gives a concrete appreciation of how and why the algorithm works, but there is a more direct, more abstract derivation that doesn’t depend upon BFS at all. We now start from scratch with this complementary interpretation.

### An alternative derivation

Here’s a plan for computing shortest paths: expand outward from the starting point $s$, steadily growing the region of the graph to which distances and shortest paths are known. This growth should be orderly, first incorporating the closest nodes and then moving on to those further away. More precisely, when the “known region” is some subset of vertices $R$ that includes $s$, the next addition to it should be the node outside $R$ that is closest to $s$. Let us call this node $v$; the question is: how do we identify it?

To answer, consider $u$, the node just before $v$ in the shortest path from $s$ to $v$: Since we are assuming that all edge lengths are positive, $u$ must be closer to $s$ than $v$ is. This means that $u$ is in $R$—otherwise it would contradict $v$’s status as the closest node to $s$ outside $R$. So, the shortest path from $s$ to $v$ is simply a known shortest path extended by a single edge.

But there will typically be many single-edge extensions of the currently known shortest paths; which of these identifies $v$? The answer is, the shortest of these extended paths. Because, if an even shorter single-edge-extended path existed, this would once more contradict $v$’s status as the node outside $R$ closest to $s$. So, it’s easy to find $v$: it is the node outside $R$ for which the smallest value of $\textrm{distance}(s, u) + l(u, v)$ is attained, as $u$ ranges over $R$. In other words, try all single-edge extensions of the currently known shortest paths, find the shortest such extended path, and proclaim its endpoint to be the next node of $R$.

We now have an algorithm for growing $R$ by looking at extensions of the current set of shortest paths. Some extra efficiency comes from noticing that on any given iteration, the only new extensions are those involving the node most recently added to region $R$. All other extensions will have been assessed previously and do not need to be recomputed. In the following pseudocode, $dist(v)$ is the length of the currently shortest single-edge-extended path leading to $v$; it is $\infty$ for nodes not adjacent to $R$. Incorporating priority queue operations gives us back Dijkstra’s algorithm.

To justify this algorithm formally, we would use a proof by induction, as with breadth-first search. Here’s an appropriate inductive hypothesis.

At the end of each iteration of the while loop, the following conditions hold: (1) there is a value $d$ such that all nodes in $R$ are at distance $\le d$ from $s$ and all nodes outside $R$ are at distance $\ge d$ from $s$, and (2) for every node $u$, the value $dist(u)$ is the length of the shortest path from $s$ to $u$ whose intermediate nodes are constrained to be in $R$ (if no such path exists, the value is $\infty$).
The base case is straightforward (with $d = 0$), and the details of the inductive step can be filled in from the preceding discussion.

### Running time

At the level of abstraction of pseudocode above, Dijkstra’s algorithm is structurally identical to breadth-first search. However, it is slower because the priority queue primitives are computationally more demanding than the constant-time $\tt{eject}$’s and $\tt{inject}$’s of BFS. Since $\tt{makequeue}$ takes at most as long as $|V |$ $\tt{insert}$ operations, we get a total of $|V |$ $\tt{deletemin}$ and $|V | + |E|$ $\tt{insert}/\tt{decreasekey}$ operations. The time needed for these varies by implementation; for instance, a binary heap gives an overall running time of $O((|V | + |E|) \log |V |)$.

#### Which heap is best?

The running time of Dijkstra’s algorithm depends heavily on the priority queue implementation used. Here are the typical choices.

 Implementation $\tt{deletemin}$ $\tt{insert/decreasekey}$ $|V| \times\tt{deletemin}+(|V|+|E|)\times \tt{Insert}$ Array $O(|V|)$ $O(1)$ $O(|V|^2)$ Binary heap $O(\log |V|)$ $O(\log |V|)$ $O((|V|+|E|)\log |V|)$ $d$-ary heap $O(\frac{d\log |V|}{\log d})$ $O(\frac{\log |V|}{\log d})$ $O((|V|\cdot d + |E|)\frac{\log |V|}{\log d})$ Fibonacci heap $O(\log |V|)$ $O(1)$ (amortized) $O(|V|\log|V|+|E|)$

So for instance, even a naive array implementation gives a respectable time complexity of $O(|V |^2 )$, whereas with a binary heap we get $O((|V | + |E|) \log |V |)$. Which is preferable?

This depends on whether the graph is sparse (has few edges) or dense (has lots of them). For all graphs, $|E|$ is less than $|V |^2$ . If it is $\Omega(|V |^2 )$, then clearly the array implementation is the faster. On the other hand, the binary heap becomes preferable as soon as $|E|$ dips below $|V |^2 / \log |V |$.

The $d$-ary heap is a generalization of the binary heap (which corresponds to $d = 2$) and leads to a running time that is a function of $d$. The optimal choice is $d \approx |E|/|V |$; in other words, to optimize we must set the degree of the heap to be equal to the average degree of the graph. This works well for both sparse and dense graphs. For very sparse graphs, in which $|E| = O(|V |)$, the running time is $O(|V | \log |V |)$, as good as with a binary heap. For dense graphs, $|E| = \Omega(|V |2 )$ and the running time is $O(|V |^2 )$, as good as with a linked list. Finally, for graphs with intermediate density $|E| = |V |^{ 1+\delta}$ , the running time is $O(|E|)$, linear!

The last line in the table gives running times using a sophisticated data structure called a Fibonacci heap. Although its efficiency is impressive, this data structure requires considerably more work to implement than the others, and this tends to dampen its appeal in practice. We will say little about it except to mention a curious feature of its time bounds. Its $\tt{insert}$ operations take varying amounts of time but are guaranteed to average $O(1)$ over the course of the algorithm. In such situations we say that the amortized cost of heap $\tt{insert}$’s is $O(1)$.