Tangible Computing
18. Graph Theory




18.1 Basic Notion of a Graph

We are now going to briefly introduce you to a very rich area of combinatorial mathematics called graph theory. Graph theory is the single most important tool we have in computing for modelling problems.

A graph is a collection of vertices (also called nodes) and edges. Any two vertices can be connected by an edge. Graphs are used to model a variety of problems: communication networks, decision trees, scheduling problems, data structures, and so on.

There are two main kinds of graphs. An undirected graph, commonly just called a graph, has is no sense of direction on edges, and edge x--y is just between vertices x and y. A directed graph, or digraph, has a direction associated with an edge. So the edge x -> y goes from vertex x to vertex y. Here is a graph:


18.2 Graph Notions

A walk in a graph is a sequence of adjacent edges, with edges possibly appearing multiple times.

A path in a graph is a walk in which each edge only appears once.

The neighbours of a vertex v in a graph are all the vertices that have an edge directly to v.

A graph is connected is there is a path between any pair of distinct verticies.

A tree is a connected graph in which there is exactly one path between every pair of distinct vertices.

A subgraph of a graph is a set of vertices and edges chosen from the vertices and edges of a existing graph. More precisely, a graph H=(V', E') is a subgraph of a graph G=(V,E) is V' is a subset of V, and E' is a subset of E. Note that all the vertices associated with E' must also be present in V'.

For more terms see this glossary of graph theory.

18.3 Exercise: An Example of A Graph Algorithm

This is an exercise to do live in class.

  1. Begin by seeing if Python can support the mathematical definition of a graph as G=(V, E), where V is a set of vertices, and E is a set of edges, where each edge e is a set e = {x, y} where x, y are from V, and x != y.

    Can you have a set of sets? If not what can you use to represent a set of 2 elements?

    Note that given G you can break the tuple G apart using Python's tuple assignment (V, E) = G

  2. How do we print a graph G?

  3. Let's see if we can write an algorithm that given a starting vertex and a stopping vertex finds a walk in G from start to stop. Idea: randomly pick an edge to go to your neighbour and wander around aimlessly until you find the stop vertex.

  4. How do you pick random start and stop points?
    import random
    random.choice(...)
    random.sample(...)

  5. How do you find the neighbours of a vertex v? Write a neighbours_of(G, v) function. What is the time complexity of this? Could you speed it up with caching/memoization?

  6. Write the loop to walk around the graph until you hit the stop vertex. What can go wrong?

  7. How do we handle exceptions? try ... except. You should almost never catch every exception!

  8. Let's test it on a bigger graph. How can we generate a random graph with n vertices and m edges? Does the way we represent tuples cause a problem?

  9. Suppose you have a walk from start to stop. Can you remove redundant edges and turn it into a path?

  10. Our code is ugly, especially the hack we need to use for running tests on the function we have for working with the graph. Maybe we should put this into a different file: graph.py which then gives us a module.

  11. To use a module we have a variety of possibilities:
    import graph

    from graph import *

    from graph import neighbours_of

  12. We should make this collection of functions into a Graph class. This will be necessary if we want to be efficient. For example, removing an edge would be more than simply removing it from E, we might have to invalidate the neighbours cache, or at least adjust it.
Now, using our graph.py module we just produced, let's do some standard algorithms on graphs. We really need a way to draw graphs!

18.4 Spanning Trees

A spanning tree of a graph G=(V,E) is a subgraph of G that contains all the vertices of G, and is also a tree.

If there are n vertices in G, how many edges are there in the tree?

How do you find a spanning tree of a graph? There are two main strategies.

In depth-first search, the idea is to explore the graph by starting at an initial vertex (called the root) and going as far as possible before backing up and trying a new direction. This is the kind of strategy that one uses when explores a maze. The resulting spanning tree tends to look long and stringy.

code/GraphTheory/SpanningTrees/dfs-spanning-tree.txt

    # Pseudo code for constructing a spanning tree using depth-first search
     
    empty spanning tree T
    mark root as visited
    push root on visit stack
    while (stack not empty) {
       v = top of stack
     
       find unvisited neighbor w of v
       if (w exists) {
           mark w as visited
           add edge [ v, w ] to T
           push w on visit stack
           } 
        else {
           # no unvisted neighbors of top of stack
           pop top of stack
           }
        }


The other strategy, breadth-first search, is to explore all of the immediate neighbours of your position before returning to them to explore their neighbours. This strategy requires quite a bit more work if you were actually walking around in a maze.

code/GraphTheory/SpanningTrees/bfs-spanning-tree.txt

    # Pseudo code for constructing a spanning tree using breadth-first search
     
    empty spanning tree T
    mark root as visited
    add root to visit queue
    while (queue not empty) {
       v = head of queue
     
       find unvisited neighbor w of v
       if (w exists) {
           mark w as visited
           add edge [ v, w ] to T
           add w to visit queue
           }
        else {
           # no unvisited neighbours of head of queue
           remove head of queue
           }
        }


18.5 Min-Cost Spanning Trees

The previous spanning trees are simply used to connect all of the nodes in the graph (assuming they can be connected), without any particular measure of how good they are. We could, for example, compare tress on the basis of which tree has the longest path between any two vertices (long stringy will in general have longer paths than short broad trees).

If we attach costs to the edges of a graph, then we can compare trees on the basis of the total cost of all the edges in the tree. Then the spanning tree with the smaller cost is 'better' than the one with the larger cost.

There are two main algorithms for this problem. In both algorithms, the spanning tree is contstructed by adding one edge at a time. The choice of edge is driven by cost, with the lowest cost allowable edge being chosen at each step. This is the greedy approach.

Prim's min cost spanning tree algorithm is as follows:

code/GraphTheory/SpanningTrees/prim-min-cost-tree.txt

    Prim's algorithm for constructing a min cost spanning tree.
     
    empty minimum spanning tree T
    choose start vertex v
    add v to T
    while (there is a vertex not in T) {
       choose least cost edge e from some vertex in T to some vertex not in T
       add edge e to T
    }
     


Kruskal's min cost spanning tree algorithm is this:

code/GraphTheory/SpanningTrees/kruskal-min-cost-tree.txt

    Various forms of Kruskal's min cost spanning tree algorithm.
     
    Kruskal 1:
     
    empty minimum spanning tree T
    foreach edge e in (sorted edges by cost) {
       if (adding edge e to T does not create a cycle) {
           add edge e to T
       }
    }
     
    Kruskal 2:
     
    empty minimum spanning tree T
    foreach edge e in (sorted edges by cost) {
       if (edge e = [ u, v ] has u and v in disjoint components of T) {
           add edge e to T
       }
    }
     
    Kruskal 3:
     
    empty minimum spanning tree T
    put each vertex v into their own partition
    foreach edge e in (sorted edges by cost) {
       consider edge e = [ u, v ]
       if u and v are in different partitions, add edge and merge the partitions
       if (Find(u) != Find(v)) {
           add edge e to T
           Union( u, v )
       }
    }
     


Kruskal's algorithm requires a fast way to see if two vertices are in different sets, and then merging their sets. This is the so-called Union-Find problem.

18.6 Shortest Paths

Suppose that we have two vertices u and v in a graph, and we want to find a path between them tha contains the smallest number of edges. We can find such a path by starting at vertex u and then doing a breadth-first search until we encounter vertex v.

18.7 Min-Cost Paths

If we put weights (or costs) on the edges, then a breadth-first search is not going to work. In this new situation, we need to build up information about the best (lowest cost) ways of getting between vertices and then composing that information. One way to attack the problem is to consider what the least cost between u and v is using a path of l or less edges. Initially, l is 0, and then increases until l is the number of vertices -1. To go from l to l+1 is to consider how to find a path one edge longer that might have a lower cost.

code/GraphTheory/SpanningTrees/floyd-warshall-min-cost-path.txt

    Floyd/Warshall algorithm to compute the min cost path between any two
    vertices.
     
    n is the number of vertices
     
    C is the cost matrix, where C[i, j] is the cost of taking the edge from
    vertex i to vertex j.  If there is no edge the cost is infinite.  Costs
    are always non-negative.  C[i, i] = 0;
     
    M[l, i, j] is the min cost of going from i to j using l or fewer edges.
     
    Note that 
    a) M[0, i, i] = 0, and for i != j, M[0, i, j] = infinity
    b) M[1, i, j] = C[i, j]
    c) for l >= n  then M[l-1, i, j] = M[l, i, j] since you cannot have a 
    path of more than n-1 edges in the graph.
     
    So M[n, i, j] is the min cost of a path from vertex i to vertex j
    in a graph with n vertices.
     
    Then the relationship between M for increasing l is as follows:
     
    M[l+1, i, j] = Min over k of M[l, i, k] + C[k, j]
     
    Computing this in the naive way is O(n^4), and needs two n x n
    matrices
     
        for l in range(2, n):
            for i in range(n):
                for j in range(n):
                    M[l+1, i, j] = 
                        min over k of M[l, i, k] + C[k, j]
     
    Note C[i, j] = M[1, i, j] so we can rewrite the min as:
     
            M[l+1, i, j] = min over k of M[l, i, k] + M[1, k, j]
     
    This suggests that M is computed using a kind of matrix multiplication
    where instead of using the (+, *) inner product, we use the (min, +) 
    inner product:
     
        M[n] = M[1] x M[1] x ... x M[1]
     
    So we could use the recursive doubling trick from the Diffie-Hellman
    key exchange to comput M[n] by log n squarings of M[1].
     
        for l in range(log(n))
            for i in range(n):
                for j in range(n):
                    M[l+l, i, j] = 
                        min over k of M[l, i, k] + M[l, k, j]
     
    Which is of time complexity O(n^3 log n)
     
    This is still not particularly good.  Consider a graph of 1000 vertices.`


There is quite a bit of unnecessary computation in this approach. For example, two paths p1 and p2 that you want to concatenate might contain the same vertex. In which case there is no way that p1 followed by p2 can be a lower cost path. So we can restructure the computation and instead consider paths of both increasing numbers of edges, and of not previously considered vertices.

This yields the Floyd-Warshal all-pairs min-cost path algorithm:

code/GraphTheory/SpanningTrees/floyd-warshall-min-cost-path-2.txt

    Define A[k,i,j] = the cost of going from vertex i to j using possibly
    intermediate vertices 0 upto k.
     
    Then A[n-1,i,j] = min cost of a path from vertex i to j.
     
    Consider A[k,i,j].
     
    The least cost path between i and j either includes k or it doesn't
    include k.  If it includes k then A[k,i,j] = A[k-1,i,k] + A[k-1,k,j].
    If it does not include k then A[k,i,j] = A[k-1,i,j].  So,
     
    A[k,i,j] = min ( A[k-1,i,j], A[k-1,i,k] + A[k-1,k,j] )
     
                   k
                 ~   ~                 
    A[k-1,i,k] ~       ~  A[k-1,k,j]
             ~           ~
           i ~~~~~~~~~~~~ j
              A[k-1,i,j]
     
    We can define the base case A[-1,i,j] = C[i,j], that is, the cost of
    going directly from i to j (i.e. not passing through any intermediate
    vertex).
     
    Now, unlike the previous algorithm, where one needs two matricies to
    store the current best cost for paths of length l edges and the new
    updated cost for paths of length l+1, this requires only one matrix A.
     
    The key is observing that 
            A[k, i, j] <= A[k-1, i, j], 
    that is, the cost can only get lower as you increase k.
     
    Also, A[k,i,k] = A[k-1,i,k], A[k,k,j] = A[k-1,k,j].
    That is, the kth version of A does not change entries where i=k or j=k.
     
    So, rather than iterating over k in the inner loop, we move k to the
    outer loop and add new vertices (k) to consideration for each cost
    matrix.
     
    Now from the diagram above, for each new k we have
     
        A[k, i, j] = min ( A[k-1, i, j], A[k-1, i, k] + A[k-1, k, j] )
     
    Now this expression has useful properties for i=j, i=k, or k=j
    since A[k-1, i, i] is 0 and A[k-1, k, k] = 0.
     
    This means that updating the [i, j] entry is always independent of
    the values of the [i,k] and [k,j] entries.  So you can actually do
    the update in place.
     
    So our loops go:
     
    for k in range(n)
      for i in range(n)
         for j in range(n)
            A[i, j] = min ( A[i, j], A[i, k] + A[k, j] )
     
    which O(n^3) time and O(n^2) space.  This is still not a big
    improvement.


Floyd-Warshal adds the intermediate vertex k in a specific order as determined by the vetex numbers, and independent of the actual cost structure of the graph. Suppose that rather than considering the next k in sequence, we consider the vertex that has the best improvement in cost. Also, it computes the least cost path between all pairs of vertices. But usually we only need to find a least cost path between two specific vertices. Perhaps this, combined with a clever strategy for identifying the next vertex k to consider can help us reduce the running time futher.

These ideas lead to Dijkstra's Algorithm, which has running time O( m + n log n), where m is the number of edges in the graph.

18.8 The Digraph Class

Our first assignment of this term involves finding the shortest paths between two points using Edmonton's road network. Since roads can be one-way, a graph is not a good representation. We need one where the edges are oriented with a direction. Thus edges are not sets {x, y}, but instead are ordered pairs (tuples) of the form (x, y) denoting that the edge goes from x to y.

Interestingly, because we used tuples in the graph class above, we do not need to modify our Graph class very much in order to change it into a Digraph class.

Here is one possible implementation of the Digraph class in all its details:

code/GraphTheory/Digraph/digraph.py

    """
    Graph module for undirected graphs.
    """
     
    import random
     
    try:
        import display
    except:
        print("Warning: failed to load display module.  Graph drawing will not work.")
        
    class Digraph:
        """
        Directed graph.  The vertices must be immutable.
     
        To create an empty graph:
        >>> G = Digraph()
        >>> (G.num_vertices(), G.num_edges())
        (0, 0)
     
        To create a circular graph with 3 vertices:
        >>> G = Digraph([(1, 2), (2, 3), (3, 1)])
        >>> (G.num_vertices(), G.num_edges())
        (3, 3)
        """
     
        def __init__(self, edges = None):
            self._tosets = {}
            self._fromsets = {}
     
            if edges:
                for e in edges: self.add_edge(e)
     
        def __repr__(self):
            return "Digraph({}, {})".format(self.vertices(), self.edges())
     
        def add_vertex(self, v):
            """
            Adds a vertex to the graph.  It starts with no edges.
            
            >>> G = Digraph()
            >>> G.add_vertex(1)
            >>> G.vertices() == {1}
            True
            """
            if v not in self._tosets:
                self._tosets[v] = set()
                self._fromsets[v] = set()
     
        def add_edge(self, e):
            """
            Adds an edge to graph.  If vertices in the edge do not exist, it adds them.
            
            >>> G = Digraph()
            >>> G.add_vertex(1)
            >>> G.add_vertex(2)
            >>> G.add_edge((1, 2))
            >>> G.add_edge((2, 1))
            >>> G.add_edge((3, 4))
            >>> G.add_edge((1, 2))
            >>> G.num_edges()
            3
            >>> G.num_vertices()
            4
            """
            # Adds the vertices (in case they don't already exist)
            for v in e:
                self.add_vertex(v)
     
            # Add the edge
            self._tosets[e[0]].add(e[1])
            self._fromsets[e[1]].add(e[0])
     
        def edges(self):
            """
            Returns the set of edges in the graph as ordered tuples.
            """
            return { (v, w) for v in self._tosets for w in self._tosets[v] }
     
        def vertices(self):
            """
            Returns the set of vertices in the graph.
            """
            return set(self._tosets.keys())
     
        def draw(self, filename, attr = {}):
            """
            Draws the graph into a dot file.
            """
            display.write_dot_desc((self.vertices(), self.eges()), filename, attr)
     
        def num_edges(self):
            m = 0
            for v in self._tosets:
                m += len(self._tosets[v])
            return m
     
        def num_vertices(self):
            """
            Returns the number of vertices in the graph.
            """
            return len(self._tosets)
     
        def adj_to(self, v):
            """
            Returns the set of vertices that contain an edge from v.
     
            >>> G = Digraph()
            >>> for v in [1, 2, 3]: G.add_vertex(v)
            >>> G.add_edge((1, 3))
            >>> G.add_edge((1, 2))
            >>> G.adj_to(3) == set()
            True
            >>> G.adj_to(1) == { 2, 3 }
            True
            """
            return self._tosets[v]
     
        def adj_from(self, v):
            """
            Returns the set of vertices that contain an edge to v.
     
            >>> G = Digraph()
            >>> G.add_edge((1, 3))
            >>> G.add_edge((2, 3))
            >>> G.adj_from(1) == set()
            True
            >>> G.adj_from(3) == { 1, 2 }
            True
            """
            return self._fromsets[v]
     
        def is_path(self, path):
            """
            Returns True if the list of vertices in the argument path are a
            valid path in the graph.  Returns False otherwise.
     
            >>> G = Digraph([(1, 2), (2, 3), (2, 4), (1, 5), (2, 5), (4, 5), (5, 2)])
            >>> G.is_path([1, 5, 2, 4, 5])
            True
            >>> G.is_path([1, 5, 4, 2])
            False
            """
            pass
     
    def random_graph(n, m):
        """
        Make a random Digraph with n vertices and m edges.
     
        >>> G = random_graph(10, 5)
        >>> G.num_edges()
        5
        >>> G.num_vertices()
        10
        >>> G = random_graph(1, 1)
        Traceback (most recent call last):
        ...
        ValueError: For 1 vertices, you wanted 1 edges, but can only have a maximum of 0
        """
        G = Digraph()
        for v in range(n):
            G.add_vertex(v)
     
        max_num_edges = n * (n-1)
        if m > max_num_edges:
            raise ValueError("For {} vertices, you wanted {} edges, but can only have a maximum of {}".format(n, m, max_num_edges))
     
        while G.num_edges() < m:
            G.add_edge(random.sample(range(n), 2))
     
        return G
     
    def spanning_tree(G, start):  
        """ 
        Runs depth-first-search on G from vertex start to create a spanning tree.
        """
        visited = set()
        todo = [ (start, None) ]
     
        T = Digraph()
        
        while todo:
            (cur, e) = todo.pop()
     
            if cur in visited: continue
     
            visited.add(cur)
            if e: T.add_edge(e)
     
            for n in G.adj_to(cur):
                if n not in visited:
                    todo.append((n, (cur, n)))
                    
        return T
     
    def shortest_path(G, source, dest):
        """
        Returns the shortest path from vertex source to vertex dest.
     
        >>> G = Digraph([(1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (3, 6), (6, 7)])
        >>> path = shortest_path(G, 1, 7)
        >>> path
        [1, 6, 7]
        >>> G.is_path(path)
        True
        """
        pass
     
    def compress(walk):
        """
        Remove cycles from a walk to create a path.
        
        >>> compress([1, 2, 3, 4])
        [1, 2, 3, 4]
        >>> compress([1, 3, 0, 1, 6, 4, 8, 6, 2])
        [1, 6, 2]
        """
        
        lasttime = {}
     
        for (i,v) in enumerate(walk):
            lasttime[v] = i
     
        rv = []
        i = 0
        while (i < len(walk)):
            rv.append(walk[i])
            i = lasttime[walk[i]]+1
     
        return rv
        
                
     
    if __name__ == "__main__":
        import doctest
        doctest.testmod()


18.9 Displaying Graphs and Digraphs with Graphviz



Here is the module for taking a graph or digraph expressed as a pair (V, E) of a vertex set V and an edge set E, and then rendering it into a graphviz .dot file for layout and display.

code/GraphTheory/Digraph/display.py

    """
    A module that takes graphs, G=(V, E) and attributes on
    vertices and edges, and generates a .dot file for display
    with graphviz.  
     
    The type of graph to be rendered is normalyy just a undirected
    graph.  To render a directed graph, set the keyword parameter
        graphtype='digraph' 
    the default is graphtype='graph'
     
    V is a set of vertices
    E is a set of edges, in a 
        graph - the tuple (u, v) for u, v in V is an edge between 
            u and v.  The pair (v, u) is equivalent, and only one
            of the two possibilities should appear in E.
        digraph - the tuple (u, v) for u, v in V is a directed edge 
            from u to v.  It is different from the tuple (v, u) 
            which is the edge in the opposite direction.
     
     
    attributes is an optional argument.  It is a dictionary of vertex and
    edge attributes with these possible (optional) keys:
        vertex_color
        vertex_label
        edge_color
        edge_label
    Each of these keys maps to a dictionary.
        atrributes['vertex_color'][v] if present is a string
            that gives the color of vertex v
        atrributes['vertex_label'][v] if present is a string
            that will be used as the label for vertex v.  If
            not present, the label for v is v.
        atrributes['edge_color'][(u,v)] if present is a string
            that gives the color of edge (u, v)
        atrributes['edge_label'][(e,v)] if present is a string
            that will be used as the label for egde (u,v).  If
            not present, the dge is unlabelled. 
        
    """
    import time
    import sys
     
    def gen_dot_desc(G, graphtype='graph', attributes={}):
        """
        Given graph G, return a string that encodes the dot
        representation of G.
     
        >>> g = ({1, 2, 3}, {(1, 2), (1, 3)} )
        >>> s = gen_dot_desc(g)
     
        """
        if "vertex_color" in attributes:
            vertex_color=attributes["vertex_color"]
        else:
            vertex_color={ }
     
        if "edge_color" in attributes:
            edge_color=attributes["edge_color"]
        else:
            edge_color={ }
     
        if "vertex_label" in attributes:
            vertex_label=attributes["vertex_label"]
        else:
            vertex_label={ }
     
        if "edge_label" in attributes:
            edge_label=attributes["edge_label"]
        else:
            edge_label={ }
     
        (V, E) = G
     
        edgesym = "--"
        if graphtype != 'graph':
            graphtype = 'digraph'
            edgesym = "->"
     
        # generate the header
        dot_desc = ( graphtype + 
            " g {\n" + 
            "  ordering=out;\n" +
            "  node [shape=circle];\n" +
            "  edge [penwidth=3];\n" 
            )
     
        # now generate vertex and edges information
        if len(V) == 0:
           dot_desc += "Empty [shape=ellipse];\n"
        else:
            for n in V:
                color = "white"
                if n in vertex_color: 
                    color = vertex_color[n]
     
                label = str(n)
                if n in vertex_label: 
                    label = vertex_label[n]
     
                dot_desc += '  {v} [label="{l}", style=filled, fillcolor="{c}"];\n'.format(
                    v=str(n), l=label, c=color)
     
            for e in E:
                (x, y) = e
                color = "black"
                if e in edge_color: 
                    color = edge_color[e]
                label = ""
                if e in edge_label:
                    label = ', label="{}"'.format(edge_label[e])
     
                dot_desc += '  {vx} {esym} {vy} [color="{c}" {l}];\n'.format(
                        esym=edgesym, vx=str(x), vy=str(y), c=color, l=label)
     
     
        # close off the description
        dot_desc += "}\n"
     
        return dot_desc
     
    def write_dot_desc(G, file_name, graphtype='graph', attributes={}):
        """
        Given graph G, write the dot description of G
        into the given file name, which should have extension .dot
        as in sample-graph.dot
     
        """
        # Implementation note:
        # instead of f = open(file_name, 'w') inside a try block, use
        # the safe open that closes file on an exception, from
        # http://docs.python.org/3.2/tutorial/inputoutput.html
     
        with open(file_name, 'w') as f:
            f.write( gen_dot_desc(G, graphtype, attributes) )
     
    def pause(time=1,prompt="next?"):
        """
        if time > 0, then pause for time sec
        if time=0, then print the prompt and wait for a new line of
            input from the terminal before returning.
        """
        if time > 0:
            time.sleep(time)
        else:
            x = input(prompt)
            
        
     
    if __name__ == "__main__":
        import doctest
        doctest.testmod()
     

18. Graph Theory
Tangible Computing / Version 3.20 2013-03-25