5. Performance Analysis

In this chapter we analyze the behavior of the various node coloring algorithms in the gcol library. Where appropriate, we also make comparisons to similar algorithms from the networkx library.

Here, algorithms are evaluated by looking at solution quality and run times. Details on algorithm complexity (in terms of big O notation) can be found in gcol’s documentation. Here, all tests are conducted on randomly generated Erdos-Renyi graphs, commonly denoted by \(G(n,p)\). These graphs are constructed by taking \(n\) nodes and adding an edge between each node pair at random with probability \(p\). The expected number of edges in a \(G(n,p)\) graph is therefore \(\binom{n}{2}p\), while the expected node degree is \((n-1)p\).

For these tests, we use differing values for \(n\) but keep \(p\) fixed at \(0.5\). This is due to a result of Nick Wormald, who has established that for \(n \gtrapprox 30\), a set of randomly constructed \(G(n, 0.5)\) graphs can be considered equivalent to a random sample from the population of all unlabeled \(n\)-node graphs. This allows us to make general statistical statements about performance across the set of all \(n\)-node graphs, although different observations may well be made when executing these algorithms on specifically chosen topologies, such as scale-free graphs and planar graphs. Examples of these differences are discussed in this book, where a wider set of trials is conducted.

In the code below, each trial involves generating a set of \(G(n,0.5)\) graphs using a range of values of \(n\). The results of the algorithms are then written to a Pandas dataframe df, aggregated in a pivot table, and summarized in charts. Lines in the charts give mean values, while the shaded areas indicate one standard deviation on either side of these means. All results below were found by executing the code on a 3.0 GHtz Windows 11 PC with 16 GB of RAM.

We start by importing the libraries we need.

import pandas as pd
import networkx as nx
import gcol
import matplotlib.pyplot as plt
import time

5.1. Differing Node Coloring Strategies

In our first experiment, we compare the different constructive strategies available for node coloring in the gcol library (namely 'random', 'welsh_powell', 'dsatur', and 'rlf') using \(G(n,0.5)\) graphs with values of \(n\) between \(50\) and \(500\). The results are shown in the charts below. Further details on these algorithms can be found in gcol’s documentation.

#Carry out the trials and put the results into a list
results = []
nVals = range(50,501,50)
for n in nVals:
    for seed in range(50):
        G = nx.gnp_random_graph(n, 0.5, seed)
        for strategy in ["random", "welsh_powell", "dsatur", "rlf"]:
            start = time.time()
            c = gcol.node_coloring(G, strategy)
            results.append([n, seed, strategy, max(c.values()) + 1, time.time()-start])

# Create a pandas dataframe from this list and make a pivot table
df = pd.DataFrame(results, columns=["n", "seed", "strategy", "cols", "time"])
pivot = df.pivot_table(columns='strategy', aggfunc=['mean','std'], values=['cols','time'], index='n')

# Now use the pivot table to make a chart that compares mean solution quality
mean1, SD1 = pivot[("mean","cols","random")], pivot[("std","cols","random")]
mean2, SD2 = pivot[("mean","cols","welsh_powell")], pivot[("std","cols","welsh_powell")]
mean3, SD3 = pivot[("mean","cols","dsatur")], pivot[("std","cols","dsatur")]
mean4, SD4 = pivot[("mean","cols","rlf")], pivot[("std","cols","rlf")]
plt.plot(nVals, mean1, linestyle='-', linewidth=1.5, color="b", label='Random')
plt.fill_between(nVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.plot(nVals, mean2, linestyle='--', linewidth=1.5, color="r", label='Welsh-Powell')
plt.fill_between(nVals, mean2-SD2, mean2+SD2, color='r', alpha=0.2)
plt.plot(nVals, mean3, linestyle='-.', linewidth=1.5, color="g", label='DSatur')
plt.fill_between(nVals, mean3-SD3, mean3+SD3, color='k', alpha=0.2)
plt.plot(nVals, mean4, linestyle=':', linewidth=1.5, color="black", label='RLF')
plt.fill_between(nVals, mean4-SD4, mean4+SD4, color='g', alpha=0.2)
plt.xlabel("Number of nodes $n$")
plt.ylabel("Colours")
plt.legend()
plt.show()

# and do the same for mean run times
mean1, SD1 = pivot[("mean","time","random")], pivot[("std","time","random")]
mean2, SD2 = pivot[("mean","time","welsh_powell")], pivot[("std","time","welsh_powell")]
mean3, SD3 = pivot[("mean","time","dsatur")], pivot[("std","time","dsatur")]
mean4, SD4 = pivot[("mean","time","rlf")], pivot[("std","time","rlf")]
plt.plot(nVals, mean1, linestyle='-', linewidth=1.5, color="b", label='Random')
plt.fill_between(nVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.plot(nVals, mean2, linestyle='--', linewidth=1.5, color="r", label='Welsh-Powell')
plt.fill_between(nVals, mean2-SD2, mean2+SD2, color='r', alpha=0.2)
plt.plot(nVals, mean3, linestyle='-.', linewidth=1.5, color="g", label='DSatur')
plt.fill_between(nVals, mean3-SD3, mean3+SD3, color='k', alpha=0.2)
plt.plot(nVals, mean4, linestyle=':', linewidth=1.5, color="black", label='RLF')
plt.fill_between(nVals, mean4-SD4, mean4+SD4, color='g', alpha=0.2)
plt.xlabel("Number of nodes $n$")
plt.ylabel("Run time (s)")
plt.legend()
plt.show()
../_images/output_3_01.png ../_images/output_3_1.png

The results above show that the random and welsh-powell strategies produce the poorest solutions overall (in terms of the number of colors they use) while the RLF algorithm produces the best. This gap also seems to widen for larger values of \(n\). On the other hand, the RLF algorithm has less favorable run times, as shown in the second chart. This is to be expected because the RLF algorithm has a higher complexity than the other options. A good compromise seems to be struck by the dsatur strategy, which features comparatively good solution quality and run times.

5.2. Optimization Output

The following code demonstrates how the verbose parameter can be used to produce run-time output for the various optimization algorithms. This allows us to monitor algorithm performance during execution.

G = nx.gnp_random_graph(50, 0.5)
c = gcol.node_coloring(G, strategy="welsh_powell", opt_alg=1, verbose=1)
Running backtracking algorithm:
    Found solution with 13 colors. Total backtracking iterations = 0
    Found solution with 12 colors. Total backtracking iterations = 52
    Found solution with 11 colors. Total backtracking iterations = 1293
    Found solution with 10 colors. Total backtracking iterations = 28972977
Ending backtracking at iteration 34324898 - optimal solution achieved.

In the above example, the initial solution has used 13 colors. The backtracking algorithm (opt_alg=1) has been used to reduce the number of colors, eventually finding an optimal solution. We can do similar things with the other optimization algorithms while controlling their number of iterations:

G = nx.gnp_random_graph(50, 0.5)
c = gcol.node_coloring(G, strategy="welsh_powell", opt_alg=2, it_limit=10000, verbose=1)
Running local search algorithm:
    Found solution with 12 colors. Total local search iterations = 0 / 10000
    Found solution with 11 colors. Total local search iterations = 12 / 10000
    Found solution with 10 colors. Total local search iterations = 241 / 10000
Ending local search. Iteration limit of 10000 has been reached.

In some cases, we can also increase the amount of output by using a larger value with verbose:

G = nx.gnp_random_graph(50, 0.5)
c = gcol.node_coloring(G, strategy="welsh_powell", opt_alg=3, it_limit=10000, verbose=2)
Running local search algorithm:
    Found solution with 12 colors. Total local search iterations = 0 / 10000
    Running PartialCol algorithm using 11 colors
        Solution with 11 colors and cost 5 found by PartialCol at iteration 0
        Solution with 11 colors and cost 4 found by PartialCol at iteration 1
        Solution with 11 colors and cost 3 found by PartialCol at iteration 2
        Solution with 11 colors and cost 2 found by PartialCol at iteration 3
        Solution with 11 colors and cost 1 found by PartialCol at iteration 4
        Solution with 11 colors and cost 0 found by PartialCol at iteration 6
    Ending PartialCol
    Found solution with 11 colors. Total local search iterations = 6 / 10000
    Running PartialCol algorithm using 10 colors
        Solution with 10 colors and cost 5 found by PartialCol at iteration 0
        Solution with 10 colors and cost 4 found by PartialCol at iteration 1
        Solution with 10 colors and cost 3 found by PartialCol at iteration 22
        Solution with 10 colors and cost 2 found by PartialCol at iteration 32
        Solution with 10 colors and cost 1 found by PartialCol at iteration 155
        Solution with 10 colors and cost 0 found by PartialCol at iteration 896
    Ending PartialCol
    Found solution with 10 colors. Total local search iterations = 902 / 10000
    Running PartialCol algorithm using 9 colors
        Solution with 9 colors and cost 5 found by PartialCol at iteration 0
        Solution with 9 colors and cost 4 found by PartialCol at iteration 6
        Solution with 9 colors and cost 3 found by PartialCol at iteration 46
        Solution with 9 colors and cost 2 found by PartialCol at iteration 1498
    Ending PartialCol
Ending local search. Iteration limit of 10000 has been reached.

5.3. Comparison to NetworkX

The next set of experiments compares the performance of gcol’s local search routines and NetworkX’s interchange coloring routine. As a benchmark, we also include gcol’s dsatur option from earlier, which is also used to produce the initial solutions for the local search algorithms. For comparative purposes, two of gcol’s local search algorithms (opt_alg=2 and opt_alg=3) are used here, and we impose a fixed iteration limit of \(n\). The results are collected and displayed in the same manner as the previous example.

#Carry out the trials and put the results into a list
results = []
nVals = range(50,601,50)
for n in nVals:
    for seed in range(50):
        G = nx.gnp_random_graph(n, 0.5, seed)
        start = time.time()
        c = nx.greedy_color(G, "largest_first", interchange=True)
        results.append([n, seed, "networkx", max(c.values()) + 1, time.time()-start])
        start = time.time()
        c = gcol.node_coloring(G)
        results.append([n, seed, "dsatur", max(c.values()) + 1, time.time()-start])
        start = time.time()
        c = gcol.node_coloring(G, opt_alg=2, it_limit=len(G))
        results.append([n, seed, "opt_alg=2", max(c.values()) + 1, time.time()-start])
        start = time.time()
        c = gcol.node_coloring(G, opt_alg=3, it_limit=len(G))
        results.append([n, seed, "opt_alg=3", max(c.values()) + 1, time.time()-start])

# Create a pandas dataframe from this list and make a pivot table
df = pd.DataFrame(results, columns=["n", "seed", "alg", "cols", "time"])
pivot = df.pivot_table(columns='alg', aggfunc=['mean','std'], values=['cols','time'], index='n')

# Use the pivot table to make charts as before
mean1, SD1 = pivot[("mean","cols","networkx")], pivot[("std","cols","networkx")]
mean2, SD2 = pivot[("mean","cols","dsatur")], pivot[("std","cols","dsatur")]
mean3, SD3 = pivot[("mean","cols","opt_alg=2")], pivot[("std","cols","opt_alg=2")]
mean4, SD4 = pivot[("mean","cols","opt_alg=3")], pivot[("std","cols","opt_alg=3")]
plt.plot(nVals, mean1, linestyle='-', linewidth=1.5, color="b", label='NetworkX')
plt.fill_between(nVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.plot(nVals, mean2, linestyle='--', linewidth=1.5, color="r", label='DSatur')
plt.fill_between(nVals, mean2-SD2, mean2+SD2, color='r', alpha=0.2)
plt.plot(nVals, mean3, linestyle='-.', linewidth=1.5, color="g", label='opt_alg=2')
plt.fill_between(nVals, mean3-SD3, mean3+SD3, color='k', alpha=0.2)
plt.plot(nVals, mean4, linestyle=':', linewidth=1.5, color="black", label='opt_alg=3')
plt.fill_between(nVals, mean4-SD4, mean4+SD4, color='g', alpha=0.2)
plt.xlabel("Number of nodes $n$")
plt.ylabel("Colours")
plt.legend()
plt.show()

mean1, SD1 = pivot[("mean","time","networkx")], pivot[("std","time","networkx")]
mean2, SD2 = pivot[("mean","time","dsatur")], pivot[("std","time","dsatur")]
mean3, SD3 = pivot[("mean","time","opt_alg=2")], pivot[("std","time","opt_alg=2")]
mean4, SD4 = pivot[("mean","time","opt_alg=3")], pivot[("std","time","opt_alg=3")]
plt.plot(nVals, mean1, linestyle='-', linewidth=1.5, color="b", label='NetworkX')
plt.fill_between(nVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.plot(nVals, mean2, linestyle='--', linewidth=1.5, color="r", label='DSatur')
plt.fill_between(nVals, mean2-SD2, mean2+SD2, color='r', alpha=0.2)
plt.plot(nVals, mean3, linestyle='-.', linewidth=1.5, color="g", label='opt_alg=2')
plt.fill_between(nVals, mean3-SD3, mean3+SD3, color='k', alpha=0.2)
plt.plot(nVals, mean4, linestyle=':', linewidth=1.5, color="black", label='opt_alg=3')
plt.fill_between(nVals, mean4-SD4, mean4+SD4, color='g', alpha=0.2)
plt.xlabel("Number of nodes $n$")
plt.ylabel("Run time (s)")
plt.legend()
plt.show()
../_images/output_11_0.png ../_images/output_11_12.png

It is clear from the above results that the local search algorithms make significant improvements to the solutions provided by the dsatur strategy, albeit with additional time requirements. The solutions and run times of these local search algorithms are also superior to NetworkX’s node coloring routines. Further improvements in solution quality can usually be found by increasing the iteration limit of the local search algorithms, as demonstrated below.

5.4. Exact Algorithm Performance

In addition to local search, the gcol library features an exact, exponential-time algorithm for node coloring, based on backtracking. This algorithm is invoked by setting opt_alg=1 (the parameter it_limit is redundant here). At the start of this algorithm’s execution, a large clique \(C\) is identified in \(G\) using the NetworkX function nx.max_clique(). The nodes of \(C\) are then permanently assigned to different colors. The main backtracking algorithm is then executed and only halts only when a solution using \(C\) colors has been identified, or when the algorithm has backtracked to the root of the search tree. In both cases the returned solution will be optimal (that is, it will be using the minimum number of colors).

The following code evaluates the performance of this algorithm on \(G(n,0.5)\) graphs for a range of \(n\)-values.

results = []
nVals = range(2,61,2)
for n in nVals:
    for seed in range(25):
        G = nx.gnp_random_graph(n, 0.5, seed)
        start = time.time()
        c = gcol.node_coloring(G, opt_alg=1)
        results.append([n, seed, "opt_alg=1", max(c.values()) + 1, time.time()-start])

# Create a pandas dataframe from this list and make a pivot
df = pd.DataFrame(results, columns=["n", "seed", "alg", "cols", "time"])
pivot = df.pivot_table(columns='alg', aggfunc=['mean','std'], values=['cols','time'], index='n')

# Use the pivot table above to make the charts as before
mean1, SD1 = pivot[("mean","cols","opt_alg=1")], pivot[("std","cols","opt_alg=1")]
plt.plot(nVals, mean1, linestyle='-', linewidth=1.5, color="b", label='opt_alg=1')
plt.fill_between(nVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.xlabel("Number of nodes $n$")
plt.ylabel("Colours")
plt.legend()
plt.show()

mean1, SD1 = pivot[("mean","time","opt_alg=1")], pivot[("std","time","opt_alg=1")]
plt.plot(nVals, mean1, linestyle='-', linewidth=1.5, color="b", label='opt_alg=1')
plt.fill_between(nVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.xlabel("Number of nodes $n$")
plt.ylim((0, 120))
plt.ylabel("Run time (s)")
plt.legend()
plt.show()
../_images/output_13_0.png ../_images/output_13_11.png

The first chart above shows the chromatic numbers from a sample of \(G(n,0.5)\) graphs for an increasing number of nodes \(n\). It can be seen that the chromatic number rises in a close-to-linear fashion in relation to \(n\). The second figure demonstrates the disadvantages of using this exponential-time algorithm: once \(n\) is increased beyond a moderately small value (approximately 55 here), run times become unpredictable and often very long. Note, however, that the specific \(n\)-values that give these long run times can vary considerably depending on the topology of the graph. For example, planar graphs and scale-free graphs with several hundred nodes are often solved very quickly by the backtracking algorithm. These sorts of results will usually need to be confirmed empirically.

5.5. Local Search Comparison

In addition to the above exact algorithm, the gcol library features a choice of four high-performance optimization heuristics, based on local search (more specifically, tabu search).

  • opt_alg=2 makes use of the TabuCol algorithm

  • opt_alg=3 makes use of the PartialCol algorithm

  • opt_alg=4 uses a hybrid evolutionary algorithm (HEA) in conjunction with TabuCol

  • opt_alg=5 uses a hybrid evolutionary algorithm (HEA) in conjunction with PartialCol

These are among the best-known algorithms for graph coloring. Further information on these can be found in the library’s documentation and in this book. Executing these heuristics with higher iteration limits usually leads to better solutions (that is, solutions using fewer colors). To illustrate, the following code runs the four optimization algorithms with differing iteration limits on a set of twenty \(G(500, 0.5)\) graphs.

results = []
it_limits = [200, 2000, 20000, 200000, 2000000, 20000000]
for seed in range(20):
    G = nx.gnp_random_graph(500, 0.5, seed)
    for opt_alg in [2, 3, 4, 5]:
        for it_limit in it_limits:
            start = time.time()
            c = gcol.node_colouring(G, opt_alg=opt_alg, it_limit=it_limit)
            results.append([seed, opt_alg, it_limit, max(c.values()) + 1, time.time()-start])

# Create a pandas dataframe from this list and make a pivot table
df = pd.DataFrame(results, columns=["seed", "opt_alg", "it_limit", "cols", "time"])
pivot = df.pivot_table(columns='opt_alg', aggfunc=['mean','std'], values=['cols','time'], index='it_limit')

# Use the pivot table to make charts as before
mean1, SD1 = pivot[("mean","cols",2)], pivot[("std","cols",2)]
mean2, SD2 = pivot[("mean","cols",3)], pivot[("std","cols",3)]
mean3, SD3 = pivot[("mean","cols",4)], pivot[("std","cols",4)]
mean4, SD4 = pivot[("mean","cols",5)], pivot[("std","cols",5)]
plt.plot(it_limits, mean1, linestyle='-', linewidth=1.5, color="b", label='opt_alg=2')
plt.fill_between(it_limits, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.plot(it_limits, mean2, linestyle='--', linewidth=1.5, color="r", label='opt_alg=3')
plt.fill_between(it_limits, mean2-SD2, mean2+SD2, color='r', alpha=0.2)
plt.plot(it_limits, mean3, linestyle='-.', linewidth=1.5, color="g", label='opt_alg=4')
plt.fill_between(it_limits, mean3-SD3, mean3+SD3, color='k', alpha=0.2)
plt.plot(it_limits, mean4, linestyle=':', linewidth=1.5, color="black", label='opt_alg=5')
plt.fill_between(it_limits, mean4-SD4, mean4+SD4, color='g', alpha=0.2)
plt.xlabel("it_limit")
plt.ylabel("Colours")
plt.legend()
plt.xscale('log')
plt.show()

mean1, SD1 = pivot[("mean","time",2)], pivot[("std","time",2)]
mean2, SD2 = pivot[("mean","time",3)], pivot[("std","time",3)]
mean3, SD3 = pivot[("mean","time",4)], pivot[("std","time",4)]
mean4, SD4 = pivot[("mean","time",5)], pivot[("std","time",5)]
plt.plot(it_limits, mean1, linestyle='-', linewidth=1.5, color="b", label='opt_alg=2')
plt.fill_between(it_limits, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.plot(it_limits, mean2, linestyle='--', linewidth=1.5, color="r", label='opt_alg=3')
plt.fill_between(it_limits, mean2-SD2, mean2+SD2, color='r', alpha=0.2)
plt.plot(it_limits, mean3, linestyle='-.', linewidth=1.5, color="g", label='opt_alg=4')
plt.fill_between(it_limits, mean3-SD3, mean3+SD3, color='k', alpha=0.2)
plt.plot(it_limits, mean4, linestyle=':', linewidth=1.5, color="black", label='opt_alg=5')
plt.fill_between(it_limits, mean4-SD4, mean4+SD4, color='g', alpha=0.2)
plt.xlabel("it_limit")
plt.ylabel("Run time (s)")
plt.legend()
plt.xscale('log')
plt.show()
../_images/output_15_0.png ../_images/output_15_11.png

Note that the charts above use log scales on their horizontal axes. The first chart shows how using an increased iteration limit can result in solutions that have fewer colors. For very high limits, the hybrid evolutionary algorithms are clearly more favorable. The second chart shows how the iteration limits affect run times with these graphs.

5.6. Equitable Coloring

In the (unweighted) equitable node-coloring problem, we are interested in coloring the nodes with a user-defined number of colors \(k\) so that (a) adjacent nodes have different colors, and (b) the number of nodes in each color is as equal as possible. The following trials run the gcol.equitable_node_k_coloring() method on a sample of random \(G(500,0.5)\) graphs over a range of suitable \(k\)-values. The reported cost is simply the difference in size between the largest and smallest color classes in a solution. Hence, if \(k\) is a divisor of \(n\), a cost of zero indicates an equitable \(k\)-coloring, else a cost of one indicates an equitable coloring.

results = []
n = 500
kVals = range(70, 300, 1)
for seed in range(50):
    G = nx.gnp_random_graph(n, 0.5, seed)
    for k in kVals:
        start = time.time()
        c = gcol.equitable_node_k_coloring(G, k, opt_alg=2, it_limit=len(G))
        P = gcol.partition(c)
        cost = max(len(j) for j in P) - min(len(j) for j in P)
        results.append([k, seed, "opt_alg=2", cost, time.time()-start])

# Create a pandas dataframe from this list and make a pivot table
df = pd.DataFrame(results, columns=["k", "seed", "alg", "cost", "time"])
pivot = df.pivot_table(columns='alg', aggfunc=['mean','std'], values=['cost','time'], index='k')

# Use the pivot table above to make charts as before
mean1, SD1 = pivot[("mean","cost","opt_alg=2")], pivot[("std","cost","opt_alg=2")]
plt.plot(kVals, mean1, linestyle='-', linewidth=1.5, color="b", label='opt_alg=2')
plt.fill_between(kVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.xlabel("Number of colors $k$")
plt.ylabel("Cost")
plt.legend()
plt.show()

mean1, SD1 = pivot[("mean","time","opt_alg=2")], pivot[("std","time","opt_alg=2")]
plt.plot(kVals, mean1, linestyle='-', linewidth=1.5, color="b", label='opt_alg=2')
plt.fill_between(kVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.xlabel("Number of colors $k$")
plt.ylabel("Run time (s)")
plt.legend()
plt.show()
../_images/output_17_0.png ../_images/output_17_11.png

The first chart above demonstrates that the gcol.equitable_node_k_coloring() method consistently achieves equitable node \(k\)-colorings. The exceptions occur for low values of \(k\) (which are close to the chromatic number) and when \(k\) is a divisor of \(n\). In the former case, the low number of available colors restricts the choice of appropriate colors for each node, often leading to inequitable colorings. On the other hand, when \(k\) is a divisor of \(n\), the algorithm is seeking a solution with a cost of zero, meaning that each color class must have exactly the same number of nodes. If this cannot be achieved, then a cost of at least two must be incurred.

The second chart above also indicates that runtimes of this routine increase slightly when \(k\) is a divisor of \(n\). Run times also lengthen due to increases in \(k\). The latter is due to the larger number of solutions that need to be evaluated in each iteration of the local search algorithm used with this routine. More details on this algorithm can be found in gcol’s documentation.

Finally, note that NetworkX features an exact equitable node \(k\)-coloring routine, but this can only be used for values of \(k\geq \Delta(G)+1\), where \(\Delta(G)\) is the highest node degree in the graph. In the \(G(500,0.5)\) graphs considered here, the minimum valid value for \(k\) is approximately 280.

5.7. Independent Set Comparison

Our final set or trials looks at the performance the gcol.max_independent_set() routine and compares it to the approximation algorithm included in NetworkX for the same problem. As before, we use an iteration limit of \(n\) for the former.

#Carry out the trials and put the results into a list
results = []
nVals = range(50,501,50)
for n in nVals:
    for seed in range(50):
        G = nx.gnp_random_graph(n, 0.5, seed)
        start = time.time()
        S = gcol.max_independent_set(G, it_limit=len(G))
        results.append([n, seed, "gcol", len(S), time.time()-start])
        start = time.time()
        S = nx.approximation.maximum_independent_set(G)
        results.append([n, seed, "networkx", len(S), time.time()-start])

# Create a pandas dataframe from this list and make a pivot table
df = pd.DataFrame(results, columns=["n", "seed", "alg", "size", "time"])
pivot = df.pivot_table(columns='alg', aggfunc=['mean','std'], values=['size','time'], index='n')

# Create the charts as before
mean1, SD1 = pivot[("mean","size","networkx")], pivot[("std","size","networkx")]
mean2, SD2 = pivot[("mean","size","gcol")], pivot[("std","size","gcol")]
plt.plot(nVals, mean1, linestyle='-', linewidth=1.5, color="b", label='NetworkX')
plt.fill_between(nVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.plot(nVals, mean2, linestyle='--', linewidth=1.5, color="r", label='GCol')
plt.fill_between(nVals, mean2-SD2, mean2+SD2, color='r', alpha=0.2)
plt.xlabel("Number of nodes $n$")
plt.ylabel("Independent Set Size")
plt.legend()
plt.show()

mean1, SD1 = pivot[("mean","time","networkx")], pivot[("std","time","networkx")]
mean2, SD2 = pivot[("mean","time","gcol")], pivot[("std","time","gcol")]
plt.plot(nVals, mean1, linestyle='-', linewidth=1.5, color="b", label='NetworkX')
plt.fill_between(nVals, mean1-SD1, mean1+SD1, color='b', alpha=0.2)
plt.plot(nVals, mean2, linestyle='--', linewidth=1.5, color="r", label='GCol')
plt.fill_between(nVals, mean2-SD2, mean2+SD2, color='r', alpha=0.2)
plt.xlabel("Number of nodes $n$")
plt.ylabel("Run time (s)")
plt.legend()
plt.show()
../_images/output_19_0.png ../_images/output_19_11.png

The results above show quite clearly that the gcol.max_independent_set() routine produces better quality solutions (larger independent sets) in less time. As before, further improvements in solution quality (but longer run times) may also be found by increasing the it_limit parameter.