Random Solutions are Often Good Enough

Note: This blog entry is a cross-post from Digital Operatives' blog. You can read the original post here.

Hard Problems

Computer Scientists and Software Engineers deal with computationally hard problems all the time. But those challenges don’t solely apply to those people creating the software; they are equally relevant to those people analyzing it. For cybersecurity researchers, finding a hash collision… determining the user inputs that can assign a certain value to a tainted EIP… deciding whether a black-box binary is malicious… they’re all really hard! As our work becomes more and more automated with the help of program analysis techniques, these computationally hard problems become more and more prevalent. But what makes one problem harder than another? In the remainder of this post I’ll answer that question, and introduce some surprising new results of my research that demonstrate how good solutions to inherently very hard problems can be quickly generated. Read on for more.

In order to get there, we’re first going to have to review the concepts of computational complexity and approximation algorithms. If you are already familiar with the former, you can skip to the third section. If you are already familiar with both, you can skip to the fourth section.

It’s clear that certain computational tasks are harder than others. For example, finding the smallest value in an $n$-element array can be completed in about $n$ operations (i.e., simply iterate over the array once, keeping track of the smallest). Sorting that array, on the other hand, will always take at least $n \log(n)$ operations in the worst case (for an explanation of why that is, here is a great article). Sure enough, if we look at the common sorting algorithms (quicksort, heapsort, mergesort, bubblesort, etc.), it’s always pretty easy to construct $n$-element arrays that will require those algorithms to execute at least $n \log(n)$ operations. In fact, when quicksort is presented with an array that is already nearly in sorted order, it will require $n^2$ operations, which is much slower. Therefore, we can say that the problem of sorting is inherently harder than finding the infimum (smallest element) of an array.

There are in fact many different classes of problems that have known “hardness.” Some classes are easier to solve than others. Certain classes also have other interesting properties; for example, a group of problems that are called “P-complete” are all very likely to be difficult to parallelize effectively. The common compression algorithm LZW (used in GIF) solves one such P-complete problem, so I’m sorry to report that it’s very unlikely that you’re going to get a speedup by running your automated animated GIF meme generator on a GPU.

In the remainder of this post we’re going to be focusing on what are called NP-hard problems. These are a class of problems for which it is very likely that there is no way to solve them efficiently. The interesting thing is that if someone were to find a single efficient way to solve one of them, then we’d immediately have a solution for all of them.

Really Hard Problems

One of the most famous examples of an NP-hard problem is the traveling salesman problem:

What is the shortest possible route to visit each city in a map, without visiting any city more than once, and returning to the originating city at the end?

Variants of this problem crop up in the real world all the time, particularly in logistical networks (e.g., how companies like UPS and FedEx decide routes for their delivery vehicles and locations for their distribution centers). Another famous NP-hard problem is the Steiner network problem:

What is the shortest set of roads that connects a given set of cities to each other?

That problem is very important in efficient networking protocol design. The games Sudoku, Minesweeper, and Battleship, as well as many others, are all also NP-hard. And, as you may have guessed, many of the cybersecurity tasks and program analysis problems we encounter on a daily basis are also NP-hard. That’s why things like static taint analysis, constraint solving, and concolic execution that many people use in vulnerability research are so slow.

The important part here is that there is no known polynomial time algorithm that solves any NP-hard problem. I’ll get to what “polynomial time” means in a minute; for now you can just assume that it means “solvable in a reasonable amount of computation time.” Discovering a polynomial time algorithm that solves an NP-hard problem or, conversely, proving no such algorithm can exist, is one of the most important open problems in theoretical computer science. If you can devise such an algorithm, xor prove that none can exist, then you’ll win the million dollar Millenium Prize. More importantly: If such an algorithm does exist, then it has very many significant consequences. For example, such an algorithm would break the assumptions of many cryptographic schemes and mean that you could efficiently break most cryptography very quickly. Most computer scientists and mathematicians believe that it is impossible to create a polynomial time algorithm to solve an NP-hard problem, but so far no one has been able to prove it.

So what does “polynomial time” mean, and why do we want to achieve it? Computer scientists classify the computational complexity of an algorithm based on the order of magnitude of the number of operations the algorithm would have to perform in the worst case. Let’s say we create an algorithm to find the optional solution to the traveling salesman problem simply by brute-force enumerating all possible paths through the map. In the worst case, the input map is going to have a road between each pair of cities. That means our algorithm, in the worst case, is going to need to test every possible path (these are what are called Hamiltonian paths), of which there are $\frac{(n - 1)!}{2}$, where $n$ is the total number of cities. That means our algorithm does not run in worst-case polynomial time, because the equation $\frac{(n - 1)!}{2}$ is not a polynomial. For it to be a polynomial, it would have to have a worst-case runtime equation that behaves asymptotically similar to $n^c$ for any constant $c$.

I’ll get to why such a polynomial expression is so desirable in a minute, but first let’s look at an example of why slower-than-polynomial-time algorithms, like our traveling salesman one, are very bad. As we can see, our algorithm quickly becomes very slow as the number of cities, $n$, increases:

$n$ (number of cities)		Number of Paths to Test
3	$\frac{1}{2}(3 - 1)!$ =	1
4	$\frac{1}{2}(4 - 1)!$ =	3
5	$\frac{1}{2}(5 - 1)!$ =	12
6	$\frac{1}{2}(6 - 1)!$ =	60
7	$\frac{1}{2}(7 - 1)!$ =	360
8	$\frac{1}{2}(8 - 1)!$ =	2520
9	$\frac{1}{2}(9 - 1)!$ =	20160
10	$\frac{1}{2}(10 - 1)!$ =	181440
11	$\frac{1}{2}(11 - 1)!$ =	1814400
12	$\frac{1}{2}(12 - 1)!$ =	19958400
13	$\frac{1}{2}(13 - 1)!$ =	239500800
14	$\frac{1}{2}(14 - 1)!$ =	3113510400
15	$\frac{1}{2}(15 - 1)!$ =	43589145600
16	$\frac{1}{2}(16 - 1)!$ =	653837184000
17	$\frac{1}{2}(17 - 1)!$ =	10461394944000
18	$\frac{1}{2}(18 - 1)!$ =	177843714048000
19	$\frac{1}{2}(19 - 1)!$ =	3201186852864000
20	$\frac{1}{2}(20 - 1)!$ =	60822550204416000
⋮		⋮
30	$\frac{1}{2}(30 - 1)!$ =	4420880996869850977271808000000
⋮		⋮
61	$\frac{1}{2}(61 - 1)!$ =	4160493556370695072138170591611682190377086303180622976224638848204800000000000000

So, with as few as 61 cities on the map, our algorithm would have to consider over four sexvigintillion possible paths! That’s about 42 times the number of atoms in the visible universe! With only nine cities, It’s Over 9000!!!1 And that’s not even taking into account the number of operations required to enumerate each of those paths, which likely increases the overall figure by at least a factor of $n$.

But what makes polynomial time algorithms tractable while asymptotically slower algorithms are not? Why does that phase transition occur specifically at the point when the worst-case number of operations grows polynomially? It all has to do with Moore’s law, which states that processor speed doubles every two years or so. (More accurately, Moore’s law speaks of the number of components per integrated circuit.) Now, let’s say we have an algorithm for solving the traveling salesman problem that can quickly solve problem instances up to $k$ cities on current hardware. How many years will we have to wait before hardware will be fast enough to solve a problem of size $k + 1$? If our algorithm runs in roughly $n^c$ operations (i.e., it runs in polynomial time), then, by Moore’s law, we will only have to wait about $\log_2\left(1 + k^{-1}\right)c$ years before we can solve a problem of size $k+1$. For most values of $k$ and $c$, that is a very short time! If our algorithm’s runtime were instead exponential (e.g., $c^n$), then the number of years we would have to wait would be at least linear in $k$.

If we’re stuck solving a problem that we know is NP-hard—which means that there is no known polynomial time algorithm, and there will likely never be one—what are we to do? Luckily, there is a subfield of computer science known as approximation algorithms whose goal is to find algorithms that can quickly find solutions that are not necessarily optimal, but are within some known bound of optimal. That’s what we are going to talk about in the next section.

The “Bingo Problem”: Approximate Solutions to Really Hard Problems

We’re all likely familiar with the game of BINGO: Balls labeled with letters and numbers are randomly drawn…

This area should be automatically filled with BINGO balls. If you see this message, it means that you probably need to enable JavaScript!

…until one of any given subset of balls (as specified by the BINGO cards) is collected.

Let’s imagine a variant in which all letters of the alphabet exist on the balls and they are each given arbitrary floating point values:

This area should be automatically filled with BINGO balls. If you see this message, it means that you probably need to enable JavaScript!

The object of this new type of BINGO game is: Given n balls chosen randomly from the infinite set of all possible letter/number combinations, find some subset of the balls that spells out an English word. The goal is to maximize the sum of the values of the balls used for that word. It’s a bit like scrabble. If the word length is unbounded, this game is in fact NP-hard (it is a variant of the subset sum problem). As an example, say the following thirteen balls are randomly chosen:

P
1

R
48.7

N
79.8

O
26

O
83.8

P
23.6

A
22.7

M
92.1

I
34.2

T
36.7

A
64.5

I
84.5

X
49.1

One solution would be to choose the word “MOTION” with a value of 402.9:

P
1

R
48.7

N
79.8

O
26

O
83.8

P
23.6

A
22.7

M
92.1

I
34.2

T
36.7

A
64.5

I
84.5

X
49.1

That isn’t the best we can do, though. A better solution would be to choose the word “APPROXIMATION”, which has a value of 646.7:

P
1

R
48.7

N
79.8

O
26

O
83.8

P
23.6

A
22.7

M
92.1

I
34.2

T
36.7

A
64.5

I
84.5

X
49.1

This is in fact the optimal solution, since it uses all of the possible letters. If we had originally chosen MOTION, then, in approximation algorithms parlance, we say that the “constant of approximation” is 646.7 ÷ 402.9 = 1.6051. In other words, MOTION is 1.6051 times “worse” than optimal.

In the case of the traveling salesman problem, there are well known polynomial time approximation algorithms that can return a result with a 1.5 constant of approximation. In other words, these algorithms can return a result that is guaranteed to be no worse than 1.5 times the value of the optimal result, even though we don’t actually know the value of the optimal solution! For most NP-hard problems, any constant of approximation that is 2 or lower is considered good.

Random Solutions to Really Hard Problems

I like to ask stupid questions. So I asked myself: “What if our algorithm just chooses a solution randomly?” It’s usually pretty easy to write a fast, dumb algorithm to return random solutions. In the Bingo problem, it simply amounts to ignoring the ball values and returning a random word whose letters match the balls. That can certainly be implemented in polynomial time if we have an indexed dictionary of words. For problems like minimum Steiner network, we can just randomly generate any spanning tree, which will always be a feasible (valid) solution, and which we can compute in linear time. On average, how bad are those random solutions likely to be?

First, let’s sort the balls in order of increasing value:

P
1

A
22.7

P
23.6

O
26

I
34.2

T
36.7

R
48.7

X
49.1

A
64.5

N
79.8

O
83.8

I
84.5

M
92.1

$\ell$

$m$

If we know that the shortest word in our dictionary is $\ell$ letters long and the longest word in our dictionary is $m$ letters long, then the expected value for the constant of approximation of a randomly selected word is at most equal to the expected value of the sum of the $m$ highest-value balls divided by the expected value of the $\ell$ lowest value balls. This is simplifying a lot of details (e.g., things get a lot more complex if the values can be negative), and it requires a bit of advanced math, but luckily we’ve already worked it all out. You can read about it in our recently published paper here.

If we distill the crufty mathematical incantations we needed to handle all of the edge cases for the formal proofs, our results can be summarized as follows:

Under very reasonable assumptions, the expected value for the constant of approximation of a randomly selected feasible solution is almost always going to be at most two!

That’s really surprising, especially realizing that this applies for all NP-hard problems, if formulated correctly. So, simply choosing a random solution is often effectively as good as the best approximation algorithms that are currently known. In fact, in our paper we linked above we present some empirical evidence suggesting that the random solutions are often even closer to optimal than ones produced by state-of-the-art approximation algorithms.

Conclusions

Certain computational problems are harder than others, and a class of them, called NP-hard, are very likely to be intractable to solve optimally. A great number of cybersecurity and vulnerability research tasks fall into that class. Polynomial time algorithms are desirable because if a problem instance is too large to be solved by current hardware, we don’t have to wait very long until hardware will become fast enough to solve that instance using the polynomial time algorithm. Polynomial time approximation algorithms can be used on intractable NP-hard problems to find solutions relatively quickly that are within a known distance from the optimal solution (even if we don’t actually know what the optimal solution is). For most NP-hard problems, a polynomial time approximation algorithm that can achieve a solution that is no worse than two times the cost of the optimal solution is usually considered good. The surprising result of our new research is that randomly selected feasible solutions will on average be at least as good as those complex approximation algorithms!

So the moral of the story is: Sometimes quickly and mindlessly choosing a random solution isn’t half bad!

← Older Post

Newer Post →

Random Solutions are Often Good Enough

Hard Problems

Really Hard Problems

The “Bingo Problem”: Approximate Solutions to Really Hard Problems

Random Solutions to Really Hard Problems

Conclusions