**The Secretary Problem**

One example of how optimal stopping theory is applied is the secretary problem. The secretary problem is a hypothetical situation in which you are searching for the perfect job candidate to fill a secretarial position. In this scenario, there are a few assumptions made to keep the problem simple:

- The number of candidates, n, is known.
- The candidates can be ranked from best to worst with no ties, but you only learn how each candidate compares to the ones already interviewed.
- Only hiring the single best candidate counts as success; since the overall ranks are unknown, you must judge who the best is from the comparisons you've seen.
- The candidates are interviewed in a random order, and the decision to keep or reject is made immediately after the interview, before proceeding to the next.
- Once a candidate is rejected, they can never be recalled.
- Once a candidate is chosen, no more are interviewed.

If you were to pick by trial and error, you might notice that you can usually get pretty close to finding the best candidate by automatically rejecting a few, then picking the next one you find that is better than all the ones you’ve seen until then. Unless you are unlucky enough to have the best one fall into the first few that you reject, this strategy usually works. In fact, that is the general idea behind solving the secretary problem.
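That skip-then-commit rule can be written in a few lines of Python (a minimal sketch with made-up names; here a higher score means a better candidate):

```python
def pick_candidate(scores, skip):
    """Reject the first `skip` candidates outright, then commit to the
    first later candidate who beats everyone seen so far. If nobody
    ever beats that baseline, you are stuck with the last candidate."""
    baseline = max(scores[:skip], default=float("-inf"))
    for score in scores[skip:]:
        if score > baseline:
            return score
    return scores[-1]

# Example: skip the first two, then take the first score above their max.
print(pick_candidate([3, 1, 4, 5, 2], skip=2))  # 4 beats the baseline of 3
```

Note the failure mode built into the rule: if the best candidate happens to fall in the skipped group, nothing later can beat the baseline, and you end up with whoever comes last.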

Suppose your strategy is to pass over some candidates automatically and then start picking at the kth one. The minimum number of candidates you pass over is k − 1 (if you pick the first one, you pass over none). Your possible choices are the kth, (k+1)th, (k+2)th, and so on, up to the last candidate, n. Your goal is to maximize the probability of choosing the best candidate, which you can write as P(S_{n,k}).

As a good start to solving this problem, look at the situation when n = 3. The candidate ranks are 1, 2, and 3 (rank 1 being the best). Writing out all six permutations of the interview order, along with who each strategy ends up picking, makes it clear that the optimal strategy is k = 2:

| Interview order | k = 1 picks | k = 2 picks | k = 3 picks |
| --- | --- | --- | --- |
| 1, 2, 3 | **1** | 3 | 3 |
| 1, 3, 2 | **1** | 2 | 2 |
| 2, 1, 3 | 2 | **1** | 3 |
| 2, 3, 1 | 2 | **1** | **1** |
| 3, 1, 2 | 3 | **1** | 2 |
| 3, 2, 1 | 3 | 2 | **1** |

(Bold marks the cases where the best candidate, rank 1, is hired.)

Consider each row in the table above as a different scenario. In the first row, the best candidate (the candidate with rank 1) shows up first. If you always pick the candidate that shows up first (k = 1), your odds of getting the best one are 2/6, since the best candidate appears first in only two of the six orderings. If you reject the first candidate and pick the next one who is better (k = 2), your odds rise to 3/6. If you reject the first two candidates (k = 3), you are forced to take whoever comes last, and your odds fall back to 2/6. So, as mentioned, the best strategy for n = 3 is to pick when k = 2.
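You can confirm these odds by brute force, enumerating all six orderings (a quick sketch; rank 1 is the best candidate, and the function name is my own):

```python
from itertools import permutations

def success_probability(n, k):
    """Fraction of the n! interview orders in which the strategy
    'reject the first k-1 candidates, then take the first one better
    than all seen so far (or the last one)' hires rank 1, the best."""
    orders = list(permutations(range(1, n + 1)))
    wins = 0
    for order in orders:
        baseline = min(order[:k - 1], default=n + 1)  # best rank skipped
        choice = next((r for r in order[k - 1:] if r < baseline), order[-1])
        wins += choice == 1
    return wins / len(orders)

for k in (1, 2, 3):
    print(k, success_probability(3, k))  # 1/3, 1/2, 1/3
```

The same function works for n = 4, 5, and beyond, so you can find the optimal k without writing out permutations by hand — though the n! enumeration itself blows up quickly.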

You could write out the 24 permutations for n = 4, the 120 for n = 5, and so on, finding the optimal kth choice for each number of candidates, but it would quickly become hard to do manually. Let x be the position of the best candidate. An equation for the probability of success, P(S_{n,k}) (success meaning you picked the best candidate), is:

$$
P(S_{n,k}) =
\begin{cases}
\dfrac{1}{n} & k = 1 \\[1.5ex]
\dfrac{k-1}{n} \displaystyle\sum_{x=k}^{n} \dfrac{1}{x-1} & k \geq 2
\end{cases}
$$

Reading through that: when you pick the first candidate (k = 1), the probability of success is 1/n. When you start picking at any position from the second to the last (k ≥ 2), the probability of success is (k − 1)/n times the sum of 1/(x − 1) from x = k to n. You can test this equation against the case when n = 3; you will get the same probabilities that you manually calculated before. A proof of how it is derived can be found here.
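The piecewise formula translates directly into code, and you can check it against the n = 3 case (a sketch of the equation as described in the text, with a name of my own choosing):

```python
def p_success(n, k):
    """P(S_{n,k}): probability that starting to pick at position k
    lands the single best of n candidates."""
    if k == 1:
        return 1 / n
    return (k - 1) / n * sum(1 / (x - 1) for x in range(k, n + 1))

# The n = 3 case reproduces the odds worked out above: 2/6, 3/6, 2/6.
print([round(p_success(3, k), 3) for k in (1, 2, 3)])  # [0.333, 0.5, 0.333]
```

For larger n you can simply take the k that maximizes `p_success(n, k)` — no enumeration of permutations required.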

Now the probability of success, P(S_{n,k}), has to be optimized by determining what happens as the number of candidates, n, gets infinitely large. This requires setting the derivative of the bottom piece of the function above equal to zero to find the critical values, and identifying the critical value that is the global maximum. Taking that derivative might seem intimidating at first, but it becomes easier once you notice that, with the substitutions t = (k − 1)/n and u = (x − 1)/n, the sum is a Riemann sum that converges to a definite integral as n grows:

$$
P(S_{n,k}) = \frac{k-1}{n} \sum_{x=k}^{n} \frac{1}{x-1} \;\longrightarrow\; t \int_{t}^{1} \frac{1}{u}\, du
$$

Integration rules give:

t * (ln(1) - ln(t)) = -t * ln(t)

The derivative of –t * ln(t) with respect to t is: –ln(t) - 1. Setting this equal to zero and solving for t gives:

-ln(t)-1 = 0

-ln(t) = 1

ln(t) = -1

t = 1/e

Plugging t = 1/e back into −t · ln(t) gives −(1/e) · ln(1/e) = 1/e ≈ 0.368. You can confirm this critical point is a maximum with the second derivative test: the second derivative of −t · ln(t) is −1/t, which is negative for all t > 0, so t = 1/e is the value that maximizes the function.
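A quick numerical sanity check of this result (assuming nothing beyond the function −t · ln t itself):

```python
import math

def g(t):
    """The limiting success probability: -t * ln(t)."""
    return -t * math.log(t)

t_star = 1 / math.e
# The critical point beats its neighbors, and the maximum value is
# g(1/e) = 1/e -- which is where the famous 36.8% comes from.
assert g(t_star) > g(t_star - 0.05)
assert g(t_star) > g(t_star + 0.05)
print(round(g(t_star), 4))  # 0.3679
```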

**This means that when you have a large number of candidates, your odds of picking the best one are the highest when you let 1/e (or about 36.8%) of the candidates go by without picking any, and then pick the next candidate who is better than all of the others seen so far**. When you do this, your odds of picking the best one are 36.8%, which may seem low, but it is the highest percentage that you can hope to get.

In the real world, you probably never know the true number of candidates, n. But you can mitigate your lack of such knowledge by setting restrictions. For example, if you set the number of people you’re willing to date throughout your life to 200, then you can follow the 36.8% heuristic to find the best mate out of the 200. It is interesting to think that letting 36.8% of your options go by is like establishing a baseline. By that point, you will have a good idea of what is out there, and will then be in a position to make a fair judgment about what option is worth choosing.

Note that there are variations of the secretary problem that change the outcome. For example, you could have group interviews, or the candidate you choose could reject your offer. You can read more about those scenarios here.

**Exploring Optimal Stopping Theory Computationally**

I wrote a Python script to test the optimal stopping theory. You can fork my GitHub repo for it. The script generates a list of 1,000 random numbers between 1 and 100, observes the first 36.8% of them as a baseline (just as optimal stopping theory recommends), and then picks the next number that is higher than everything seen in the baseline. Because the seed is set for reproducibility, no matter how many times you run the script, you'll get the same set of random numbers and therefore the same result. In my test, the best value, 100, was encountered within the first 36.8% of the observations, so no later option could beat the baseline; even though other 100s could appear afterward, nothing could ever exceed the first one, and the strategy came up empty. You can change the seed to any other number and try it again; the seed is only there for repeatability. For a seed of 10, the random numbers are shown below:
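The actual script lives in the repo linked above; a minimal sketch of the same experiment (function and variable names here are my own, not the repo's) looks like this:

```python
import random

def run_experiment(seed, n=1000, low=1, high=100, skip_frac=0.368):
    """Generate n random numbers, treat the first 36.8% as a baseline,
    then pick the first later value that beats the baseline. Falls
    through to the last value if nothing ever beats the baseline."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    values = [rng.randint(low, high) for _ in range(n)]
    skip = int(n * skip_frac)
    baseline = max(values[:skip])
    for value in values[skip:]:
        if value > baseline:
            return value
    return values[-1]

# Same seed, same numbers, same result -- change the seed to explore.
print(run_experiment(seed=10))
```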

As you can see, if the 1,000 numbers are ranked in order from 1 to 1,000, you run across very high numbers rather quickly (the rank order goes from the bottom to the top of the y-axis). So the result makes sense: if optimal stopping theory holds, you would only expect the strategy to land on the highest number about 36.8% of the time, and this run happened to be one of the misses.
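To see that 36.8% figure empirically, you can repeat the experiment over many shuffled orderings and count how often the strategy actually lands on the best candidate (a sketch; the seed and trial count are arbitrary choices of mine):

```python
import math
import random

def strategy_wins(n, rng):
    """One trial: shuffle n distinct candidates (0 is the best), skip
    the first n/e of them, then take the first later candidate that
    beats the skipped baseline (or the last one if none does).
    Returns True when the best candidate is hired."""
    order = list(range(n))
    rng.shuffle(order)
    skip = round(n / math.e)
    baseline = min(order[:skip])
    choice = next((r for r in order[skip:] if r < baseline), order[-1])
    return choice == 0

rng = random.Random(0)
trials = 20_000
rate = sum(strategy_wins(100, rng) for _ in range(trials)) / trials
print(round(rate, 3))  # close to 1/e ~= 0.368
```

Using distinct ranks instead of values drawn from 1-100 sidesteps ties, which is closer to the assumptions of the original problem.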

In this blog, I explained the theory of optimal stopping. I showed why you maximize your odds of picking the best option out of a set of n total options if you let 36.8% of them go by without choosing, and then pick the next option that is better than all of the others you've seen until then. Finally, I simulated a version of the secretary problem with Python, and showed that even this optimal strategy misses the best option most of the time. So there is a little math that you can use to go out into the world and maximize your odds of making the best choices in life.