Explore/Exploit Algorithm in Decisions & Regret

Post inspired by: Algorithms to Live By by Brian Christian and Tom Griffiths.

Explore/Exploit: In the explore phase, we gather information; in the exploit phase, we use the information we have to make the best decision.

Decisions

If we look at the Explore/Exploit algorithm, we can see that an old phrase actually has some truth behind it:

The grass is always greener on the other side.

We have all heard that saying before, but did you know that there is some mathematical support for it? If you do not have much data about an option, then the unknown has a higher chance of turning out to be a success worth finding, even if we expect it to be no different, or “if it’s just as likely to be worse” (41). Brian Christian and Tom Griffiths state it as the “untested rookie is worth more than the veteran of seemingly equal ability, precisely because we know less about him.” Therein lies the kicker, though: once we have enough data, we make the decision based on that data, and the picture may change. So the next draft you see - whether it is football, baseball, basketball, or any other sport - remember that the rookies are getting the huge contracts because of the chance that they are the best (a higher chance of success).

Regret

“To try and fail is at least to learn; to fail to try is to suffer the inestimable loss of what might have been” - Chester Barnard

Another way to look at Explore/Exploit is to frame it in the light of trying to live “a life with minimal regret” (43). The authors share a story about how Jeff Bezos, founder of Amazon.com, took the risk of leaving his well-paying job to start the online bookstore. They quote him as saying it was an easy decision. Why? Bezos used the idea of living a life with minimal regret and turned it into an exercise: he imagined himself at 80 years old, looking back on his life, and knew he wanted to have as few regrets as possible. Through this exercise, he realized that not taking the risk would lead to regret, so he took the risk.

Researchers have spent years looking for a way to guarantee minimal regret - the most famous algorithm is called the Upper Confidence Bound. This algorithm makes a decision based not on what has performed best in the past, but on what could perform best in the future. To explain it, the authors share that “if you have never been to a restaurant before, then for all you know it could be great. Even if you have gone there once or twice, and tried a couple of dishes, you might not have enough information to rule out the possibility that it could yet prove better” (44).
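To make the restaurant example concrete, here is a minimal sketch of one well-known member of this family, usually called UCB1. The book does not spell out a formula, so the scoring rule below (average result plus a bonus of sqrt(2 * ln(total visits) / visits to this place)) is an assumption on my part, and the function names and the restaurant numbers are made up for illustration.

```python
import math


def ucb1_scores(good_meals, visits):
    """Return an optimistic score for each restaurant.

    good_meals[i] is how many visits to restaurant i went well,
    visits[i] is how many times we have been there in total.
    """
    total_visits = sum(visits)
    scores = []
    for good, n in zip(good_meals, visits):
        if n == 0:
            # Never tried: for all we know it could be great.
            scores.append(math.inf)
        else:
            average = good / n
            # The bonus is large when we have little data on this option.
            bonus = math.sqrt(2 * math.log(total_visits) / n)
            scores.append(average + bonus)
    return scores


def choose(good_meals, visits):
    """Pick the index of the restaurant with the highest optimistic score."""
    scores = ucb1_scores(good_meals, visits)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical data: an old favorite (14 good meals in 20 visits)
# versus a place we have tried only twice (1 good meal in 2 visits).
pick = choose([14, 1], [20, 2])
```

With these made-up numbers, the rarely visited restaurant gets the higher score even though its track record so far is worse, because there is still plenty of room for it to prove better.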

The Upper Confidence Bound algorithm has a sense of optimism about it - the optimism that what you are trying could be the best. “By focusing on the best that an option could be, these algorithms give a boost to possibilities we know less about” (45). What does that mean for you and me? If we think about the world in terms of these algorithms, then “you should be excited to meet new people and try new things - to assume the best about them, in the absence of evidence to the contrary” (45), because the chances of finding the best that way are better than if we never try at all.
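The "boost to possibilities we know less about" can be seen by looking at the optimism bonus on its own. This is a rough sketch that assumes the UCB1-style bonus sqrt(2 * ln(total tries) / tries of this option); that formula is not from the book, and the function name and the numbers are hypothetical.

```python
import math


def exploration_bonus(total_tries, tries_of_option):
    """Optimism bonus for an option tried `tries_of_option` times
    out of `total_tries` tries overall (assumed UCB1-style formula)."""
    return math.sqrt(2 * math.log(total_tries) / tries_of_option)


# Out of 100 total tries, compare options we have tried 1, 5, 25, and 100 times.
bonuses = [exploration_bonus(100, n) for n in (1, 5, 25, 100)]
```

The bonus shrinks as the number of tries for an option grows, so the less we know about something, the more benefit of the doubt it gets.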