A Tutorial on Thompson Sampling by Lydia Gu [PWL NYC]
The multi-armed bandit problem is an online machine learning framework that trades off exploitation (selecting the current best choice) against exploration (gathering data on unknown options). One strategy for managing this tradeoff is Thompson sampling. First proposed in 1933 in the context of clinical trials, Thompson sampling was mostly forgotten in the academic literature until the last decade. Around 2010, a couple of papers empirically demonstrated its competitive performance, prompting a flurry of academic work. In this lightning talk, we will give an overview of the multi-armed bandit problem and the Thompson sampling algorithm, and see how companies have used it for personalization.
Paper: https://arxiv.org/pdf/1707.02038.pdf
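For flavor, here is a minimal sketch of Thompson sampling in the Beta-Bernoulli setting covered in the tutorial above: sample a plausible reward rate for each arm from its posterior, play the arm that looks best, and update. The arm probabilities and horizon below are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
true_probs = [0.3, 0.5, 0.7]           # hidden reward rate of each arm (illustrative)
successes = np.ones(len(true_probs))   # Beta(1, 1) uniform prior per arm
failures = np.ones(len(true_probs))

for t in range(1000):
    # Explore and exploit in one step: draw a sample from each arm's
    # posterior, then play the arm whose sample is largest.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_probs[arm]  # simulate pulling the arm
    # Update the chosen arm's posterior with the observed reward.
    successes[arm] += reward
    failures[arm] += 1 - reward

print("posterior means:", successes / (successes + failures))

Arms with uncertain posteriors still get sampled occasionally (exploration), while arms with high estimated reward win most draws (exploitation); over time the posteriors concentrate and the best arm dominates.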
Bio:
Lydia Gu is a tech lead at B12, a startup that's changing the way websites are made using humans + AI. She has an MEng from MIT and lives in New York, where she enjoys rock climbing, escape rooms, and escaping the city.