San Francisco - February 18, 2016
Matt Adereth on A Scalable Bootstrap for Massive Data
Bootstrapping is a powerful statistical technique for assessing the quality of estimators. It's computationally intensive, however, and it's not immediately obvious how it can be applied efficiently in a distributed environment.
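To make the idea concrete, here is a minimal sketch of the classical bootstrap in Python (not the paper's distributed variant): resample the data with replacement many times, recompute the estimator on each resample, and use the spread of those estimates to gauge the estimator's quality. The function name and parameters are illustrative, not from the paper.

```python
import random
import statistics

def bootstrap_stderr(data, estimator, n_resamples=1000, seed=0):
    """Estimate the standard error of `estimator` by resampling
    the data with replacement `n_resamples` times."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)

# Example: how uncertain is the sample mean of a small dataset?
sample = [2.1, 3.5, 1.8, 4.0, 2.9, 3.3, 2.5, 3.8]
se = bootstrap_stderr(sample, statistics.mean)
```

Note the cost: each of the 1000 resamples is as large as the original dataset, which is exactly what makes the naive approach painful at massive scale and motivates the paper's scalable alternative.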
We'll go through a history of computational methods for assessing estimator quality from the Jackknife Method (1949) to today, explaining why you want to do this and assuming only basic statistical knowledge. While the paper gets pretty hot and heavy with the math, we'll keep it light.
I love this paper because it's a look at how the world of statistics has to adapt to the new reality of distributed compute.
Matt builds tools and infrastructure for quantitative research at Two Sigma. He previously worked at Microsoft on Visio, focusing on ways to connect data to shapes. In his spare time, he builds ergonomic keyboards using Clojure.
Marios Assiotis on Throttling Utilities in the IBM DB2 Universal Database Server
This paper describes a control system that provides the "utilities throttling" feature in IBM® DB2® Universal Database™ v8.1.
Administrative utilities (e.g., filesystem and database backups, antivirus scan) are essential to the operation of production systems. Unfortunately, production work can be severely degraded by the concurrent execution of such utilities. Hence, it is desirable for the system to self-manage its utilities to limit their performance impact, with only high-level policy input from the administrator. We focus on policies of the form “There should be no more than an x% degradation of production work due to utility execution.”
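The flavor of such a policy can be sketched as a simple feedback loop: measure how much the utility is degrading production work, compare against the administrator's target, and adjust how aggressively the utility is slowed down. This is a toy proportional controller for illustration only; the paper's actual control system is more sophisticated, and the names here are hypothetical.

```python
def throttle_step(degradation_pct, target_pct, sleep_fraction, gain=0.01):
    """One control step: if production work is degraded beyond the
    target (e.g. the administrator's x%), make the utility sleep a
    larger fraction of the time; otherwise let it run more freely."""
    error = degradation_pct - target_pct
    sleep_fraction += gain * error
    # Clamp so the utility always makes some progress and never runs wild.
    return min(max(sleep_fraction, 0.0), 0.95)

# Degradation at 20% against a 10% target: throttle the utility harder.
more_throttled = throttle_step(degradation_pct=20, target_pct=10,
                               sleep_fraction=0.5)
```

The interesting engineering problem, which the paper addresses, is that "degradation of production work" is hard to measure directly and the system's response to throttling changes with the workload, so the controller must estimate and adapt rather than rely on a fixed gain.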
Marios is the CTO at TubiTV, the world's largest free streaming TV & movie library. His interests include simplifying complex systems, storage and low latency network i/o at scale. A transplant from Cyprus, he spends his free time trying to create the perfect all-American burger.