## Chapter Meetups

# SWIM: Scalable Group Membership Protocol

**Schedule: 6:00-6:30** Food and Socializing

**6:30-7:30**Presentation and Discussion

**The Paper**

The problem of group membership in a networked distributed system can be expressed as the ability to reliably and repeatedly provide the answer a simple question: "who is alive?".

Group membership represents a key part of many useful applications, including P2P systems, network overlays, service discovery, and HA clustering. The SWIM protocol addresses severe limitations in the scalability of existing heartbeat-based broadcast membership algorithms by introducing a new approach using weakly-consistent, gossip based information sharing.

We'll start by working to understand the general problem space of group membership. Then we'll walk through the details of heartbeat-based membership algorithms and discuss their limitations when it comes to scalability. Once we have a grounding in the domain, we'll talk about the details of the SWIM protoc…

# Hokusai — Sketching Streams in Real Time

**The Paper:**A description of 北斎 Hokusai, a real time system which is able to capture frequency information for streams of arbitrary sequences of symbols. The algorithm uses the CountMin sketch as its basis and exploits the fact that sketching is linear. It provides real time statistics of arbitrary events, e.g. streams of queries as a function of time. We use a factorizing approximation to provide point estimates at arbitrary (time, item) combinations. Queries can be answered in constant time.

http://www.auai.org/uai2012/papers/231.pdf

**The Speaker:**Hailing from Brooklyn, Shaker is a crazy-looking dude who engineers crazy things at Belly (previously, The Onion). He’s sometimes on The Twitter, so Twitter at him at www.twitter.com/@shakerdev

**Schedule:**

# An Empirical Study of the Naive Bayes Classifier

**The Paper: **Naive Bayes classifiers are based upon Bayes theorem, a probability and stats theory that relates current probability to prior probability. A central assumption of the Naive Bayes classifier is the assumption that each predictor, or piece of evidence observed, is independent of other predictors. Given this simplicity (the 'naive' assumption) Naive Bayes classifiers have been shown to outperform other more sophisticated classifiers. We'll review what types of data attributes Rish argues can impact the performance of Naive Bayes as well as review an implementation in Python I wrote.

http://www.cc.gatech.edu/~isbell/reading/papers/Rish.pdf

**The Speaker:**

Given that Lorena was a policy analyst what is the likelihood that yes she would want to learn about modeling data? (The overw…

# ZooKeeper: Wait-free coordination for Internet-scale systems

**The Paper:**

The Zookeeper paper describes a toolkit for building coordinated distributed systems, which is itself built upon lower-level concepts like consensus, linearizability, and atomic broadcast. Though we sometimes think of tools like distributed locks, rendezvous, group membership, and barriers as primitives, they actually can be, and are often in practice, built on top of Zookeeper. We'll dig into what makes Zookeeper tick, and why it's useful for real-world distributed systems.

The paper can be found here.

**The Speaker:**

If Colin Jones loves this paper so much, why doesn't he marry it? Suspicious. And not that it helps his case, but he writes software for 8th Light and recently published a book on

# Recursive Functions of Symbolic Expressions and Their Computation by Machine

**The Paper:** As the first published paper describing the Lisp language, John McCarthy's 1960 paper "Recursive Functions of Symbolic Expressions..." introduces some familiar, and some maybe not familiar concepts of Lisp in their earliest form. This talk will take a look at concepts like M-expressions, S-expressions, functions, and forms, and will walk through McCarthy's definitions of core Lisp functions like apply and eval. We'll also discuss the paper's description of the computer representation and translation of Lisp and recursive functions.

The paper can be found here.

**The Speaker:**

Kevin Buchanan is a software developer at 8th Light, where he currently enjoys getting to write Lisp with Clojure every day.

# Probabilistic algorithm for testing primality

**The Paper:**

The probabilistic test for primality presented in Michael Rabin's 1977 paper is interesting because prime numbers are at the center of several high-profile algorithm problems, and because probabilistic "proofs" in general challenge naive notions of mathematical certainty. This talk will start by motivating the problem of primality testing, describe the algorithm, and do as much as is humanly possible to make the correctness proof accessible to people who don't read about number theory in their free time.

The paper can be found here: http://www.sciencedirect.com/science/article/pii/0022314X80900840

**About Our Speaker:**

Gary Fredericks once stood up for the entirety of a two-hour Greyhound ride because there weren't any available seats because at that point Greyhound didn't guarantee seats with your ticket and you just had to wait fo…

# Packrat Parsing: a Practical Linear-Time Algorithm with Backtracking

**The paper:**

Bryan Ford's 2002 Masters Thesis is remarkable in that it breaks decades of compiler-construction dogma with some simple principles and a compelling alternative to the complexity of parsing with context-free grammars (CFGs). He reveals a forgotten class of grammars-- top-down parsing language (TDPL), and some extensions known as parsing-expression grammars (PEGS) -- that directly correspond to the parsers that implement them. His primary contribution, however, is applying modern functional programming techniques of laziness and algebraic data structures to make TDPL/PEG parsers computationally efficient.

The paper can be found here: http://bford.info/pub/lang/thesis.pdf

**About our speaker:**

Sean Cribbs is a Software Engineer at Basho Technologies, where he has contributed to many aspects of Riak, the distributed …