San Francisco Chapter
What was the last paper within the realm of computing you read? What did it inspire you to build or tinker with? Come share the ideas in an awesome academic/research publication with fellow engineers, programmers, and paper-readers. Lead a session and show off code that you wrote that implements these ideas or just give us the lowdown about the paper. Otherwise, just come, listen, learn, and discuss.
We'll be using papers-we-love's curated repository. Please contribute by adding PRs for papers, code, and/or links to other repositories.
PWL SF strictly adheres to the Code of Conduct set forth by all PWL charters.
Chapter details
Location: Fastly - 475 Brannan Street #320, San Francisco, CA
Sign-up: Please RSVP for meetings via Meetup.com
Twitter: @papers_we_love
Organizers: Ines Sombra and Elaine Greenberg
Sponsors
Chapter Meetups
Benjamin Goering on "ActivityPub W3C Recommendation"
Agenda:
https://docs.google.com/document/d/11gcv6R70uGqKw8t6nHRmhrXakCKj3clECqj0KAMDVqQ/
Main:
Benjamin Goering on "ActivityPub W3C Recommendation"
Bio:
Ben has been studying and evangelizing decentralized social networking for 15 years, since the early days of blogging and podcasting. He worked on social-media-as-a-service startup Livefyre from[masked] with customers from cnn.com to ign.com to foxnews.com, and contributed as an Invited Expert to the W3C Social Web WG that helped standardize ActivityStreams 2.0 and ActivityPub as technical recommendations. Now he works at Protocol Labs on DIDs and web3.storage.
Program committee selection and Aaron D Goldman on "Liquid Data Networking"
**The time has come to select the next Papers We Love SF program committee!**
**As a program committee member you get to have a say in what papers we present and learn how to run the organization.**
**This meeting will open with a mini-talk from Aaron Goldman on Liquid Data Networking.**
**After that we will take nominations for and vote on the program committee.**
**Come join us and express your opinion on the future direction of Papers We Love!**
**Anyone present can vote for the program committee.**
**Liquid Data Networking**
**[ICN '20: Proceedings of the 7th ACM Conference on Information-Centric Networking](https://dl.acm.org/doi/proceedings/10.1145/3405656)**
**September 2020 Pages 129–135**
**[https://doi.org/10.1145/3405656.3418710](https://d…
Yao Yue on "The Tail at Scale"
Agenda:
https://docs.google.com/document/d/1gNpVmv-rFUtktCl-Awi5WJIY_f7nPkkngF0rAiPtQI0
Main:
Yao Yue on
The Tail at Scale
Software techniques that tolerate latency variability are vital to building responsive large-scale Web services.
By Jeffrey Dean and Luiz André Barroso
Bio: Yao Yue is an engineer and manager working at Twitter Platform. She has been working on distributed cache since 2010, with extensive experience with popular OSS projects such as Memcached and Redis. She designed and implemented a modular open-sourced cache framework called Pelikan (more at pelikan.io). Since 2017, she has started and managed the Infrastructure Performance and Optimization team. Her team works on infrastructure performance and capacity monitoring, optimizing systems configurations and utilization, cross-service tracing and insight at scale, and advanc…
Natalie Telis on "College Admissions and the Stability of Marriage"
Agenda: https://docs.google.com/document/d/1AjJfeXW37bX7dHxYrPXCS8F1f7ezl_rO0qtDv82vMiU
Mini
Shachaf on
Optimal Speedup of Las Vegas Algorithms
Michael Luby Alistair Sinclair David Zuckerman
https://www.cs.utexas.edu/~diz/pubs/speedup.pdf
Main
Natalie Telis on
College Admissions and the Stability of Marriage
D. Gale and L. S. Shapley
https://www.eecs.harvard.edu/cs286r/courses/fall09/papers/galeshapley.pdf
Bio: Natalie Telis is a mathematician by training and a biologist by trade. She did her PhD work at Stanford, developing methods to study the connection between human diversity, human history and disease risk. In her spare time, she applie…
Aaron D Goldman on "Hashed and Hierarchical Timing Wheels"
*Main*
Presenter: Aaron D Goldman
Aaron is a security engineer at Twitter with a history of protecting distributed systems and microservices. He has previously worked on radar systems for the US Department of Defense, Anti Abuse at Google, and Runtime Application Self Protection at tCell. In his spare time, he is building a new internet out of immutable data.
Paper
"Hashed and Hierarchical Timing Wheels: Efficient
Data Structures for Implementing a Timer Facility"
http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf
Zoom Meeting details:
Topic: Aaron D Goldman on "Hashed and Hierarchical Timing Wheels"
Time: Jan 21,[masked]:00 PM Pacific Time (US and Canada)
Jon Moroney on ". . . An Empirical Analysis of Email Delivery Security"
*Mini*
Presenter: Zephyr Pellerin
Zephyr Pellerin is a security engineer at the threat intelligence firm Polyswarm. He has implemented and evaluated a variety of methods of calibrating and aggregating uncertain expert verdicts on malware, such as the one described in this paper.
Paper
"Better Malware Ground Truth: Techniques for Weighting Anti-Virus Vendor Labels"
https://dl.acm.org/doi/pdf/10.1145/2808769.2808780
We examine the problem of aggregating the results of multiple anti-virus (AV) vendors’ detectors into a single authoritative ground-truth label for every binary. To do so, we adapt a well-known generative Bayesian model that postulates the existence of a hidden ground truth upon which the AV labels depend. We use training based on Expectation Maximization for this fully unsupervised technique. We evaluate our method using 279,327 distinct binaries from VirusTota…
David Murray on "Time, Clocks, and the Ordering of Events"
Paper
Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport, Massachusetts Computer Associates, Inc.
https://lamport.azurewebsites.net/pubs/time-clocks.pdf
The concept of one event happening before another
in a distributed system is examined, and is shown to
define a partial ordering of the events. A distributed
algorithm is given for synchronizing a system of logical
clocks which can be used to totally order the events.
The use of the total ordering is illustrated with a
method for solving synchronization problems. The
algorithm is then specialized for synchronizing physical
clocks, and a bound is derived on how far out of
synchrony the clocks can become.
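The logical-clock rule summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper or the talk: the class name `LamportClock`, its method names, and the process ids are hypothetical. Each process increments its counter on every event, a receiver advances past the timestamp carried by the message, and ties are broken by process id to turn the partial order into a total one.

```python
class LamportClock:
    """Hypothetical sketch of a Lamport logical clock for one process."""

    def __init__(self, pid):
        self.pid = pid   # process id, used only to break ties
        self.time = 0    # logical counter, not wall-clock time

    def local_event(self):
        # Every event on this process advances the counter by one.
        self.time += 1
        return (self.time, self.pid)

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.local_event()

    def receive(self, msg_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return (self.time, self.pid)


a, b = LamportClock("a"), LamportClock("b")
e1 = a.local_event()
e2 = a.send()
e3 = b.local_event()
e4 = b.receive(e2[0])

# Sorting (time, pid) pairs gives one total order consistent with happened-before.
ordered = sorted([e4, e3, e2, e1])
print(ordered)
```

Events `e1` and `e3` are concurrent, so their relative position in `ordered` is decided only by the tie-break on process id.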
Presenter bio
David Murray
David is a big fan of papers, cats, and the Oxford comma. He works as an architect at Salesforce from his home in Seattle, wi…
Aaron D Goldman on "Spritz—a spongy RC4-like stream cipher and hash function"
Mini (TBD)
Main
Presenter bio
Aaron D Goldman
Security engineer
Aaron is a security engineer at Twitter with a history of protecting distributed systems and microservices. He has previously worked on radar systems for the US Department of Defense, Anti Abuse at Google, and Runtime Application Self Protection at tCell. In his spare time, he is building a new internet out of immutable data.
Paper
Spritz—a spongy RC4-like stream cipher and hash function
This note reconsiders the design of the stream cipher RC4, and proposes an improved variant, which we call “Spritz” (since the output comes in fine drops rather than big blocks.)
https://github.com/AaronGoldman/spritz/blob/master/RS14.pdf
Zoom Meeting Details:
Topic: Aaron D Goldman on "Spritz—a spongy RC4-like stream cipher and hash function"
Time: Oct 22,[masked]:30 PM Pacif…
Bruce Spang on "The Detection of Defective Members of Large Populations"
Mini
Johnathan Chiu on TBD
Johnathan's Bio
Johnathan is an undergraduate at UC Berkeley studying Electrical Engineering and Computer Science (EECS). His interests are in robotics, data compression, and immersive technology. He is currently doing research on Neural Network performance at Berkeley.
Main Talk
---
Bruce Spang on "The Detection of Defective Members of Large Populations"
[Dorfman 43] Dorfman, Robert. The Detection of Defective Members of Large Populations. Ann. Math. Statist. 14 (1943), no. 4, 436--440. doi:[masked]/aoms/[masked]. https://projecteuclid.org/euclid.aoms/1177731363
Some relevant papers
* Robert Dorfman. “The Detection of Defective Members of Large Populations” (https://projecteuclid.org/euclid.aoms/1177731363)
* William Kautz, Richard Singleto…
Nyah Check on Serverless Computing: One step forward and two steps backward
Mini
---
Soon
Main Talk
---
Serverless Computing: One step forward and two steps backward
http://cidrdb.org/cidr2019/papers/p119-hellerstein-cidr19.pdf
Abstract:
Serverless computing offers the potential to program the cloud in an autoscaling, pay-as-you-go manner. In this paper we address critical gaps in first-generation serverless computing, which place its autoscaling potential at odds with dominant trends in modern computing: notably data-centric and distributed computing, but also open source and custom hardware. Put together, these gaps make current serverless offerings a bad fit for cloud innovation and particularly bad for data systems innovation. In addition to pinpointing some of the main shortfalls of current serverless architectures, we raise a set of challenges we believe must be met to unlock the radical potential that the cloud—with i…
Jana Iyengar on "The death of an end-to-end internet (and a way forward)"
Mini
André Arko on Why Are The Prices So Damn High.
André tells us: "This is a paper that examines the very high and continuously climbing costs of education and health care in the United States. This paper is especially interesting as a follow-up to last month’s PWL Mini on Baumol’s cost disease."
https://www.mercatus.org/publications/healthcare/why-are-prices-so-damn-high.
Bio:
André Arko is a software consultant at cloudcity.io, and founder of the software non-profit rubytogether.org. He graduated from college before he could drink, has written open source installed hundreds of millions of times, and lives in a shipping container that overlooks SFO. And at least two of those things are true.
Main Talk
Jana Iyengar on "The death of an end-to-end internet (and a way forward)"
Over the past two decades, the Internet’s runaway s…
Jessie Frazelle on A Tale of Two Papers
Mini
Kevin Burke on Baumol’s Cost Disease - On The Performing Arts: The Anatomy of their Economic Problems - http://people.stern.nyu.edu/wbaumol/OnThePerformingArtsTheAnatomyOfTheirEcoProbs.pdf
Kevin's Bio
Kevin Burke (https://twitter.com/derivativeburke) ( https://burke.services ) likes building great experiences. He helped scale Twilio and Shyp, and currently runs a software consultancy. Kevin once accidentally left Waiting for Godot at the intermission.
Main Talk
Jessie Frazelle on A Tale of Two Papers
"It was the best of times, it was the worst of times." Come dive into two papers covering datacenter outages: "Maelstrom: Mitigating Datacenter-level Disasters by Draining Interdependent Traffic Safely and Efficie…
Zach Tellman on Shape Decomposition for Multi-channel Distance Fields
Mini
Ramon Nogueira on BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data - https://sameeragarwal.github.io/blinkdb_eurosys13.pdf
Ramon's Bio
Ramon is a software engineer with a passion for making large systems easier to understand and operate. He currently works at Google on OpenCensus: an open source metrics and distributed tracing library for microservices. Previous hits include iCloud storage APIs at Apple, and startups in London and Johannesburg.
Main Talk
Zach Tellman on Shape Decomposition for Multi-channel Distance Fields - https://dspace.cvut.cz/bitstream/handle/10467/62770/F8-DP-2015-Chlumsky-Viktor-thesis.pdf
Zach's Bio
Zach consults on the design of distributed systems and APIs. He has written "E…
Gwen Shapira on Peeking Behind the Curtains of Serverless Frameworks
Mini
Michael Kehoe on Democratically Finding The Cause of Packet Drops - https://arxiv.org/pdf/1802.07222.pdf
Michael's Bio
Michael Kehoe is a Staff SRE at LinkedIn who works on building scalable monitoring infrastructure, reliability principles and incident management. Michael previously interned at NASA Ames on their PhoneSat project. Michael's key interests lie in network engineering and automation.
---
Main Talk
Gwen Shapira on Peeking Behind the Curtains of Serverless Frameworks - https://www.usenix.org/system/files/conference/atc18/atc18-wang-liang.pdf
Gwen's Bio
Gwen Shapira is a principal data architect at Confluent and was previously an engineer at Cloudera. She is a committer on the Apache Kafka project and an author of "Kafka: The Definitive Guide".
Aaron Goldman on Chord
Mini
Max Seiden on Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
https://arxiv.org/pdf/cs/0701155.pdf
In under 15 pages, this paper describes the first principles of OLAP with such clarity and foresight that, over 20 years later, its direct contributions are still highly relevant to anyone building products and technologies in the analytics space.
Max's Bio
Max is a systems engineer with a deep interest in making information accessible to anyone with a question. He's currently at Sigma Computing, where he spends most of his time hacking on query compilation, optimization, and code generation. Before Sigma, he was at Platfora (now Workday) working on data cubes and materialized views using Spark, MapReduce, and the bundle-of-joy that is the Hadoop Ecosystem. Max received a bachelor's in computer science from the University of Michigan. When…
Sean T. Allen on Life Beyond Distributed Transactions: An Apostate’s Opinion
Main Talk
In 2007, Pat Helland published “Life Beyond Distributed Transactions: An Apostate’s Opinion,” in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, I'll show that the ideas are far more widely applicable, particularly in scaling stateful applications. In particular, I'll cover how we implemented some of the ideas from the paper in our distributed stream processor Wallaroo.
https://queue.acm.org/detail.cfm?ref=rss&id=3025012
---
Sean T. Allen is VP of Engineering at Wallaroo Labs and a member of the Pony core team. His turn-ons include programming languages, distributed computing, Hiwatt amplifiers, and Fender Telecasters. His turn-offs include mayonnaise, stirring yogurt, and sloppy code. He is one of the authors of Storm Applied.
---
Scott Vokes on An Efficient Context-Free Parsing Algorithm
Mini
Ori Berenstein on The Slab Allocator https://www.usenix.org/legacy/publications/library/proceedings/bos94/full_papers/bonwick.a
Ori's Bio
Ori was once evicted from the womb. The experience was unpleasant, and he has never forgiven the world for it. Today, he spends most of his time waving fingers over keyboards in order to inject magic into microwaves. Follow Ori at https://twitter.com/oribernstein
---
Main Talk
Scott Vokes on "An Efficient Context-Free Parsing Algorithm" by Jay Earley (Communications of the ACM, 1970)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.1808
Scott tells us: This paper introduces the Earley algorithm, which is a fully…
David Calavera on The QUIC Transport Protocol
Mini
Omoju Miller on "All The Cool Kids, How Do They Fit In? Popularity and Demographic Biases in Recommender Evaluation and Effectiveness" (http://proceedings.mlr.press/v81/ekstrand18b/ekstrand18b.pdf)
Omoju's Bio
Omoju is a Senior Machine Learning Data Scientist with GitHub. She has over a decade of experience in computational intelligence. In the past, she has co-led the non-profit investment in Computer Science Education for Google and served as a volunteer advisor to the Obama administration’s White House Presidential Innovation Fellows.
---
Main Talk
David Calavera on "The QUIC Transport Protocol: Design and Internet-Scale Deployment": https://research.google.com/pubs/pub46403.html
David's Bio
David Calavera is the CTO of Netlify, where he and his team are building th…
Cathie Yun on Bulletproofs: Short Proofs for Confidential Transactions and More
Mini
Alan Karp on Comparing Information Without Leaking It. https://www.stat.berkeley.edu/users/aldous/157/Papers/fagin.pdf
Alan's Bio
Alan got a Ph.D. in Astronomy and was an assistant professor of physics at Dartmouth until he figured out his job was that of a small businessman whose money came from writing grant proposals. After that, he did 15 to life at IBM doing large scale scientific and parallel computing. He then joined HP Labs, where he worked on a variety of projects including being one of the architects of the HP/Intel Itanium processor; E-speak, a platform that was called "web services before there were web services;" Polaris, a virus safe computing environment for Windows; and several other demonstrations that systems can be made more functional and more usable by adding security. After 20+ years, HP Labs came to its senses and kicked him out, but he bamboozle…
Kolton Andrus On Designing and Deploying Internet Scale Services
Mini
Peter Bourgon on CASPaxos: Replicated State Machines without logs
https://arxiv.org/pdf/1802.07000.pdf
Peter's Bio
Peter Bourgon is a distributed systems engineer who has seen things. He's the author of Go kit, a toolkit for microservices; and OK Log, a distributed logging system. He's currently driving the engineering observability initiative within Fastly.
Main Talk
Kolton Andrus On Designing and Deploying Internet Scale services https://www.usenix.org/legacy/event/lisa07/tech/full_papers/hamilton/hamilton.pdf
Kolton's Bio
Kolton (https://twitter.com/KoltonAndrus) is co-founder and CEO of Gremlin. Previously he was a Chaos Engineer at Netflix improving streaming reliability and operatin…
Matt Adereth on Distributed black-box optimization techniques
Mini
Kavya Joshi on "Kraken: Leveraging Live Traffic Tests to Identify and Resolve Resource Utilization Bottlenecks in Large Scale Web Services" - https://research.fb.com/publications/kraken-leveraging-live-traf%EF%AC%81c-tests-to-identify-and-resolve-resource-utilization-bottlenecks-in-large-scale-web-services/
Kavya's Bio
Kavya (https://twitter.com/kavya719) writes code for a living at a start-up in San Francisco. She particularly enjoys architecting and building highly concurrent, highly scalable systems. In her free time, she reads non-fiction and climbs rocks. Before moving to San Francisco to be an Adult, Kavya was at MIT where she got a Bachelor's and Master's in Computer Science.
Main Talk
Matt …
Dave Cheney on What Have We Learned from the PDP-11?
Mini
Aaron Goldman on Hash Array Mapped Trie (http://lampwww.epfl.ch/papers/idealhashtrees.pdf)
Aaron's Bio
Aaron David Goldman did his graduate work at the Georgia Institute of Technology where he studied Electrical and Computer Engineering. He has worked on radar systems for the US Department of Defense, Anti Abuse at Google, and is currently working in Runtime Application Self Protection at tCell. In his spare time he is building a new internet out of immutable data.
Main Talk
Dave Cheney (https://twitter.com/davecheney) on What Have We Learned from the PDP-11? (https://gordonbell.azurewebsites.net/CGB%20Files/What%20Have%20We%20Learned%20From%20the%20PDP-11%201977%20c…
Bryan Cantrill on ARC: A Self-Tuning, Low Overhead Replacement Cache
Main Talk
Bryan Cantrill on "ARC: A Self-Tuning, Low Overhead Replacement Cache" by Nimrod Megiddo and Dharmendra Modha ( https://www.usenix.org/legacy/event/fast03/tech/full_papers/megiddo/megiddo.pdf )
Bryan's Bio
Bryan Cantrill is the CTO at Joyent, where he oversees worldwide development of the SmartOS and SmartDataCenter platforms, and the Node.js platform. Prior to joining Joyent, Bryan served as a Distinguished Engineer at Sun Microsystems, where he spent over a decade working on system software, from the guts of the kernel to client-code on the browser. In particular, he co-designed and implemented DTrace, a facility for dynamic instrumentation of production systems that won the Wall Street Journal's top Technology Innovation Award in 2006 and the USENIX Software Tools User Group Award i…
Dave Cohen on Hashgraph Consensus: Fair, Fast, Byzantine Fault Tolerance
Mini
J. Paul Reed on "Trade-offs Under Pressure: Heuristics and Observations of Teams Resolving Internet Service Outages" (http://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=8084520&fileOId=8084521 )
Paul's Bio
J. Paul Reed has over fifteen years of experience in the trenches as a build/release engineer, working with such storied companies as VMware, Mozilla, Postbox, Symantec, and Salesforce.
In 2012, he founded Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations "Simply Ship. Every time." He's worked across a number of industries, from financial services to cloud-based infrastructure to health care, with teams ranging from 2 to 2,…
Scott Andreas on Overlapping Experiment Infrastructure
Mini
André Arko on Robin Hood Hashing
• https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf
• https://www.dmtcs.org/pdfpapers/dmAD0127.pdf
André Arko leads the Bundler and RubyGems teams, co-authored The Ruby Way, and blogs at http://arko.net. He works at Cloud City as a software development consultant, and founded Ruby Together, a non-profit that pays for work on Ruby open source.
Main Talk
Kevin Burke on "Curve25519 and fast public key cryptography"
Mini
Tyler McMullen on Delta CRDTs
Tyler will do his best to summarize and get you hooked on the three papers listed below:
• https://arxiv.org/pdf/1410.2803.pdf
• https://arxiv.org/pdf/1603.01529.pdf
• http://dl.acm.org/citation.cfm?id=2911163
Tyler's Bio
Tyler McMullen is CTO at Fastly, where he’s responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s Instant Purging system, API, and Real-time Analytics. Before Fastly, Tyler worked on text analysis and recomm…
Peter Geoghegan on "Query Evaluation Techniques for Large Databases"
***Okta has kindly volunteered to host us. They do, however, ask that you sign an NDA at the door. If this is an issue for you, please let us know in advance.
Mini
Aish Raj Dahal on "Cuckoo Filter: Practically Better Than Bloom" (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)
Aish's Bio
Aish is a software engineer with a passion for distributed systems. He currently works for PagerDuty, building reliable event-driven systems. In a past life, he worked on Wall Street building software platforms for high-performance trade execution. He was also a maintainer for KDE's KGet project.
Main Talk
Peter Geoghegan on "Query Evaluation Techniques for Large Databases"
Peter tells us: Thi…
Ben Sigelman on Pivot Tracing
Mini
Lukasz Jagiello on Fast Inverse Square Root - http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
Lukasz's Bio
Lukasz Jagiello is an operations engineer at Wikia, where he is hard at work saying NO. Between NO and NO, he focuses on modern approaches to monitoring and distributed storage.
Main Talk
Ben Sigelman on Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems (http://pivottracing.io/mace15pivot.pdf)
Ben's Bio
Ben Sigelman (https://twitter.com/el_bhs) is the cofounder and CEO of
Yifan Wu on Reactive Vega
Mini
Kiran Bhattaram on HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf)
Kiran's Bio
Kiran Bhattaram loves making things, whether tinkering with circuits, writing software systems, or sewing dresses. She works on Stripe’s infrastructure team, and has previously built things for the New York Times, LinkedIn and MIT CSAIL.
Main Talk
Yifan Wu on "Reactive Vega: A Streaming Dataflow Architecture for Declarative Interactive Visualization" (
Tom Santero on DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker
Mini
Matt Adereth on the January 1965 issue of The Computer Journal.
This issue contains one of the most important techniques in numerical optimization, the Nelder-Mead simplex method. An entire full-length talk could be dedicated to it, but instead we’re going to try and understand the historical context by looking at everything else in the journal, from the other papers to the letters to the editor to the advertisements.
Matt's Bio
Matt builds tools and infrastructure for quantitative research at Two Sigma. He previously worked at Microsoft on Visio, focusing on ways to connect data to shapes. In his spare time, he builds ergonomic keyboards using Clojure.
Main Talk
Tom Santero on DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker - https://arxiv.org/abs/…
Pat Helland on "Immutability Changes Everything"
**Please note that this month's location is not optimal for ADA. Please let us know your needs and we will do our best to accommodate you.
Mini
Daniel "Spoons" Spoonhower on "A Unified Theory of Garbage Collection"
http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf
Let's take a deep dive into language implementation: garbage collectors are a great tool for improving programmer productivity… until something goes wrong and you find yourself endlessly tuning collector configuration parameters. This paper takes a principled approach to understanding different garbage collection algorithms and offers something useful for both language implementors and programmers that use garbage-collected languages.
Spoons' Bio
Spoons is a co-founde…
Caitie McCaffrey on "Distributed Programming in Argus"
Mini
Kevin Burke on "Feral Concurrency" http://www.bailis.org/papers/feral-sigmod2015.pdf
Kevin's Bio
Kevin Burke (https://kev.inburke.com) likes building great experiences. He helped scale Twilio and Shyp, and currently runs a software consultancy. Kevin once accidentally left Waiting for Godot at the intermission.
Main Talk
Caitie McCaffrey on "Distributed Programming in Argus" by Barbara Liskov
https://people.csail.mit.edu/alinush/6.824-spring-2015/papers/argus88.pdf<…
Peter Alvaro on "Causes and Explanations: A Structural-Model Approach"
Mini
Yifan Wu on "Real Time Groupware as a Distributed System: Concurrency Control and its Effect on the Interface" by Saul Greenberg and David Marwood ( https://pdfs.semanticscholar.org/cf3c/135df03e455be1e8a64e5af6f19d2ff3ee2f.pdf )
This 90’s paper pioneers the idea that a front-end interface is a distributed system. As the UI becomes more real time and collaborative, we are starting to see a lot of anomalies in our day to day application experiences — this paper will shed light on a more structured way to reason about concurrency on the front-end.
Yifan's Bio
Yifan is a graduate student at UC Berkeley researching topics at the intersection of databases and human computer interaction, currently inve…
Diego Ongaro on The criteria to be used in decomposing systems into modules
Mini
Eitan Adler on Program development by stepwise refinement ( https://www.inf.ethz.ch/personal/wirth/Articles/StepwiseRefinement.pdf )
Eitan's Bio
Eitan Adler is a software engineer with a passion for distributed systems, security, and open source software. Currently employed by Twitter, he spends his days improving developer tooling to make the lives of other software engineers easier.
Main Talk
Diego Ongaro on "On the criteria to be used in decomposing systems into modules"
Bryan Fink on "A Brief History of NTP Time: Memoirs of an Internet Timekeeper"
Mini
Gareth Morgan on Physically-Based Shading at Disney (https://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf)
In recent years the adoption of physically based shading has been responsible for big advancements in the quality of 3D graphics (both real time and production graphics). This paper is an overview of how Disney Animation Studios pioneered physically based shading. It is a great overview of the field of PBS and BRDF theory generally.
Gareth's Bio
Gareth Morgan has been involved in games and 3D graphics since 1999, starting at Silicon Graphics followed by several games companies including Activision and BAM Studios. For man…
Tony Arcieri on A Protocol for Interledger Payments
Mini
Amanda Gilmore on How Complex Systems Fail (http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf) by Richard Cook
Amanda's Bio
Amanda Gilmore is a site reliability engineer at Heroku, and has worked with complex distributed systems for most of her career. She currently specializes in database reliability and previously worked as a QA engineer, exposing her to a myriad of interesting ways that technical systems can fail.
Main Talk
Tony Arcieri on A Protocol for Interledger Payments:
Armon Dadgar on Vivaldi: Decentralized Network Coordinate System
Mini
Tom Faulhaber on "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications" (https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf) which is a foundational paper in the use of distributed hash tables.
Tom's Bio
Tom Faulhaber is principal of Infolace (www.infolace.com), a San Francisco-based consultancy that helps clients from startups to global brands turn raw data into information and information into action. Throughout his career, Tom has developed systems for high-performance TCP/IP, large-scale scientific visualization, energy trading, and many more.
In addition, Tom is a contributor to the Cl…
Paul Borrill on Time clocks and the reordering of events
Mini
Adrian Cockcroft on Communicating Sequential Processes (http://spinroot.com/courses/summer/Papers/hoare_1978.pdf)
Main Talk
Paul Borrill on Lamport’s unfinished revolution.
This talk reviews Lamport’s seminal 1978 paper on Time, Clocks and the Ordering of Events, the 2nd most cited paper in all of computer science.
Almost all software engineers claim to have read it. Many who haven’t read it use (and basically understand) the fundamental idea of logical clocks and their progeny (vector clocks, matrix clocks, etc.). More than a few understand the current state of the art: dotted version vectors and bounded version vectors. Paradoxically, almost everyone misse…
Kiran Bhattaram on A Mathematical Theory of Communication
Mini
Lukasz Jagiello on “pASSWORD tYPOS and How to Correct Them Securely” (https://www.cs.cornell.edu/~rahul/papers/pwtypos.pdf)
Lukasz tells us: “typo-tolerant password authentication for arbitrary user-selected passwords” sounds like a really bad security joke, but combine it with measurements showing that almost 10% of failed login attempts fail due to a handful of simple, easily correctable typos, such as capitalization errors, and it starts to look reasonable. The authors prove it is possible to improve user experience with very low impact on security.
I really enjoy this paper because it’s not a standard security approach and in many places it’s a reasonable tradeoff between security and UX.
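To make the idea concrete, here is a minimal Python sketch of typo-tolerant checking: if the submitted password fails, try a few cheap corrections before rejecting. It is an illustration under stated assumptions, not the paper's scheme. The three correctors (swap the case of every letter, swap the case of the first letter, drop the last character), the function names, and the iteration count are all hypothetical choices, and a real deployment would need the security analysis the paper provides.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    # Salted, slow hash of the password using only the standard library.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def candidates(submitted: str):
    # The submitted string first, then a small fixed set of assumed corrections.
    seen = set()
    options = [submitted, submitted.swapcase()]          # caps lock left on
    if submitted:
        options.append(submitted[0].swapcase() + submitted[1:])  # first-letter case
        options.append(submitted[:-1])                   # stray trailing character
    for option in options:
        if option not in seen:
            seen.add(option)
            yield option


def check(submitted: str, salt: bytes, stored: bytes) -> bool:
    # Accept if the submitted password or any corrected variant matches.
    return any(
        hmac.compare_digest(hash_password(c, salt), stored)
        for c in candidates(submitted)
    )


salt = os.urandom(16)
stored = hash_password("Password1", salt)
print(check("pASSWORD1", salt, stored))   # caps-lock typo
print(check("Passw0rd1", salt, stored))   # a different password
```

Each extra corrector gives an attacker another guess per attempt, which is the tradeoff between security and UX that Lukasz highlights.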
Lukasz's Bio
Lukasz Jagiello is an operations engineer at Wikia, where he is hard at work saying NO. Between NO…
Gilbert Bernstein on Spacetime Constraints
Mini
Tyler McMullen on Similarity Estimation Techniques from Rounding Algorithms
http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarEstim.pdf
Tyler's Bio: Tyler McMullen is CTO at Fastly, where he’s responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s Instant Purging system, API, and Real-time Analytics. Before Fastly, Tyler worked on text analysis and recommendations at Scribd. A self-described technology curmudgeon, he has experience in everything from web design to kernel development, and loathes all of it. Especially distributed systems.
Main Talk
João Taveira on Congestion Control
Mini
Joel VanderWerf on The Krohn-Rhodes Theorem and Distributed Computing ( http://www.ams.org/journals/tran/1965-116-00/S0002-9947-1965-0188316-1/S0002-9947-1965-0188316-1.pdf )
The Krohn-Rhodes Theorem of 1962 surprised the math world by building arbitrary finite semigroups, and hence arbitrary finite state machines, out of flip-flops (registers with reset operations) and groups (permutations). The construction uses the wreath product, a coordinate system with unidirectional data flow among the factors, like addition with carry. How many group factors are needed? We don't know if this complexity question is decidable, after a half century of work. However, the effort …
Caitie McCaffrey on "Sagas"
Mini
Gilbert Bernstein on Marching Cubes (http://www.eecs.berkeley.edu/~jrs/meshpapers/LorensenCline.pdf)
Marching Cubes is one of the most important geometry algorithms for 3D volume visualization, 3D scanning/reconstruction, etc. It has the distinction of being the most cited graphics paper ever. And it's also definitely not the best algorithm you could implement for the problem it solves. Intriguing?
Gilbert's Bio
Gilbert Bernstein is a Ph.D. student in the department of Computer Science at Stanford University. His work focuses on a range of topics across Computer Graphics, HCI and Programming Languages, including Domain-Specific (Programming) Languages, Visual Tools for Artists…
Matt Adereth on "A Scalable Bootstrap for Massive Data"
Mini
Marios Assiotis on "Throttling Utilities in the IBM DB2 Universal Database Server" (http://www.nt.ntnu.no/users/skoge/prost/proceedings/acc04/Papers/0354_ThA01.3.pdf)
Marios' Bio
Marios is the CTO at TubiTV, the world's largest free streaming TV & movie library. His interests include simplifying complex systems, storage and low latency network i/o at scale. A transplant from Cyprus, he spends his free time trying to create the perfect all-American burger.
Main Talk
Matt Adereth from Two Sigma will present "A Scalable Bootstrap for Massive Data" (
Henry Robinson on "No compromises: distributed transactions with consistency.."
Mini
Bryan Fink on "Fluctuations of Hi-Hat Timing and Dynamics in a Virtuoso Drum Track of a Popular Music Recording" (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127902)
Bryan's Bio:
Bryan hacks distributed systems by day, and does almost anything else by night. His interests in percussion and computers began nearly coincidentally over twenty years ago in a small town on the Great Plains. The combination has led to him having strange thoughts about time and coordination.
Main Talk
Henry Robinson from Cloudera will present the paper "No compromises: distributed transactions …
Nathan Taylor on OS scalability & Chris Meiklejohn on Chain Replication
****** We are closing the year with a PWL marathon ******
Talk #1 - Nathan Taylor on "Corey: An Operating System for Many Cores" (https://www.usenix.org/legacy/event/osdi08/tech/full_papers/boyd-wickizer/boyd_wickizer.pdf) and "An Analysis of Linux Scalability to Many Cores" (https://pdos.csail.mit.edu/papers/linux:osdi10.pdf)
From Nathan:
This is a story that spans two low-level systems papers. While on the surface it's all about how to make operating systems scale, it's also a story about how the same researchers can tackle a pr…
PWL#21 => Gareth Morgan on The Rendering Equation
Mini
Tony Arcieri on Macaroons: Cookies with Contextual Caveats for Decentralized Authorization in the Cloud - http://theory.stanford.edu/~ataly/Papers/macaroons.pdf
Tony is a @SquareEng CyberSecurity Engineer. He cybertweets about cybercrypto and cyberprivacy. Cyberinventor of @celluloidrb and @TheCryptosphere.
Main Talk
Gareth Morgan on The Rendering Equation (
PWL#20 => Aysylu Greenberg on "Probabilistic Accuracy Bounds"
PWL Mini
Jeff Carpenter presents Design Principles Behind Smalltalk by Daniel H. H. Ingalls. Jeff tells us: This is a paper I love because it frames programming language design as a means to "provide computer support for the creative spirit in everyone." The paper describes the design principles the Learning Research Group at PARC discovered as they evolved the design of the Smalltalk language.
PDF: https://github.com/papers-we-love/papers-we-love/blob/master/smalltalk/Design-Principles-Behind-Smalltalk.pdf
Jeff's Bio
Jeff is a software engineer at Braintree where he works on the JavaScript SDK and various payment services. He is particularly interested in…
PWL#19 => Jason Brown on Epidemic Algorithms for Replicated Database Maintenance
PWL Mini
Stephen Tu presents: "Random features for large-scale kernel machines" (http://www.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf) by Ali Rahimi and Ben Recht
Kernel methods in machine learning are a popular tool used to express richer function classes. These methods work via access to a kernel function, which can be thought of as a measure of similarity between two vectors. The standard method of optimization via kernels requires solving a program whose decision variable has as many entries as there are data points. As dataset sizes grow, this becomes prohibitive.
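As a rough illustration of why this scales poorly, the sketch below computes a kernel (Gram) matrix for a small dataset. The Gaussian kernel is an assumed example of a similarity measure, and `rbf_kernel`, `gram_matrix`, and `gamma` are hypothetical names chosen for this sketch, not taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: similarity is 1.0 for identical vectors
    and decays toward 0.0 as the vectors move apart."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, gamma=1.0):
    """Pairwise similarities between all points: an n-by-n matrix,
    so memory and time grow quadratically with the dataset size n."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    for row in gram_matrix(data):
        print(row)
```

With n data points the matrix has n * n entries, so a million points would already need on the order of 10^12 kernel evaluations.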
This paper addresses this issue in a surprisingly practical way. Specifically, the authors show that by using a multiple of d random projections based on the Fouri…
PWL#18 => Bob Poekert on COOLCAT
Mini
David Broockman on "What uncovering a massive academic fraud taught me about how academia needs to change". A bit of background:
http://nymag.com/scienceofus/2015/05/how-a-grad-student-uncovered-a-huge-fraud.html
Here's what David will be presenting: https://drive.google.com/file/d/0B_Qj0otlErJqVlJtMUhTU3ZiRzQ
David's Bio:
David Broockman is an Assistant Professor of Political Economy at the Stanford Grad…
PWL#17 => Ben Sigelman on Spanner: Google’s Globally-Distributed Database
Mini
Kelsey Gilmore-Innis on "Information Escrows" by Ian Ayres & Cait Unkovich (http://repository.law.umich.edu/cgi/viewcontent.cgi?article=1091&context=mlr)
Kelsey is the Director of Technology at Sexual Health Innovations, where they are currently building Callisto (www.projectcallisto.org) based on Ayres & Unkovich's paper. She'll be talking about the joys and pitfalls of going from academic paper to production code, which should sound familiar whether your source is Microsoft Research or the Michigan Law Review.
Main Talk
Ben Sigelman will present Span…
PWL#16 => Sargun Dhillon on Facebook Haystack
Mini
Kyle Isom on "Out of the Tar Pit" by Ben Moseley and Peter Marks (http://shaffner.us/cs/papers/tarpit.pdf)
About the paper: Software inevitably grows complex as it grows in scope. Out of the Tar Pit puts forward some useful, actionable ideas for how we can manage complexity in our programs by looking at how we can reduce state in our systems.
Kyle is a systems engineer in the Bay Area.
Main Talk
Sargun will present the "Facebook Haystack" by Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel (http://static.usenix.org/legacy/events/osdi10/tech/full_papers/Beaver.pdf).
Sargun tells us: "It presents a simpl…
PWL#15 => Devon O'Dell - Nonblocking Algorithms & Scalable Multicore Programming
PWL Mini
Clark Breyman on "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information" (http://psychclassics.yorku.ca/Miller/)
From Clark: "Through a CS practitioner's lens. It's pretty simple and powerful - I've used it repeatedly as a guiding principle when doing anything from UI to systems design."
Clark's Bio
Clark Breyman is a principal engineer with Yammer's infrastructure team where he applies a combination of code, OCD, and war stories to make life easier for our product engineers. When not fighting entropy, he enjoys San Francisco, Lindy Hop, yoga and natural language processing.
Main Talk
Devon O'Dell on "Nonblocking Algorithms and Scalable Multicore Pro…
PWL#14 => Jordan West on Logical Time
Mini
Nathan Taylor on "Your computer is already a distributed system. Why isn’t your OS?" (http://www.barrelfish.org/barrelfish_hotos09.pdf).
From Nathan: "This is one in a long line of papers advocating for a loosely-coupled, service-oriented operating system architecture, an argument that extends back to the dawn of systems research. Alternative OSs like microkernels have long been considered more stable and easier to reason about by the systems community, but the performance overhead that comes with running them means typically our OSs still resemble the ones from the '60s. The authors have spun the "it will be too slow" argument on its head by extrapolating hardware trends and adopting terminology from the distributed systems community, and end up making a compelling case for their design not only being the _only_ path forward,…
PWL#13 => Armon Dadgar on Bloom Filters and HyperLogLog
PWL Mini
Matt Adereth on "The Mode Tree: A Tool for Visualization of Nonparametric Density Features" (http://adereth.github.io/oneoff/Mode%20Trees.pdf). From Matt: "We often look at summaries of univariate data using basic descriptive statistics like mean and standard deviation and visualizations like histograms and box plots. The Mode Tree is a powerful alternative visualization that reveals important details about our distributions that none of the standard approaches can show.
I particularly like this paper because it was really the by-product of some interesting algorithmic work in Computational Statistics. A lot of the techniques in this area are pretty math heavy and inaccessible, so I appreciated that they dedicated a paper to making a visualizat…
PWL#12 => Caitie McCaffrey on Orleans: A Framework for Cloud Computing
PWL Minis
We have two 5-7 minute talks before our main presentation where a short summary of a paper or an exciting idea is shared. Anyone can sign up for a mini, just email us!
• Mini #1: Veronica Ray on Experimenting At Scale With Google Chrome’s SSL Warning (http://www.adrienneporterfelt.com/chi-ssl-experiment.pdf). From Veronica: "Whether we are surfing the web or developing applications, browser security affects us all. This study from 2014 is the first to demonstrate why real users disregard browser security warnings. Its conclusions about the impact of warning design are surprising and inspire reflection about what we as developers can do to keep our users safe."
• Mini #2: Garet…
PWL#11 => Alex Rasmussen on Flat Datacenter Storage
Introducing PWL Mini!!
Starting this month we'll be opening up two 5-7 minute talk slots before our main talk. The idea is to share with the group a short summary of a paper or an idea that you are super excited about. Anyone can volunteer minis, just email us!
• Mini #1: Sargun Dhillon on VL2, a paper by Microsoft Research about computer networking. VL2 leverages several novel schemes in order to build full-bisection, highly-scalable, and decentralized datacenter networks in an economical fashion. These networks continue to support the layer 2 and flat addressing semantics. Many modern networks have been greatly influenced by this design.
Sargun Dhillon (@sargun) is highly interested in schemes for efficient, flexible computer networks to enable the next generatio…
PWL#10 => Peter Bailis on Managing Update Conflicts in Bayou
Peter Bailis presents the "Managing Update Conflicts in Bayou, A Weakly Connected Replicated Storage System" paper by Doug Terry, Marvin Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser in SOSP 1995. http://db.cs.berkeley.edu/cs286/papers/bayou-sosp1995.pdf
Peter tells us: "A perennial challenge in managing shared, mutable state in distributed systems is the ability to permit concurrent writes while maintaining some degree of application "correctness." Bayou is an excellent and early example of an "optimistic" strategy for handling the concurrent update problem. Many of the techniques in Bayou---such as dependency tracking, application-specific merge procedures, and log shipping---are increasingly popular, can be found in systems like Dynamo, and …
PWL#9 => Leif Walsh on Level Ancestor Simplified
Leif Walsh, an engineer at Tokutek (and from PWL NYC), comes to visit us! Leif will present the Level Ancestor Simplified paper by Bender and Farach-Colton.
Leif tells us: " My favorite problems are always those with the highest ratio of difficulty in solving to difficulty in stating. The lowest common ancestor problem exemplifies this. It was first stated in 1973, and can be described to anyone in two sentences, or with one sentence and a picture. But it took 11 years before an optimal solution was discovered, and another 16 before an understandable and implementable solution with the same bounds was presented, in this paper, The LCA Problem Revisited<…
PWL#8 => Kyle Kingsbury on The attraction between two perfectly conducting plates
Kyle Kingsbury presents the paper "On the attraction between two perfectly conducting plates".
In Kyle's words: "It's a very short physics paper--a sidenote, really, to work he considered much more important. Sixty years later, though, we consider this sidenote his defining work--it gave rise to one of the weirdest physics phenomena ever, and has only gotten *more* confusing since the original proof. There's a really interesting history behind it that reveals some of the sociology of academia. Plus it's just fucking fascinating physics.
It also doesn't require any math beyond, say, high school calculus, but illustrates what a rigorous formal argument looks like--and I'm well-prepared to teach the math and concepts to an audience without any mathematical expertise. I think it…
PWL#7 => Armon Dadgar on SWIM
Armon Dadgar from HashiCorp presents the SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol paper by Abhinandan Das, Indranil Gupta, and Ashish Motivala.
If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/158
Armon's Bio
Armon has a passion for distributed systems and their application to real world problems. He is currently the CTO of HashiCorp, where he brings distributed systems into the world of DevOps tooling. He has worked on Terraform, Consul, and Serf at HashiCorp, and maintains the Statsite and Bloomd OSS projects as well.
PWL#6 => Peter Alvaro: Using Reasoning about Knowledge to Analyze Dist. Systems
Peter Alvaro from UC Berkeley will present the paper "Using Reasoning about Knowledge to Analyze Distributed Systems" by Joseph Halpern.
If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/147
Peter has kindly provided some references to help you get started
• Prior Halpern work on knowledge in DS:
https://www.cs…
PWL#5 => Henry Robinson on FLP: Imp of Distributed Consensus w One Faulty Process
Henry Robinson from Cloudera will present the paper "Impossibility of Distributed Consensus with One Faulty Process" by Fischer, Lynch and Paterson. This paper won the Dijkstra award given to the most influential papers in distributed computing so make sure you don't miss this!
Note that Henry will be focusing on the JACM version of the paper, not the PODS version. The JACM version is linked in the paper title above and you can also find it here.
If anyone really wants extra reading, you might consider the following:
PWL#4 => Joel VanderWerf on Calvin
Joel VanderWerf stops by to talk about Calvin. This time we have 3 relevant papers for this meetup!
• Calvin: Fast Distributed Transactions for Partitioned Database Systems, SIGMOD 2012 by Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi.
• Consistency Tradeoffs in Modern Distributed Database System Design, 2012 by Daniel J. Abadi.
• Modularity and Scalability in Calvin, IEEE 2013 by Alexander Thomson and Daniel J. Abadi.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Joel presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read …
PWL#3 => Bruce Spang On Bimodal Multicast
Bruce Spang from Fastly will cover the Bimodal Multicast paper by Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, and Yaron Minsky.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza. We will do our best to stream the event live (we will announce the link on this thread the day of the event).
After Bruce presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/…
PWL#2 => Andy Gross on The Akamai Network
Andy Gross from Twitter will present The Akamai Network: A Platform for High-Performance Internet Applications paper by Erik Nygren, Ramesh K. Sitaraman, and Jennifer Sun.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Andy presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/68
Andy's Bio:
Andy is a Staff Software Engineer at Twi…
PWL#1 => Ryan Kennedy on Dapper, a Distributed Systems Tracing Infrastructure
Ryan Kennedy from Yammer Engineering will be kicking off our group by presenting the Dapper, a Large-Scale Distributed Systems Tracing Infrastructure paper by Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan and Chandan Shanbhag.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Ryan presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/…