San Francisco Chapter
What was the last paper within the realm of computing you read? What did it inspire you to build or tinker with? Come share the ideas in an awesome academic/research publication with fellow engineers, programmers, and paper-readers. Lead a session and show off code that you wrote that implements these ideas or just give us the lowdown about the paper. Otherwise, just come, listen, learn, and discuss.
We'll be using papers-we-love's curated repository. Please contribute by adding PRs for papers, code, and/or links to other repositories.
PWL SF strictly adheres to the Code of Conduct set forth by all PWL charters.
Location: Fastly - 475 Brannan Street #320, San Francisco, CA
Sign-up: Please RSVP for meetings via Meetup.com
Kavya Joshi on "Kraken: Leveraging Live Traffic Tests to Identify and Resolve Resource Utilization Bottlenecks in Large Scale Web Services" - https://research.fb.com/publications/kraken-leveraging-live-traf%EF%AC%81c-tests-to-identify-and-resolve-resource-utilization-bottlenecks-in-large-scale-web-services/
Kavya (https://twitter.com/kavya719) writes code for a living at a start-up in San Francisco. She particularly enjoys architecting and building highly concurrent, highly scalable systems. In her free time, she reads non-fiction and climbs rocks. Before moving to San Francisco to be an Adult, Kavya was at MIT where she got a Bachelor's and Master's in Computer Science.
Aaron Goldman on Hash Array Mapped Trie (http://lampwww.epfl.ch/papers/idealhashtrees.pdf)
Aaron David Goldman did his graduate work at the Georgia Institute of Technology where he studied Electrical and Computer Engineering. He has worked on radar systems for the US Department of Defense, Anti Abuse at Google, and is currently working in Runtime Application Self Protection at tCell. In his spare time he is building a new internet out of immutable data.
Dave Cheney (https://twitter.com/davecheney) on What Have We Learned from the PDP-11? (https://gordonbell.azurewebsites.net/CGB%20Files/What%20Have%20We%20Learned%20From%20the%20PDP-11%201977%20c…
Bryan Cantrill on "ARC: A Self-Tuning, Low Overhead Replacement Cache" by Nimrod Megiddo and Dharmendra Modha ( https://www.usenix.org/legacy/event/fast03/tech/full_papers/megiddo/megiddo.pdf )
Bryan Cantrill is the CTO at Joyent, where he oversees worldwide development of the SmartOS and SmartDataCenter platforms, and the Node.js platform. Prior to joining Joyent, Bryan served as a Distinguished Engineer at Sun Microsystems, where he spent over a decade working on system software, from the guts of the kernel to client-code on the browser. In particular, he co-designed and implemented DTrace, a facility for dynamic instrumentation of production systems that won the Wall Street Journal's top Technology Innovation Award in 2006 and the USENIX Software Tools User Group…
J. Paul Reed on "Trade-offs Under Pressure: Heuristics and Observations of Teams Resolving Internet Service Outages" (http://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=8084520&fileOId=8084521 )
J. Paul Reed has over fifteen years of experience in the trenches as a build/release engineer, working with such storied companies as VMware, Mozilla, Postbox, Symantec, and Salesforce.
In 2012, he founded Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations "Simply Ship. Every time." He's worked across a number of industries, from financial services to cloud-based infrastructure to health care, with teams ranging from…
André Arko on Robin Hood Hashing
André Arko leads the Bundler and RubyGems teams, co-authored The Ruby Way, and blogs at http://arko.net. He works at Cloud City as a software development consultant, and founded Ruby Together, a non-profit that pays for work on Ruby open source.
Tyler McMullen on Delta CRDTs
Tyler will do his best to summarize and get you hooked on the three papers listed below:
Tyler McMullen is CTO at Fastly, where he’s responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s Instant Purging system, API, and Real-time Analytics. Before Fastly, Tyler worked on text …
***Okta has kindly volunteered to host us. They do, however, ask that you sign an NDA at the door. If this is an issue for you, please let us know in advance.
Aish Raj Dahal on "Cuckoo Filter: Practically Better Than Bloom" (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)
Aish is a software engineer with a passion for distributed systems. He currently works for PagerDuty, building reliable event-driven systems. In a past life, he worked on Wall Street building software platforms for high-performance trade execution. He was also a maintainer for KDE's KGet project.
Peter Geoghegan on "Query Evaluation Techniques for Large Databases"
Lukasz Jagiello on Fast Inverse Square Root - http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
Lukasz Jagiello is an operations engineer at Wikia, where he works hard at saying NO. Between NO and NO, he focuses on modern approaches to monitoring and distributed storage.
Ben Sigelman (https://twitter.com/el_bhs) is the cofounder and CEO of
Kiran Bhattaram on HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf)
Kiran Bhattaram loves making things, whether tinkering with circuits, writing software systems, or sewing dresses. She works on Stripe’s infrastructure team, and has previously built things for the New York Times, LinkedIn and MIT CSAIL.
Yifan Wu on "Reactive Vega: A Streaming Dataflow Architecture for Declarative Interactive Visualization" (
Matt Adereth on the January 1965 issue of The Computer Journal.
This issue contains one of the most important techniques in numerical optimization, the Nelder-Mead simplex method. An entire full-length talk could be dedicated to it, but instead we’re going to try and understand the historical context by looking at everything else in the journal, from the other papers to the letters to the editor to the advertisements.
Matt builds tools and infrastructure for quantitative research at Two Sigma. He previously worked at Microsoft on Visio, focusing on ways to connect data to shapes. In his spare time, he builds ergonomic keyboards using Clojure.
Tom Santero on DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker - https…
**Please note that this month's location is not optimal for ADA. Please let us know your needs and we will do our best to accommodate you.
Daniel "Spoons" Spoonhower on "A Unified Theory of Garbage Collection"
Let's take a deep dive into language implementation: garbage collectors are a great tool for improving programmer productivity… until something goes wrong and you find yourself endlessly tuning collector configuration parameters. This paper takes a principled approach to understanding different garbage collection algorithms and offers something useful for both language implementors and programmers that use garbage-collected languages.
Spoons is a co-f…
Kevin Burke on "Feral Concurrency" http://www.bailis.org/papers/feral-sigmod2015.pdf
Kevin Burke (https://kev.inburke.com) likes building great experiences. He helped scale Twilio and Shyp, and currently runs a software consultancy. Kevin once accidentally left Waiting for Godot at the intermission.
Caitie McCaffrey on "Distributed Programming in Argus" by Barbara Liskov
Yifan Wu on "Real Time Groupware as a Distributed System: Concurrency Control and its Effect on the Interface" by Saul Greenberg and David Marwood ( https://pdfs.semanticscholar.org/cf3c/135df03e455be1e8a64e5af6f19d2ff3ee2f.pdf )
This 1990s paper pioneers the idea that a front-end interface is a distributed system. As the UI becomes more real-time and collaborative, we are starting to see a lot of anomalies in our day-to-day application experiences. This paper sheds light on a more structured way to reason about concurrency on the front-end.
Yifan is a graduate student at UC Berkeley researching topics at the intersection of databases and human computer interaction, currently i…
Eitan Adler on Program development by stepwise refinement ( https://www.inf.ethz.ch/personal/wirth/Articles/StepwiseRefinement.pdf )
Eitan Adler is a software engineer with a passion for distributed systems, security, and open source software. Currently employed by Twitter, he spends his days improving developer tooling to make the lives of other software engineers easier.
Diego Ongaro on "On the criteria to be used in decomposing systems into modules"
Gareth Morgan on Physically-Based Shading at Disney (https://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf)
In recent years the adoption of physically based shading has been responsible for big advancements in the quality of 3D graphics (both real time and production graphics). This paper is an overview of how Disney Animation Studios pioneered physically based shading. It is a great overview of the field of PBS and BRDF theory generally.
Gareth Morgan has been involved in games and 3D graphics since 1999, starting at Silicon Graphics followed by several games companies including Activision and BAM Studios…
Amanda Gilmore on How Complex Systems Fail (http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf) by Richard Cook
Amanda Gilmore is a site reliability engineer at Heroku, and has worked with complex distributed systems for most of her career. She currently specializes in database reliability and previously worked as a QA engineer, exposing her to a myriad of interesting ways that technical systems can fail.
Tony Arcieri on A Protocol for Interledger Payments:
Tom Faulhaber on "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications" (https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf) which is a foundational paper in the use of distributed hash tables.
Tom Faulhaber is principal of Infolace (www.infolace.com), a San Francisco-based consultancy that helps clients from startups to global brands turn raw data into information and information into action. Throughout his career, Tom has developed systems for high-performance TCP/IP, large-scale scientific visualization, energy trading, and many more.
In addition, Tom is a contributor to the Cl…
Adrian Cockcroft on Communicating Sequential Processes (http://spinroot.com/courses/summer/Papers/hoare_1978.pdf)
Paul Borrill on Lamport’s unfinished revolution.
This talk reviews Lamport’s seminal 1978 paper on Time, Clocks and the Ordering of Events, the 2nd most cited paper in all of computer science.
Almost all software engineers claim to have read it. Many who haven’t read it use (and basically understand) the fundamental idea of logical clocks and their progeny (vector clocks, matrix clocks, etc.). More than a few understand the current state of the art: dotted version vectors and bounded version vectors. Paradoxically, almost everyone …
Lukasz Jagiello on “pASSWORD tYPOS and How to Correct Them Securely” (https://www.cs.cornell.edu/~rahul/papers/pwtypos.pdf)
Lukasz tells us: “Typo-tolerant password authentication for arbitrary user-selected passwords” sounds like a really bad security joke, but combine it with measurements showing that almost 10% of failed login attempts are due to a handful of simple, easily correctable typos, such as capitalization errors, and it starts to make sense. The authors prove that it is possible to improve user experience with very low impact on security.
I really enjoy this paper because it’s not a standard security approach and in many places it’s a reasonable tradeoff between security and UX.
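To make the idea concrete, here is a minimal Python sketch of typo-tolerant checking. It is an illustration under stated assumptions, not the paper's implementation: the function names (`hash_password`, `candidate_corrections`, `check_password`), the PBKDF2 parameters, and the choice of just two correctors (flipping the case of every letter, as happens with Caps Lock, and flipping the case of the first character) are all hypothetical, and the paper's actual corrector set and security analysis may differ.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash; the iteration count is an illustrative choice."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def candidate_corrections(submitted: str) -> list[str]:
    """Return the submitted string plus a few 'corrected' variants.

    Assumed correctors (for illustration only):
      - swap the case of every letter (Caps Lock left on)
      - swap the case of the first character (Shift slipped)
    """
    candidates = [submitted, submitted.swapcase()]
    if submitted:
        candidates.append(submitted[0].swapcase() + submitted[1:])
    # Drop duplicates while keeping order, so each variant is hashed once.
    return list(dict.fromkeys(candidates))


def check_password(submitted: str, salt: bytes, stored_hash: bytes) -> bool:
    """Accept the login if the input or any corrected variant matches."""
    return any(
        hmac.compare_digest(hash_password(candidate, salt), stored_hash)
        for candidate in candidate_corrections(submitted)
    )


if __name__ == "__main__":
    salt = os.urandom(16)
    stored = hash_password("Password123", salt)
    print(check_password("pASSWORD123", salt, stored))  # Caps Lock typo
    print(check_password("Password124", salt, stored))  # a different password
```

Each extra corrector widens the set of strings that unlock the account, which is the security versus UX tradeoff described above; keeping the corrector list very short is what keeps the impact small.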
Lukasz Jagiello is an operations engineer at Wikia, where he works hard at saying NO. Between…
Tyler McMullen on Similarity Estimation Techniques from Rounding Algorithms
Tyler's Bio: Tyler McMullen is CTO at Fastly, where he’s responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s Instant Purging system, API, and Real-time Analytics. Before Fastly, Tyler worked on text analysis and recommendations at Scribd. A self-described technology curmudgeon, he has experience in everything from web design to kernel development, and loathes all of it. Especially distributed systems.
Joel VanderWerf on The Krohn-Rhodes Theorem and Distributed Computing ( http://www.ams.org/journals/tran/1965-116-00/S0002-9947-1965-0188316-1/S0002-9947-1965-0188316-1.pdf )
The Krohn-Rhodes Theorem of 1962 surprised the math world by building arbitrary finite semigroups, and hence arbitrary finite state machines, out of flip-flops (registers with reset operations) and groups (permutations). The construction uses the wreath product, a coordinate system with unidirectional data flow among the factors, like addition with carry. How many group factors are needed? We don't know if this complexity question is decidable, after a half century of work. However, t…
Gilbert Bernstein on Marching Cubes (http://www.eecs.berkeley.edu/~jrs/meshpapers/LorensenCline.pdf )
Marching Cubes is one of the most important geometry algorithms for 3D volume visualization, 3D scanning/reconstruction, etc. It has the distinction of being the most cited graphics paper ever. And it's also definitely not the best algorithm you could implement for the problem it solves. Intriguing?
Gilbert Bernstein is a Ph.D. student in the department of Computer Science at Stanford University. His work focuses on a range of topics across Computer Graphics, HCI and Programming Languages, including Domain-Specific (Programming) Languages, Visual Tool…
Marios Assiotis on "Throttling Utilities in the IBM DB2 Universal Database Server" (http://www.nt.ntnu.no/users/skoge/prost/proceedings/acc04/Papers/0354_ThA01.3.pdf)
Marios is the CTO at TubiTV, the world's largest free streaming TV & movie library. His interests include simplifying complex systems, storage, and low-latency network I/O at scale. A transplant from Cyprus, he spends his free time trying to create the perfect all-American burger.
Matt Adereth from Two Sigma will present "A Scalable Bootstrap for Massive Data" (
Bryan Fink on "Fluctuations of Hi-Hat Timing and Dynamics in a Virtuoso Drum Track of a Popular Music Recording" (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127902)
Bryan hacks distributed systems by day, and does almost anything else by night. His interests in percussion and computers began nearly coincidentally over twenty years ago in a small town on the Great Plains. The combination has led to him having strange thoughts about time and coordination.
****** We are closing the year with a PWL marathon ******
Talk #1 - Nathan Taylor on "Corey: An Operating System for Many Cores" (https://www.usenix.org/legacy/event/osdi08/tech/full_papers/boyd-wickizer/boyd_wickizer.pdf) and "An Analysis of Linux Scalability to Many Cores" (https://pdos.csail.mit.edu/papers/linux:osdi10.pdf)
This is a story that spans two low-level systems papers. While on the surface it's all about how to make operating systems scale, it's also a story about how the same researchers …
Tony Arcieri on Macaroons: Cookies with Contextual Caveats for Decentralized Authorization in the Cloud - http://theory.stanford.edu/~ataly/Papers/macaroons.pdf
Gareth Morgan on The Rendering Equation (
Jeff Carpenter presents Design Principles Behind Smalltalk by Daniel H. H. Ingalls. Jeff tells us: This is a paper I love because it frames programming language design as a means to "provide computer support for the creative spirit in everyone." The paper describes the design principles the Learning Research Group at PARC discovered as they evolved the design of the Smalltalk language.
Stephen Tu presents: "Random features for large-scale kernel machines" (http://www.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf) by Ali Rahimi and Ben Recht
Kernel methods in machine learning are a popular tool used to express richer function classes. These methods work via access to a kernel function, which can be thought of as a measure of similarity between two vectors. The standard method of optimization via kernels requires solving a program whose decision variable has as many entries as there are data points. As dataset sizes grow, this becomes prohibitive.
This paper addresses this issue in a surprisingly practical way. Specifically, the authors show that by using a multiple of d random projections based on the Fouri…
David Broockman on "What uncovering a massive academic fraud taught me about how academia needs to change". A bit of background
Here's what David will be presenting: https://drive.google.com/file/d/0B_Qj0otlErJqVlJtMUhTU3ZiRzQ
David Broockman is an Assistant Professor of Political Economy at the Stanford G…
Kelsey Gilmore-Innis on "Information Escrows by Ian Ayres & Cait Unkovich" (http://repository.law.umich.edu/cgi/viewcontent.cgi?article=1091&context=mlr)
Kelsey is the Director of Technology at Sexual Health Innovations, where they are currently building Callisto (www.projectcallisto.org) based on Ayres & Unkovich's paper. She'll be talking about the joys and pitfalls of going from academic paper to production code, which should ring familiar whether your source is Microsoft Research or the Michigan Law Review.
Ben Sigelman will present…
Kyle Isom on "Out of the Tar Pit" by Ben Moseley and Peter Marks (http://shaffner.us/cs/papers/tarpit.pdf)
About the paper: Software inevitably grows complex as it grows in scope. Out of the Tar Pit puts forward some useful, actionable ideas for how we can manage complexity in our programs by looking at how we can reduce state in our systems.
Kyle is a systems engineer in the Bay Area.
Sargun will present the "Facebook Haystack" by Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel (http://static.usenix.org/legacy/events/osdi10/tech/full_papers/Beaver.pdf).
Sargun tells us: "It presents a …
From Clark: "Through a CS practitioner's lens. It's pretty simple and powerful - I've used it repeatedly as a guiding principle when doing anything from UI to systems design."
Clark Breyman is a principal engineer with Yammer's infrastructure team where he applies a combination of code, OCD, and war stories to make life easier for our product engineers. When not fighting entropy, he enjoys San Francisco, Lindy Hop, yoga and natural language processing.
Devon O'Dell on "Nonblocking Algorithms and Scalable Multi…
From Nathan: "This is one in a long line of papers advocating for a loosely-coupled, service-oriented operating system architecture, an argument that extends back to the dawn of systems research. Alternative OSs like microkernels have long been considered more stable and easier to reason about by the systems community, but the performance overhead that comes with running them means typically our OSs still resemble the ones from the '60s. The authors have spun the "it will be too slow" argument on its head by extrapolating hardware trends and adopting terminology from the distributed systems community, and end up making a compelling case for their design not only being the _only_ path forward,…
Matt Adereth on "The Mode Tree: A Tool for Visualization of Nonparametric Density Features" (http://adereth.github.io/oneoff/Mode%20Trees.pdf). From Matt: "We often look at summaries of univariate data using basic descriptive statistics like mean and standard deviation and visualizations like histograms and box plots. The Mode Tree is a powerful alternative visualization that reveals important details about our distributions that none of the standard approaches can show.
I particularly like this paper because it was really the by-product of some interesting algorithmic work in Computational Statistics. A lot of the techniques in this area are pretty math heavy and inaccessible, so I appreciated that they dedicated a paper to making a visua…
We have two 5-7 minute talks before our main presentation where a short summary of a paper or an exciting idea is shared. Anyone can sign up for a mini, just email us!
• Mini #1: Veronica Ray on Experimenting At Scale With Google Chrome’s SSL Warning (http://www.adrienneporterfelt.com/chi-ssl-experiment.pdf). From Veronica: "Whether we are surfing the web or developing applications, browser security affects us all. This study from 2014 is the first to demonstrate why real users disregard browser security warnings. Its conclusions about the impact of warning design are surprising and inspire reflection about what we as developers can do to keep our users safe."
• Mini #2:
Introducing PWL Mini!!
Starting this month we'll be opening up two 5-7 minute talk slots before our main talk. The idea is to share with the group a short summary of a paper or an idea that you are super excited about. Anyone can volunteer minis, just email us!
• Mini #1: Sargun Dhillon on VL2, a paper by Microsoft Research about computer networking. VL2 leverages several novel schemes in order to build full-bisection, highly-scalable, and decentralized datacenter networks in an economical fashion. These networks continue to support the layer 2 and flat addressing semantics. Many modern networks have been greatly influenced by this design.
Sargun Dhillon (@sargun) is highly interested in schemes for efficient, flexible computer networks to enable the next generatio…
Peter Bailis presents the "Managing Update Conflicts in Bayou, A Weakly Connected Replicated Storage System" paper by Doug Terry, Marvin Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser in SOSP 1995. http://db.cs.berkeley.edu/cs286/papers/bayou-sosp1995.pdf
Peter tells us: "A perennial challenge in managing shared, mutable state in distributed systems is the ability to permit concurrent writes while maintaining some degree of application "correctness." Bayou is an excellent and early example of an "optimistic" strategy for handling the concurrent update problem. Many of the techniques in Bayou---such as dependency tracking, application-specific merge procedures, and log shipping---are increasingly popular, can be found in systems like Dynamo, an…
Leif tells us: "My favorite problems are always those with the highest ratio of difficulty in solving to difficulty in stating. The lowest common ancestor problem exemplifies this. It was first stated in 1973, and can be described to anyone in two sentences, or with one sentence and a picture. But it took 11 years before an optimal solution was discovered, and another 16 before an understandable and implementable solution with the same bounds was presented, in this paper, The LCA Problem Revisited…
Kyle Kingsbury presents the "On the Attraction Between Two Perfectly Conducting Plates" paper by H. B. G. Casimir.
In Kyle's words: "It's a very short physics paper--a sidenote, really, to work he considered much more important. Sixty years later, though, we consider this sidenote his defining work--it gave rise to one of the weirdest physics phenomena ever, and has only gotten *more* confusing since the original proof. There's a really interesting history behind it that reveals some of the sociology of academia. Plus it's just fucking fascinating physics.
It also doesn't require any math beyond, say, high school calculus, but illustrates what a rigorous formal argument looks like--and I'm well-prepared to teach the math and concepts to an audience without any mathematical expertise. I think it…
Armon Dadgar from HashiCorp presents the SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol paper by Abhinandan Das, Indranil Gupta, and Ashish Motivala.
If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/158
Armon has a passion for distributed systems and their application to real world problems. He is currently the CTO of HashiCorp, where he brings distributed systems into the world of DevOps tooling. He has worked on Terraform, Consul, and Serf at HashiCorp, and maintains the Statsite and Bloomd OSS projects as well.
Peter Alvaro from UC Berkeley will present the paper "Using Reasoning about Knowledge to Analyze Distributed Systems" by Joseph Halpern.
If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/147
Peter has kindly provided some references to help you get started
• Prior Halpern work on knowledge in DS:
Henry Robinson from Cloudera will present the paper "Impossibility of Distributed Consensus with One Faulty Process" by Fischer, Lynch and Paterson. This paper won the Dijkstra award, given to the most influential papers in distributed computing, so make sure you don't miss this!
Note that Henry will be focusing on the JACM version of the paper, not the PODS version. The JACM version is linked in the paper title above and you can also find it here.
If anyone really wants extra reading, you might consider the following:
Joel VanderWerf stops by to talk about Calvin. This time we have 3 relevant papers for this meetup!
• Calvin: Fast Distributed Transactions for Partitioned Database Systems, SIGMOD 2012 by Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi.
• Consistency Tradeoffs in Modern Distributed Database System Design, 2012 by Daniel J. Abadi.
• Modularity and Scalability in Calvin, IEEE 2013 by Alexander Thomson and Daniel J. Abadi.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Joel presents the paper, we will open up the floor to discussion and questions.
We hope that you'll …
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza. We will do our best to stream the event live (we will announce the link on this thread the day of the event)
After Bruce presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/…
Andy Gross from Twitter will present The Akamai Network: A Platform for High-Performance Internet Applications paper by Erik Nygren, Ramesh K. Sitaraman, and Jennifer Sun.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Andy presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/68
Andy is a Staff Software Engineer at …
Ryan Kennedy from Yammer Engineering will be kicking off our group by presenting the Dapper, a Large-Scale Distributed Systems Tracing Infrastructure paper by Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan and Chandan Shanbhag.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Ryan presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/…