San Francisco Chapter
What was the last paper within the realm of computing you read? What did it inspire you to build or tinker with? Come share the ideas in an awesome academic/research publication with fellow engineers, programmers, and paper-readers. Lead a session and show off code that you wrote that implements these ideas or just give us the lowdown about the paper. Otherwise, just come, listen, learn, and discuss.
We'll be using papers-we-love's curated repository. Please contribute by adding PRs for papers, code, and/or links to other repositories.
PWL SF strictly adheres to the Code of Conduct set forth by all PWL charters.
Location: Fastly - 475 Brannan Street #320, San Francisco, CA
Sign-up: Please RSVP for meetings via Meetup.com
Benjamin Goering on "ActivityPub W3C Recommendation"
Ben has been studying and evangelizing decentralized social networking for 15 years, since the early days of blogging and podcasting. He worked on the social-media-as-a-service startup Livefyre from [masked] with customers from cnn.com to ign.com to foxnews.com, and contributed as an Invited Expert to the W3C Social Web WG that helped standardize ActivityStreams 2.0 and ActivityPub as technical recommendations. Now he works at Protocol Labs on DIDs and web3.storage.
**The time has come to select the next Papers We Love SF program committee!**
**As a program committee member you get to have a say in what papers we present and learn how to run the organization.**
**This meeting will open with a mini-talk from Aaron Goldman on Liquid Data Networking.**
**After that we will take nominations for and vote on the program committee.**
**Come join us and express your opinion on the future direction of Papers We Love!**
**Anyone present can vote for the program committee.**
**Liquid Data Networking**
**[ICN '20: Proceedings of the 7th ACM Conference on Information-Centric Networking](https://dl.acm.org/doi/proceedings/10.1145/3405656)**
**September 2020 Pages 129–135**
Yao Yue on
The Tail at Scale
Software techniques that tolerate latency variability are vital to building responsive large-scale Web services.
By Jeffrey Dean and Luiz André Barroso
Bio: Yao Yue is an engineer and manager working at Twitter Platform. She has been working on distributed cache since 2010, with extensive experience with popular OSS projects such as Memcached and Redis. She designed and implemented a modular open-source cache framework called Pelikan (more at pelikan.io). Since 2017, she has started and managed the Infrastructure Performance and Optimization team. Her team works on infrastructure performance and capacity monitoring, optimizing systems configurations and utilization, cross-service tracing and insight at scale, and advanc…
Optimal Speedup of Las Vegas Algorithms
Michael Luby, Alistair Sinclair, David Zuckerman
Natalie Telis on
College Admissions and the Stability of Marriage
D. Gale and L. S. Shapley
Bio: Natalie Telis is a mathematician by training and a biologist by trade. She did her PhD work at Stanford, developing methods to study the connection between human diversity, human history and disease risk. In her spare time, she applie…
Presenter: Aaron D Goldman
Aaron is a security engineer at Twitter with a history of protecting distributed systems and microservices. He has previously worked on radar systems for the US Department of Defense, Anti Abuse at Google, and Runtime Application Self Protection at tCell. In his spare time, he is building a new internet out of immutable data.
"Hashed and Hierarchical Timing Wheels: Efficient Data Structures for Implementing a Timer Facility"
Zoom Meeting details:
Nikilesh Subramoniapillai Ajeetha is inviting you to a scheduled Zoom meeting.
Topic: Aaron D Goldman on "Hashed and Hierarchical Timing Wheels"
Time: Jan 21,[masked]:00 PM Pacific Time (US and Canada)
Presenter: Zephyr Pellerin
Zephyr Pellerin is a security engineer at the threat intelligence firm Polyswarm. He has implemented and evaluated a variety of methods of calibrating and aggregating uncertain expert verdicts on malware, such as the one described in this paper.
"Better Malware Ground Truth: Techniques for Weighting Anti-Virus Vendor Labels"
We examine the problem of aggregating the results of multiple anti-virus (AV) vendors’ detectors into a single authoritative ground-truth label for every binary. To do so, we adapt a well-known generative Bayesian model that postulates the existence of a hidden ground truth upon which the AV labels depend. We use training based on Expectation Maximization for this fully unsupervised technique. We evaluate our method using 279,327 distinct binaries from VirusTota…
Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport, Massachusetts Computer Associates, Inc.
The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
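For readers new to the paper, the logical-clock rules mentioned in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the paper or the talk: the `Process` class and its method names are hypothetical, and ties between equal timestamps are broken by process id, which is one common way to turn the partial order into a total order.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical process holding a single Lamport logical clock."""

    pid: int
    clock: int = 0

    def local_event(self) -> tuple[int, int]:
        # Tick the clock between any two successive local events.
        self.clock += 1
        return (self.clock, self.pid)

    def send(self) -> tuple[int, int]:
        # Sending is itself an event; the message carries its timestamp.
        self.clock += 1
        return (self.clock, self.pid)

    def receive(self, timestamp: int) -> tuple[int, int]:
        # Advance past both the local clock and the message timestamp,
        # so the receive is ordered after the matching send.
        self.clock = max(self.clock, timestamp) + 1
        return (self.clock, self.pid)


if __name__ == "__main__":
    p, q = Process(pid=1), Process(pid=2)
    events = [p.local_event()]
    message = p.send()
    events.append(message)
    events.append(q.receive(message[0]))
    events.append(p.local_event())
    # Sorting (timestamp, pid) pairs gives one total order that is
    # consistent with "happened before".
    print(sorted(events))
```

Comparing `(timestamp, pid)` tuples is enough here because Python orders tuples lexicographically, so the process id only matters when two timestamps are equal.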
David is a big fan of papers, cats, and the Oxford comma. He works as an architect at Salesforce from his home in Seattle, wi…
Aaron D Goldman
Aaron is a security engineer at Twitter with a history of protecting distributed systems and microservices. He has previously worked on radar systems for the US Department of Defense, Anti Abuse at Google, and Runtime Application Self Protection at tCell. In his spare time, he is building a new internet out of immutable data.
Spritz—a spongy RC4-like stream cipher and hash function
This note reconsiders the design of the stream cipher RC4, and proposes an improved variant, which we call “Spritz” (since the output comes in fine drops rather than big blocks.)
Zoom Meeting Details:
Topic: Aaron D Goldman on "Spritz—a spongy RC4-like stream cipher and hash function"
Time: Oct 22,[masked]:30 PM Pacif…
Johnathan Chiu on TBD
Johnathan is an undergraduate at UC Berkeley studying Electrical Engineering and Computer Science (EECS). His interests are in robotics, data compression, and immersive technology. He is currently doing research on Neural Network performance at Berkeley.
Bruce Spang on "The Detection of Defective Members of Large Populations"
[Dorfman 43] Dorfman, Robert. The Detection of Defective Members of Large Populations. Ann. Math. Statist. 14 (1943), no. 4, 436--440. doi:[masked]/aoms/[masked]. https://projecteuclid.org/euclid.aoms/1177731363
Some relevant papers
* Robert Dorfman. “The Detection of Defective Members of Large Populations” (https://projecteuclid.org/euclid.aoms/1177731363)
* William Kautz, Richard Singleto…
Serverless Computing: One step forward and two steps backward
Serverless computing offers the potential to program the cloud in an autoscaling, pay-as-you go manner. In this paper we address critical gaps in first-generation serverless computing, which place its autoscaling potential at odds with dominant trends in modern computing: notably data-centric and distributed computing, but also open source and custom hardware. Put together, these gaps make current serverless offerings a bad fit for cloud innovation and particularly bad for data systems innovation. In addition to pinpointing some of the main shortfalls of current serverless architectures, we raise a set of challenges we believe must be met to unlock the radical potential that the cloud—with i…
Andre Arko on Why Are The Prices So Damn High.
Andre tells us: "This is a paper that examines the very high and continuously climbing costs of education and health care in the United States. This paper is especially interesting as a follow-up to last month’s PWL Mini on Baumol’s cost disease"
André Arko is a software consultant at cloudcity.io, and founder of the software non-profit rubytogether.org. He graduated from college before he could drink, has written open source software installed hundreds of millions of times, and lives in a shipping container that overlooks SFO. And at least two of those things are true.
Jana Iyengar on "The death of an end-to-end internet (and a way forward)"
Over the past two decades, the Internet’s runaway s…
Kevin Burke on Baumol’s Cost Disease - On The Performing Arts: The Anatomy of their Economic Problems - http://people.stern.nyu.edu/wbaumol/OnThePerformingArtsTheAnatomyOfTheirEcoProbs.pdf
Kevin Burke (https://twitter.com/derivativeburke) ( https://burke.services ) likes building great experiences. He helped scale Twilio and Shyp, and currently runs a software consultancy. Kevin once accidentally left Waiting for Godot at the intermission.
Jessie Frazelle on A Tale of Two Papers
"It was the best of times, it was the worst of times." Come dive into two papers covering datacenter outages: "Maelstrom: Mitigating Datacenter-level Disasters by Draining Interdependent Traffic Safely and Efficie…
Ramon Nogueira on BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data - https://sameeragarwal.github.io/blinkdb_eurosys13.pdf
Ramon is a software engineer with a passion for making large systems easier to understand and operate. He currently works at Google on OpenCensus: an open source metrics and distributed tracing library for microservices. Previous hits include iCloud storage APIs at Apple, and startups in London and Johannesburg.
Zach Tellman on Shape Decomposition for Multi-channel Distance Fields - https://dspace.cvut.cz/bitstream/handle/10467/62770/F8-DP-2015-Chlumsky-Viktor-thesis.pdf
Zach consults on the design of distributed systems and APIs. He has written "E…
Michael Kehoe on Democratically Finding The Cause of Packet Drops - https://arxiv.org/pdf/1802.07222.pdf
Michael Kehoe is a Staff SRE at LinkedIn who works on building scalable monitoring infrastructure, reliability principles and incident management. Michael previously interned at NASA Ames on their PhoneSat project. Michael's key interests lie in network engineering and automation.
Gwen Shapira on Peeking Behind the Curtains of Serverless Frameworks - https://www.usenix.org/system/files/conference/atc18/atc18-wang-liang.pdf
Gwen Shapira is a principal data architect at Confluent, previously an engineer at Cloudera. She's a committer on the Apache Kafka project and an author of "Kafka: The Definitive Guide".
Max Seiden on Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
In under 15 pages, this paper describes the first principles of OLAP with such clarity and foresight that, over 20 years later, its direct contributions are still highly relevant to anyone building products and technologies in the analytics space.
Max is a systems engineer with a deep interest in making information accessible to anyone with a question. He's currently at Sigma Computing, where he spends most of his time hacking on query compilation, optimization, and code generation. Before Sigma, he was at Platfora (now Workday) working on data cubes and materialized views using Spark, MapReduce, and the bundle-of-joy that is the Hadoop Ecosystem. Max received a bachelor's in computer science from the University of Michigan. When…
In 2007, Pat Helland published “Life Beyond Distributed Transactions: An Apostate’s Opinion,” in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, I'll show that the ideas are far more widely applicable, particularly in scaling stateful applications. In particular, I'll cover how we implemented some of the ideas from the paper in our distributed stream processor, Wallaroo.
Sean T. Allen is VP of Engineering at Wallaroo Labs and a member of the Pony core team. His turn-ons include programming languages, distributed computing, Hiwatt amplifiers, and Fender Telecasters. His turn-offs include mayonnaise, stirring yogurt, and sloppy code. He is one of the authors of Storm Applied.
Ori Bernstein on The Slab Allocator https://www.usenix.org/legacy/publications/library/proceedings/bos94/full_papers/bonwick.a
Ori was once evicted from the womb. The experience was unpleasant, and he has never forgiven the world for it. Today, he spends most of his time waving fingers over keyboards in order to inject magic into microwaves. Follow Ori at https://twitter.com/oribernstein
Scott Vokes on "An Efficient Context-Free Parsing Algorithm" by Jay Earley (Communications of the ACM, 1970)
Scott tells us: This paper introduces the Earley algorithm, which is a fully…
Omoju Miller on "All The Cool Kids, How Do They Fit In? Popularity and Demographic Biases in Recommender Evaluation and Effectiveness" (http://proceedings.mlr.press/v81/ekstrand18b/ekstrand18b.pdf)
Omoju is a Senior Machine Learning Data Scientist with GitHub. She has over a decade of experience in computational intelligence. In the past, she has co-led the non-profit investment in Computer Science Education for Google and served as a volunteer advisor to the Obama administration’s White House Presidential Innovation Fellows.
David Calavera on "The QUIC Transport Protocol: Design and Internet-Scale Deployment": https://research.google.com/pubs/pub46403.html
David Calavera is the CTO of Netlify, where he and his team are building th…
Alan Karp on Comparing Information Without Leaking It. https://www.stat.berkeley.edu/users/aldous/157/Papers/fagin.pdf
Alan got a Ph.D. in Astronomy and was an assistant professor of physics at Dartmouth until he figured out his job was that of a small businessman whose money came from writing grant proposals. After that, he did 15 to life at IBM doing large scale scientific and parallel computing. He then joined HP Labs, where he worked on a variety of projects including being one of the architects of the HP/Intel Itanium processor; E-speak, a platform that was called "web services before there were web services;" Polaris, a virus safe computing environment for Windows; and several other demonstrations that systems can be made more functional and more usable by adding security. After 20+ years, HP Labs came to its senses and kicked him out, but he bamboozle…
Peter Bourgon on CASPaxos: Replicated State Machines without logs
Peter Bourgon is a distributed systems engineer who has seen things. He's the author of Go kit, a toolkit for microservices; and OK Log, a distributed logging system. He's currently driving the engineering observability initiative within Fastly.
Kolton Andrus on "On Designing and Deploying Internet-Scale Services" https://www.usenix.org/legacy/event/lisa07/tech/full_papers/hamilton/hamilton.pdf
Kolton (https://twitter.com/KoltonAndrus) is co-founder and CEO of Gremlin. Previously he was a Chaos Engineer at Netflix improving streaming reliability and operatin…
Kavya Joshi on "Kraken: Leveraging Live Traffic Tests to Identify and Resolve Resource Utilization Bottlenecks in Large Scale Web Services" - https://research.fb.com/publications/kraken-leveraging-live-traf%EF%AC%81c-tests-to-identify-and-resolve-resource-utilization-bottlenecks-in-large-scale-web-services/
Kavya (https://twitter.com/kavya719) writes code for a living at a start-up in San Francisco. She particularly enjoys architecting and building highly concurrent, highly scalable systems. In her free time, she reads non-fiction and climbs rocks. Before moving to San Francisco to be an Adult, Kavya was at MIT where she got a Bachelor's and Master's in Computer Science.
Aaron Goldman on Hash Array Mapped Trie (http://lampwww.epfl.ch/papers/idealhashtrees.pdf)
Aaron David Goldman did his graduate work at the Georgia Institute of Technology where he studied Electrical and Computer Engineering. He has worked on radar systems for the US Department of Defense, Anti Abuse at Google, and is currently working in Runtime Application Self Protection at tCell. In his spare time he is building a new internet out of immutable data.
Dave Cheney (https://twitter.com/davecheney) on What Have We Learned from the PDP-11? (https://gordonbell.azurewebsites.net/CGB%20Files/What%20Have%20We%20Learned%20From%20the%20PDP-11%201977%20c…
Bryan Cantrill on "ARC: A Self-Tuning, Low Overhead Replacement Cache" by Nimrod Megiddo and Dharmendra Modha ( https://www.usenix.org/legacy/event/fast03/tech/full_papers/megiddo/megiddo.pdf )
Bryan Cantrill is the CTO at Joyent, where he oversees worldwide development of the SmartOS and SmartDataCenter platforms, and the Node.js platform. Prior to joining Joyent, Bryan served as a Distinguished Engineer at Sun Microsystems, where he spent over a decade working on system software, from the guts of the kernel to client-code on the browser. In particular, he co-designed and implemented DTrace, a facility for dynamic instrumentation of production systems that won the Wall Street Journal's top Technology Innovation Award in 2006 and the USENIX Software Tools User Group Award i…
J. Paul Reed on "Trade-offs Under Pressure: Heuristics and Observations of Teams Resolving Internet Service Outages" (http://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=8084520&fileOId=8084521 )
J. Paul Reed has over fifteen years of experience in the trenches as a build/release engineer, working with such storied companies as VMware, Mozilla, Postbox, Symantec, and Salesforce.
In 2012, he founded Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations "Simply Ship. Every time." He's worked across a number of industries, from financial services to cloud-based infrastructure to health care, with teams ranging from 2 to 2,…
André Arko on Robin Hood Hashing
André Arko leads the Bundler and RubyGems teams, co-authored The Ruby Way, and blogs at http://arko.net. He works at Cloud City as a software development consultant, and founded Ruby Together, a non-profit that pays for work on Ruby open source.
Tyler McMullen on Delta CRDTs
Tyler will do his best to summarize and get you hooked on the three papers listed below:
Tyler McMullen is CTO at Fastly, where he’s responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s Instant Purging system, API, and Real-time Analytics. Before Fastly, Tyler worked on text analysis and recomm…
***Okta has kindly volunteered to host us. They do, however, ask that you sign an NDA at the door. If this is an issue for you, please let us know in advance.
Aish Raj Dahal on "Cuckoo Filter: Practically Better Than Bloom" (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)
Aish is a software engineer with a passion for distributed systems. He currently works for PagerDuty, building reliable event-driven systems. In a past life, he was on Wall Street building software platforms for high-performance trade execution. He was also a maintainer for KDE's KGet project.
Peter Geoghegan on "Query Evaluation Techniques for Large Databases"
Peter tells us: Thi…
Lukasz Jagiello on Fast Inverse Square Root - http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
Lukasz Jagiello is an operations engineer at Wikia where he is hard at work on saying NO. Between NO and NO, he focuses on modern approaches to monitoring and distributed storage.
Ben Sigelman (https://twitter.com/el_bhs) is the cofounder and CEO of
Kiran Bhattaram on HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf)
Kiran Bhattaram loves making things, whether tinkering with circuits, writing software systems, or sewing dresses. She works on Stripe’s infrastructure team, and has previously built things for the New York Times, LinkedIn and MIT CSAIL.
Yifan Wu on "Reactive Vega: A Streaming Dataflow Architecture for Declarative Interactive Visualization" (
Matt Adereth on the January 1965 issue of The Computer Journal.
This issue contains one of the most important techniques in numerical optimization, the Nelder-Mead simplex method. An entire full-length talk could be dedicated to it, but instead we’re going to try and understand the historical context by looking at everything else in the journal, from the other papers to the letters to the editor to the advertisements.
Matt builds tools and infrastructure for quantitative research at Two Sigma. He previously worked at Microsoft on Visio, focusing on ways to connect data to shapes. In his spare time, he builds ergonomic keyboards using Clojure.
Tom Santero on DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker - https://arxiv.org/abs/…
**Please note that this month's location is not optimal for ADA. Please let us know your needs and we will do our best to accommodate you.
Daniel "Spoons" Spoonhower on "A Unified Theory of Garbage Collection"
Let's take a deep dive into language implementation: garbage collectors are a great tool for improving programmer productivity… until something goes wrong and you find yourself endlessly tuning collector configuration parameters. This paper takes a principled approach to understanding different garbage collection algorithms and offers something useful for both language implementors and programmers that use garbage-collected languages.
Spoons is a co-founde…
Kevin Burke on "Feral Concurrency" http://www.bailis.org/papers/feral-sigmod2015.pdf
Kevin Burke (https://kev.inburke.com) likes building great experiences. He helped scale Twilio and Shyp, and currently runs a software consultancy. Kevin once accidentally left Waiting for Godot at the intermission.
Caitie McCaffrey on "Distributed Programming in Argus" by Barbara Liskov
Yifan Wu on "Real Time Groupware as a Distributed System: Concurrency Control and its Effect on the Interface" by Saul Greenberg and David Marwood ( https://pdfs.semanticscholar.org/cf3c/135df03e455be1e8a64e5af6f19d2ff3ee2f.pdf )
This 90’s paper pioneers the idea that a front-end interface is a distributed system. As the UI becomes more real time and collaborative, we are starting to see a lot of anomalies in our day to day application experiences — this paper will shed light on a more structured way to reason about concurrency on the front-end.
Yifan is a graduate student at UC Berkeley researching topics at the intersection of databases and human computer interaction, currently inve…
Eitan Adler on Program development by stepwise refinement ( https://www.inf.ethz.ch/personal/wirth/Articles/StepwiseRefinement.pdf )
Eitan Adler is a software engineer with a passion for distributed systems, security, and open source software. Currently employed by Twitter, he spends his days improving developer tooling to make the lives of other software engineers easier.
Diego Ongaro on "On the criteria to be used in decomposing systems into modules"
Gareth Morgan on Physically-Based Shading at Disney (https://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf)
In recent years the adoption of physically based shading has been responsible for big advancements in the quality of 3D graphics (both real time and production graphics). This paper is an overview of how Disney Animation Studios pioneered physically based shading. It is a great overview of the field of PBS and BRDF theory generally.
Gareth Morgan has been involved in games and 3D graphics since 1999, starting at Silicon Graphics followed by several games companies including Activision and BAM Studios. For man…
Amanda Gilmore on How Complex Systems Fail (http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf) by Richard Cook
Amanda Gilmore is a site reliability engineer at Heroku, and has worked with complex distributed systems for most of her career. She currently specializes in database reliability and previously worked as a QA engineer, exposing her to a myriad of interesting ways that technical systems can fail.
Tony Arcieri on A Protocol for Interledger Payments:
Tom Faulhaber on "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications" (https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf) which is a foundational paper in the use of distributed hash tables.
Tom Faulhaber is principal of Infolace (www.infolace.com), a San Francisco-based consultancy that helps clients from startups to global brands turn raw data into information and information into action. Throughout his career, Tom has developed systems for high-performance TCP/IP, large-scale scientific visualization, energy trading, and many more.
In addition, Tom is a contributor to the Cl…
Adrian Cockcroft on Communicating Sequential Processes (http://spinroot.com/courses/summer/Papers/hoare_1978.pdf)
Paul Borrill on Lamport’s unfinished revolution.
This talk reviews Lamport’s seminal 1978 paper on Time, Clocks and the Ordering of Events, the 2nd most cited paper in all of computer science.
Almost all software engineers claim to have read it. Many who haven’t read it, use (and basically understand) the fundamental idea of logical clocks, and their progeny (vector clocks, matrix clocks, etc.). More than a few understand the current state of the art: dotted version vectors and bounded version vectors. Paradoxically, almost everyone misse…
Lukasz Jagiello on “pASSWORD tYPOS and How to Correct Them Securely” (https://www.cs.cornell.edu/~rahul/papers/pwtypos.pdf)
Lukasz tells us: “Typo-tolerant password authentication for arbitrary user-selected passwords” sounds like a really bad security joke. But almost 10% of failed login attempts fail due to a handful of simple, easily correctable typos, such as capitalization errors, and the authors prove it is possible to improve user experience with really low impact on security.
I really enjoy this paper because it’s not a standard security approach and in many places it’s a reasonable tradeoff between security and UX.
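To make the idea concrete, here is a minimal sketch of a typo-tolerant check in Python. It is an illustration only, not the scheme from the paper: the function names are hypothetical, and the two corrections tried (flipping the case of every letter, as with a stuck caps lock, and flipping the case of the first letter) are assumed examples of the "capitalization errors" mentioned above. Only a salted hash of the password is stored; each candidate correction is hashed and compared in turn.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # Salted PBKDF2-HMAC-SHA256 from the standard library.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def candidates(submitted: str) -> list[str]:
    # The submitted string first, then the assumed capitalization corrections.
    variants = [submitted, submitted.swapcase()]
    if submitted:
        variants.append(submitted[0].swapcase() + submitted[1:])
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(variants))


def check_password(submitted: str, salt: bytes, stored_hash: bytes) -> bool:
    # Accept the login if the exact input or any corrected variant matches.
    return any(
        hmac.compare_digest(hash_password(variant, salt), stored_hash)
        for variant in candidates(submitted)
    )


if __name__ == "__main__":
    salt = b"example-salt-16b"  # fixed salt for the demo; use os.urandom(16) in practice
    stored = hash_password("Password123", salt)
    print(check_password("pASSWORD123", salt, stored))  # caps lock left on
    print(check_password("Password124", salt, stored))  # a genuinely wrong password
```

Every extra correction gives an attacker more guesses per attempt, which is the security and UX tradeoff the paper analyzes, so the list of variants is deliberately tiny here.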
Lukasz Jagiello is an operations engineer at Wikia where he is hard at work on saying NO. Between NO…
Tyler McMullen on Similarity Estimation Techniques from Rounding Algorithms
Tyler's Bio: Tyler McMullen is CTO at Fastly, where he’s responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s Instant Purging system, API, and Real-time Analytics. Before Fastly, Tyler worked on text analysis and recommendations at Scribd. A self-described technology curmudgeon, he has experience in everything from web design to kernel development, and loathes all of it. Especially distributed systems.
Joel VanderWerf on The Krohn-Rhodes Theorem and Distributed Computing ( http://www.ams.org/journals/tran/1965-116-00/S0002-9947-1965-0188316-1/S0002-9947-1965-0188316-1.pdf )
The Krohn-Rhodes Theorem of 1962 surprised the math world by building arbitrary finite semigroups, and hence arbitrary finite state machines, out of flip-flops (registers with reset operations) and groups (permutations). The construction uses the wreath product, a coordinate system with unidirectional data flow among the factors, like addition with carry. How many group factors are needed? We don't know if this complexity question is decidable, after a half century of work. However, the effort …
Gilbert Bernstein on Marching Cubes (http://www.eecs.berkeley.edu/~jrs/meshpapers/LorensenCline.pdf )
Marching Cubes is one of the most important geometry algorithms for 3D volume visualization, 3D scanning/reconstruction, etc. It has the distinction of being the most cited graphics paper ever. And it's also definitely not the best algorithm you could implement for the problem it solves. Intriguing?
Gilbert Bernstein is a Ph.D. student in the department of Computer Science at Stanford University. His work focuses on a range of topics across Computer Graphics, HCI and Programming Languages, including Domain-Specific (Programming) Languages, Visual Tools for Artists…
Marios Assiotis on "Throttling Utilities in the IBM DB2 Universal Database Server" (http://www.nt.ntnu.no/users/skoge/prost/proceedings/acc04/Papers/0354_ThA01.3.pdf)
Marios is the CTO at TubiTV, the world's largest free streaming TV & movie library. His interests include simplifying complex systems, storage and low latency network i/o at scale. A transplant from Cyprus, he spends his free time trying to create the perfect all-American burger.
Matt Adereth from Two Sigma will present "A Scalable Bootstrap for Massive Data" (
Bryan Fink on "Fluctuations of Hi-Hat Timing and Dynamics in a Virtuoso Drum Track of a Popular Music Recording" (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127902)
Bryan hacks distributed systems by day, and does almost anything else by night. His interests in percussion and computers began nearly coincidentally over twenty years ago in a small town on the Great Plains. The combination has led to him having strange thoughts about time and coordination.
****** We are closing the year with a PWL marathon ******
Talk #1 - Nathan Taylor on "Corey: An Operating System for Many Cores" (https://www.usenix.org/legacy/event/osdi08/tech/full_papers/boyd-wickizer/boyd_wickizer.pdf) and "An Analysis of Linux Scalability to Many Cores" (https://pdos.csail.mit.edu/papers/linux:osdi10.pdf)
This is a story that spans two low-level systems papers. While on the surface it's all about how to make operating systems scale, it's also a story about how the same researchers can tackle a pr…
Tony Arcieri on Macaroons: Cookies with Contextual Caveats for Decentralized Authorization in the Cloud - http://theory.stanford.edu/~ataly/Papers/macaroons.pdf
Gareth Morgan on The Rendering Equation (
Jeff Carpenter presents Design Principles Behind Smalltalk by Daniel H. H. Ingalls. Jeff tells us: This is a paper I love because it frames programming language design as a means to "provide computer support for the creative spirit in everyone." The paper describes the design principles the Learning Research Group at PARC discovered as they evolved the design of the Smalltalk language.
Stephen Tu presents: "Random features for large-scale kernel machines" (http://www.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf) by Ali Rahimi and Ben Recht
Kernel methods in machine learning are a popular tool used to express richer function classes. These methods work via access to a kernel function, which can be thought of as a measure of similarity between two vectors. The standard method of optimization via kernels requires solving a program with a decision variable whose size is the number of data points. As dataset sizes grow, this becomes prohibitive.
This paper addresses this issue in a surprisingly practical way. Specifically, the authors show that by using a multiple of d random projections based on the Fouri…
David Broockman on "What uncovering a massive academic fraud taught me about how academia needs to change". A bit of background
Here's what David will be presenting: https://drive.google.com/file/d/0B_Qj0otlErJqVlJtMUhTU3ZiRzQ
David Broockman is an Assistant Professor of Political Economy at the Stanford Grad…
Kelsey Gilmore-Innis on "Information Escrows by Ian Ayres & Cait Unkovich" (http://repository.law.umich.edu/cgi/viewcontent.cgi?article=1091&context=mlr)
Kelsey is the Director of Technology at Sexual Health Innovations, where they are currently building Callisto (www.projectcallisto.org) based on Ayres & Unkovich's paper. She'll be talking about the joys and pitfalls of going from academic paper to production code, which should ring familiar whether your source is Microsoft Research or the Michigan Law Review.
Ben Sigelman will present Span…
Kyle Isom on "Out of the Tar Pit" by Ben Moseley and Peter Marks (http://shaffner.us/cs/papers/tarpit.pdf)
About the paper: Software inevitably grows complex as it grows in scope. Out of the Tar Pit puts forward some useful, actionable ideas for how we can manage complexity in our programs by looking at how we can reduce state in our systems.
Kyle is a systems engineer in the Bay Area.
Sargun will present the "Facebook Haystack" by Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel (http://static.usenix.org/legacy/events/osdi10/tech/full_papers/Beaver.pdf).
Sargun tells us: "It presents a simpl…
From Clark: "Through a CS practitioner's lens. It's pretty simple and powerful - I've used it repeatedly as a guiding principle when doing anything from UI to systems design."
Clark Breyman is a principal engineer with Yammer's infrastructure team where he applies a combination of code, OCD, and war stories to make life easier for our product engineers. When not fighting entropy, he enjoys San Francisco, Lindy Hop, yoga and natural language processing.
Devon O'Dell on "Nonblocking Algorithms and Scalable Multicore Pro…
From Nathan: "This is one in a long line of papers advocating for a loosely-coupled, service-oriented operating system architecture, an argument that extends back to the dawn of systems research. Alternative OSs like microkernels have long been considered more stable and easier to reason about by the systems community, but the performance overhead that comes with running them means typically our OSs still resemble the ones from the '60s. The authors have spun the "it will be too slow" argument on its head by extrapolating hardware trends and adopting terminology from the distributed systems community, and end up making a compelling case for their design not only being the _only_ path forward,…
Matt Adereth on "The Mode Tree: A Tool for Visualization of Nonparametric Density Features" (http://adereth.github.io/oneoff/Mode%20Trees.pdf). From Matt: "We often look at summaries of univariate data using basic descriptive statistics like mean and standard deviation and visualizations like histograms and box plots. The Mode Tree is a powerful alternative visualization that reveals important details about our distributions that none of the standard approaches can show.
I particularly like this paper because it was really the by-product of some interesting algorithmic work in Computational Statistics. A lot of the techniques in this area are pretty math heavy and inaccessible, so I appreciated that they dedicated a paper to making a visualizat…
We have two 5-7 minute talks before our main presentation where a short summary of a paper or an exciting idea is shared. Anyone can sign up for a mini; just email us!
• Mini #1: Veronica Ray on Experimenting At Scale With Google Chrome’s SSL Warning (http://www.adrienneporterfelt.com/chi-ssl-experiment.pdf). From Veronica: "Whether we are surfing the web or developing applications, browser security affects us all. This study from 2014 is the first to demonstrate why real users disregard browser security warnings. Its conclusions about the impact of warning design are surprising and inspire reflection about what we as developers can do to keep our users safe."
• Mini #2: Garet…
Introducing PWL Mini!!
Starting this month we'll be opening up two 5-7 minute talk slots before our main talk. The idea is to share with the group a short summary of a paper or an idea that you are super excited about. Anyone can volunteer a mini; just email us!
• Mini #1: Sargun Dhillon on VL2, a paper by Microsoft Research about computer networking. VL2 leverages several novel schemes in order to build full-bisection, highly-scalable, and decentralized datacenter networks in an economical fashion. These networks continue to support the layer 2 and flat addressing semantics. Many modern networks have been greatly influenced by this design.
Sargun Dhillon (@sargun) is highly interested in schemes for efficient, flexible computer networks to enable the next generatio…
Peter Bailis presents the "Managing Update Conflicts in Bayou, A Weakly Connected Replicated Storage System" paper by Doug Terry, Marvin Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser in SOSP 1995. http://db.cs.berkeley.edu/cs286/papers/bayou-sosp1995.pdf
Peter tells us: "A perennial challenge in managing shared, mutable state in distributed systems is the ability to permit concurrent writes while maintaining some degree of application "correctness." Bayou is an excellent and early example of an "optimistic" strategy for handling the concurrent update problem. Many of the techniques in Bayou---such as dependency tracking, application-specific merge procedures, and log shipping---are increasingly popular, can be found in systems like Dynamo, and …
Leif tells us: "My favorite problems are always those with the highest ratio of difficulty in solving to difficulty in stating. The lowest common ancestor problem exemplifies this. It was first stated in 1973, and can be described to anyone in two sentences, or with one sentence and a picture. But it took 11 years before an optimal solution was discovered, and another 16 before an understandable and implementable solution with the same bounds was presented, in this paper, The LCA Problem Revisited…
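To make the problem statement concrete, here is a minimal sketch of the naive way to find a lowest common ancestor by walking parent pointers. This is not the algorithm from the paper, which achieves much better query bounds after preprocessing; the `parent` dictionary representation and the function name are assumptions chosen for this example.

```python
def lowest_common_ancestor(parent, u, v):
    """Return the deepest node that is an ancestor of both u and v.

    `parent` maps each node to its parent, with the root mapped to None
    (a hypothetical representation picked for this sketch). A node counts
    as its own ancestor. Each query costs time proportional to the depth
    of the tree.
    """
    # Collect u and every ancestor of u, up to and including the root.
    ancestors = set()
    node = u
    while node is not None:
        ancestors.add(node)
        node = parent.get(node)

    # Climb from v until we reach a node that is also on u's path to the root.
    node = v
    while node not in ancestors:
        node = parent[node]
    return node


# A small example tree:
#         a
#        / \
#       b   c
#      / \   \
#     d   e   f
example_parent = {"a": None, "b": "a", "c": "a", "d": "b", "e": "b", "f": "c"}
```

For the example tree, `lowest_common_ancestor(example_parent, "d", "e")` climbs from `e` to `b`, which is already on the path from `d` to the root.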
Kyle Kingsbury presents the On the attraction between two perfectly conducting plates paper.
In Kyle's words: "It's a very short physics paper--a sidenote, really, to work he considered much more important. Sixty years later, though, we consider this sidenote his defining work--it gave rise to one of the weirdest physics phenomena ever, and has only gotten *more* confusing since the original proof. There's a really interesting history behind it that reveals some of the sociology of academia. Plus it's just fucking fascinating physics.
It also doesn't require any math beyond, say, high school calculus, but illustrates what a rigorous formal argument looks like--and I'm well-prepared to teach the math and concepts to an audience without any mathematical expertise. I think it…
Armon Dadgar from HashiCorp presents the SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol paper by Abhinandan Das, Indranil Gupta, and Ashish Motivala.
If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/158
Armon has a passion for distributed systems and their application to real world problems. He is currently the CTO of HashiCorp, where he brings distributed systems into the world of DevOps tooling. He has worked on Terraform, Consul, and Serf at HashiCorp, and maintains the Statsite and Bloomd OSS projects as well.…
Peter Alvaro from UC Berkeley will present the paper "Using Reasoning about Knowledge to Analyze Distributed Systems" by Joseph Halpern.
If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/147
Peter has kindly provided some references to help you get started
• Prior Halpern work on knowledge in DS:
Henry Robinson from Cloudera will present the paper "Impossibility of Distributed Consensus with One Faulty Process" by Fischer, Lynch, and Paterson. This paper won the Dijkstra award, given to the most influential papers in distributed computing, so make sure you don't miss this!
Note that Henry will be focusing on the JACM version of the paper, not the PODS version. The JACM version is linked in the paper title above and you can also find it here.
If anyone really wants extra reading, you might consider the following:
Joel VanderWerf stops by to talk about Calvin. This time we have 3 relevant papers for this meetup!
• Calvin: Fast Distributed Transactions for Partitioned Database Systems, SIGMOD 2012 by Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi.
• Consistency Tradeoffs in Modern Distributed Database System Design, 2012 by Daniel J. Abadi.
• Modularity and Scalability in Calvin, IEEE 2013 by Alexander Thomson and Daniel J. Abadi.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Joel presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read …
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza. We will do our best to stream the event live (we will announce the link on this thread the day of the event).
After Bruce presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/…
Andy Gross from Twitter will present The Akamai Network: A Platform for High-Performance Internet Applications paper by Erik Nygren, Ramesh K. Sitaraman, and Jennifer Sun.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Andy presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/papers-we-love/issues/68
Andy is a Staff Software Engineer at Twi…
Ryan Kennedy from Yammer Engineering will be kicking off our group by presenting the Dapper, a Large-Scale Distributed Systems Tracing Infrastructure paper by Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan and Chandan Shanbhag.
Doors open at 6:30 pm; the presentation will begin at 7:00 pm; and, yes, there will be beer and pizza.
After Ryan presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries). If you have any questions, thoughts, or related information, please visit our *github-thread* on the matter: https://github.com/papers-we-love/…