New York Chapter
What was the last paper within the realm of computing you read and loved? What did it inspire you to build or tinker with? Come share the ideas in an awesome academic/research paper with fellow engineers, programmers, and paper-readers. Lead a session and show off code that you wrote that implements these ideas or just give us the lowdown about the paper (because of HARD MATH!). Otherwise, just come, listen, and discuss.
As the original Papers We Love chapter, we've been helping New Yorkers scratch their paper reading itch for most of 2014. We welcome everyone from the programming community for an evening of ideas, vibrant discussions and hanging out with your fellow travellers.
The New York Chapter meets monthly at different locations throughout the city. Keep an eye on our Meetup.com page to find out the latest address.
Our meets fill up fast, so please make sure to RSVP only if you plan on attending. You can find our schedule and RSVP here on Meetup.com.
Papers We Love has a Code of Conduct. Please contact one of the Meetup's organizers if anyone is not following it. Be good to each other and to the PWL community!
Sign-up: Please RSVP for meetings via Meetup.com
Contact: contact AT paperswelove DOT org
We are excited to host John Valois speaking on Wait-Free Synchronization
How do we implement data structures in a shared memory environment? The conventional answer is to use mutual exclusion, but this approach does not behave well when we encounter delays or failures in the critical section, forcing other processes to wait.
Wait-Free Synchronization by Maurice Herlihy (https://cs.brown.edu/~mph/Herlihy91/p124-herlihy.pdf) explores an idea which ensures that operations complete in finite time regardless of the relative speeds of other processes. We’ll see a connection to the ubiquitous consensus problem and a framework for understanding what synchronization primitives are necessary and sufficient for implementing a given object, culminating in a method for implementing any object in a wait-free manner.
John Valois is a Managing Director at BlackRock wher…
We are excited to host Sun-Li Beatteay speaking on Guaranteeing Consensus in Distributed Systems with CRDTs
Consensus in distributed systems has been a debated topic ever since programmers discovered they could run the same program on multiple machines. Researchers have been studying consensus for decades, resulting in numerous algorithms and white papers. Unfortunately, many of these algorithms are flawed and unreliable.
However, in 2011, a team of researchers published a paper on a novel approach to distributed consensus using Conflict-free Replicated Data Types (https://hal.inria.fr/inria-00609399v1/document). This paper created quite a buzz as it showed that CRDTs were mathematically proven to guarantee consensus through "Strong Eventual Consistency." They also claimed to have solved the CAP conundrum.
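To make "Strong Eventual Consistency" concrete, here is a minimal sketch of a grow-only counter (G-Counter), a commonly cited example of a state-based CRDT. The class and method names are illustrative assumptions, not code taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""
    replica_id: str                      # hypothetical replica name
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment(3)

a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

After the merges, both replicas hold the same state without any coordination round, which is the property the talk description refers to.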
This presentation dives into this seminal paper in order to answ…
We are excited to host Sarah Groff Hennigh-Palermo speaking on Exception Handling: Issues and a Proposed Notation from John B. Goodenough (https://web.eecs.umich.edu/~weimerw/[masked]/reading/goodenough-exceptions.pdf)
Errors and debugging are the bane of a programmer’s life — and the source of many jokes, Twitter rants, and midnight breakdowns. As programming matures as a practice, we continue to add different ways to avoid and address errors, but how did we get here to begin with?
Exception Handling: Issues and a Proposed Notation from John B. Goodenough (1975) details the needs and goals of an exception handling system and then gets specific with suggestions of syntax, including remedies to known issues in the system.
In this talk, we will take a look at the development of one approach to errors, throwing and handling exceptions, as i…
We're happy to host John Feminella (http://jxf.me/), technologist and advisor, presenting on Impossibility of Distributed Consensus with One Faulty Process (https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf) by Michael J. Fischer, Nancy A. Lynch and Michael S. Paterson.
If you think it's hard to get humans to agree on something, wait until you see how computers work! Computer scientists call this problem consensus, and when the computers involved are in an asynchronous environment, it's distributed consensus.
For about a decade prior to this paper, computer scientists had been debating whether distributed consensus was solvable in real environments. At the time, it was known that synchronous consensus, a weaker version of distributed consensus where everyone acts at the same time, was possible — and even better, it w…
**Please note the later start time! We will open doors at 7pm and begin at 7:30pm**
We are excited to host Elijah Ben Izzy speaking on Divide and Conquer Algorithms for Closest Point problems in Multidimensional Space (http://www.cs.unc.edu/techreports/76-103.pdf)
Given n points in k-dimensional space, how can you efficiently find the pair that is closest together? It turns out that there’s an elegant divide-and-conquer approach that utilizes a nifty trick. Jon Louis Bentley, a pioneer in the space of geometric algorithms, proposes this solution (and answers many more problems) in his original PhD thesis, written in 1976. The talk will focus on his solution to the closest-pair problem, then discuss some general approaches to algorithm construction that he outlined when defending his thesis... all written with a typewriter.
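As a rough illustration of the divide-and-conquer idea, here is a sketch of the familiar textbook version for points in the plane. It is an assumption-laden simplification: it handles only two dimensions, re-sorts the strip at every level (so it runs in O(n log² n) rather than the optimal bound), and is not Bentley's multidimensional construction. The function names are hypothetical.

```python
from itertools import combinations
from math import dist, inf


def closest_pair(points):
    """Return the smallest distance between any two 2D points."""
    return _solve(sorted(points))  # sort by x once up front


def _solve(px):
    n = len(px)
    if n <= 3:
        # Small inputs: compare every pair directly.
        return min((dist(p, q) for p, q in combinations(px, 2)), default=inf)

    mid = n // 2
    mid_x = px[mid][0]
    # Conquer: best distance found entirely within the left or right half.
    d = min(_solve(px[:mid]), _solve(px[mid:]))

    # Combine: only points within d of the dividing line can beat d.
    strip = sorted((p for p in px if abs(p[0] - mid_x) < d), key=lambda p: p[1])
    for i in range(len(strip)):
        j = i + 1
        # Points further than d apart in y cannot be closer than d.
        while j < len(strip) and strip[j][1] - strip[i][1] < d:
            d = min(d, dist(strip[i], strip[j]))
            j += 1
    return d


nearest = closest_pair([(0, 0), (5, 5), (1, 1), (9, 9), (5, 6)])
```

The trick is in the combine step: because every point in the strip is already at least d away from its neighbours on the same side, each point needs to be checked against only a bounded number of strip neighbours.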
Elijah is a quantitative software engineer at T…
**Please Note: We cannot accommodate +1s on your RSVP. Everyone must register and RSVP on their own.**
We are excited to host John Allspaw, Principal Researcher at Adaptive Capacity Labs, presenting Problem Detection (https://www.researchgate.net/publication/220579480_Problem_detection) by Gary Klein, Rebecca Pliske, Beth Crandall, and David Woods.
Published in 2005 in the journal Cognition, Technology and Work, "Problem Detection" explores the "process by which people first become concerned that events may be taking an unexpected and undesirable direction that potentially requires action." While this paper primarily centers on empirically rebutting previous theories of how problems are detected, it also puts forth many important observations and concepts for software engineering to pay close attention to. This talk won't just be a re-statement of the paper…
We are excited to host Dan Bentley, CEO of Windmill, presenting The Connection Machine: Computer Architecture for the New Wave (https://dspace.mit.edu/bitstream/handle/1721.1/14719/18524280-MIT.pdf) by Danny Hillis.
The Connection Machine is a computer that a time traveler borrowed from 2015 and accidentally returned to the wrong decade. How else to explain a 1985 computer with 65,536 processors? That's motivated by doing computer vision? We'll cover The Connection Machine (Danny Hillis's Ph.D. thesis) and the related "Data Parallel Algorithms" in discussing this provocative technological vision. The big question: how did the Connection Machine get so much right but end up a footnote?
Dan Bentley's a software engineer building Live Development as CEO of Windmill. He's opened for The Who, and has a check from Donald Knuth.
We're happy to host John Feminella (http://jxf.me/), technologist and advisor, presenting on Bitcoin: A Peer-to-Peer Electronic Cash System (https://bitcoin.org/bitcoin.pdf) by Satoshi Nakamoto on its 10-year anniversary.
The original Bitcoin paper was published by a pseudonymous individual named Satoshi Nakamoto on Halloween 2008, in the quiet recesses of a small cryptography mailing list, where it was mostly ignored. A couple of months afterwards, Satoshi published the original Bitcoin client software that implemented the ideas in the paper.
Ten years later, a lot has happened in cryptocurrency, and a lot of money has changed hands. In this talk, we explore the core ideas laid out in the paper, the historical background around digital currencies, and how these ideas and history were implemented in the original Bitcoin client.
This June we're working with QCon New York (https://qconnewyork.com/) to bring you a series of two 25-minute talks on two different papers! We're very lucky to be able to host Carmen Andoh, Golestan "Sally" Radwan, and PWL alumnus Matt Adereth.
Please register at https://www.eventbrite.com/e/community-night-papers-we-love-qcon-registration-46136097309 to attend this event, instead of doing so through the Meetup page.
Note: There will be no free food or drinks at this event, so please grab something beforehand or afterward. We will also be heading to O'Lunney's (https://www.yelp.com/biz/o-lunneys-new-york), located at 145 W 45 St, afterward for food and refreshments.
* Golestan "Sally" Radwan on What D…
We're thrilled to host Ben Linsay, engineer extraordinaire, presenting on HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf) by Flajolet, et al.
In addition to Ben's talk, Sandy Vanderbleek will be opening the event with a lightning talk on Peter Norvig's Correcting A Widespread Error in Unification Algorithms (https://norvig.com/unify-bug.pdf).
• Ben Linsay on HyperLogLog
This extended abstract describes and analyses a near-optimal probabilistic algorithm, HyperLogLog, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, "short bytes"), HyperLogLog performs a single pass over the data and produces an estimate of …
We're super excited to host Libby Kent, software engineer, distributed ledger programmer, and demonstrator of Monads Made Semi-Understandable (https://www.youtube.com/watch?v=J3Djb1VyzkM)! She'll be presenting on the Ethereum White Paper (http://blockchainlab.com/pdf/Ethereum_white_paper-a_next_generation_smart_contract_and_decentralized_application_platform-vitalik-buterin.pdf): A Next-Generation Smart Contract and Decentralized Application Platform (also available here (https://github.com/ethereum/wiki/wiki/White-Paper) with additional details), authored by Vitalik Buterin.
In 2009 bitcoin emerged as a peer-to-peer value transfer system,…
We're extremely glad to host Bonnie Eisenman, senior software engineer at Twitter and member of NYC Resistor, presenting on Multiphase Numerical Modeling of Dendritic Solidification for Jigsaw Puzzle Generation by J. Louis-Rosenberg, A. Resnick, and J. Rosenkrantz.
This paper explores expansions on the stylistic variety of handmade and artisanal jigsaw puzzles through the application of techniques from natural simulation. Typical jigsaw puzzle designs reflect the manufacturing constraints of die-cut, mass-production methods, but novel families of puzzle forms can be generated by applying a phase field approach to the simulation of dendritic solidification. Existing models of solidification with multiphase methods are extended to satisfy aesthetic and geometric considerations specific to jigsaw …
We're happy to host Suz Hinton, open-source hardware developer and cloud developer advocate at Microsoft, presenting on Accessible images (AIMS): a model to build self-describing images for assisting screen reader users by Ab Shaqoor Nengroo and K. S. Kuppusamy.
In my latest experimentations in 'Guerilla Accessibility,' I stumbled upon this paper a mere two weeks after publication and was excited about the explorations in it. The ideas built upon an unlikely hero of the anti-surveillance field delighted me. There are wide implementation gaps that remain unaddressed in this paper, and I took liberty on these when implementing the algorithms. I'll be sharing both the triumphs and pitfalls of this proposed technique of improving the accessibility of online web content.
We're glad to host Hannes Frederic Sowa, a programmer at backtrace.io who has worked on the Linux kernel networking stack (focusing on IPv6). Hannes will be presenting on BBR: Congestion-Based Congestion Control by N. Cardwell, et al.
TCP congestion control has a large impact on perceived network performance (especially in terms of bandwidth and latency) and thus the Internet. Two major categories of congestion control algorithms have been explored: those using packet loss and those using packet delay as feedback. Due to historic developments (and the development of packet switching hardware), packet-loss congestion control algorithms are commonly used today. We will discuss a congestion control scheme published by Google in 2017.
Hannes Frederic Sowa recently joined backtrace.io, and is onsite in Ne…
In addition to Camilo's talk, PWLNYC Organizer David Ashby will be opening the event with a lightning talk on the Secure Hash Standard, specifically SHA256, called SHAmwow: Poorly Re-implementing SHA256 for Fun and Profit.
• Camilo Aguilar on the rsync algorithm:
Modern computers are very powerful. These days, mobile phones are packed with multi core CPUs and even GPUs. Despite these advances in hardware, internet connections in most parts of the world are still surprisingly slow and unreliable. This creates a …
We're super excited to host Jessie Frazelle, software engineer at Microsoft, contributor to RunC and Golang, former maintainer of Docker, and the Keyser Söze of container security. She'll be presenting on SCONE: Secure Linux Containers with Intel SGX by Arnautov, et al.
In addition to Jessie's talk, PWLNYC Organizer David Ashby will be opening the event with a lightning talk on the Secure Hash Standard, specifically SHA256, called SHAmwow: Poorly Re-implementing SHA256 for Fun and Profit.
• Jessie Frazelle on SCONE:
Containers are the latest infrastructure trend. In 2016, the SCONE (
We're happy to be hosting Gershom Bazerman, software developer at S&P/Capital IQ, author of various Haskell libraries, and organizer/founder of the NY Haskell Users Group, who'll be presenting on Homological Computations for Term Rewriting Systems by Philippe Malbos and Samuel Mimram.
• Gershom Bazerman on Homological Computations for Term Rewriting Systems:
In 1987, C. Squier wrote "Word problems…
We're happy to be hosting Wes Chow, CTO of Chartbeat, who'll be presenting on Off-the-Record Communication, or, Why Not To Use PGP by Borisov, Goldberg, and Brewer.
Your intrepid reporter goes to a private location and meets with a key source who wishes to remain anonymous and off the record. The reporter understands that all information she learns from the source must be validated elsewhere and not directly quoted (private), that the source is who he says he is (authenticated), and that should their conversation become public they could both plausibly deny having said any of the recorded words (repudiable). How do we construct a digital version of an IRL meeting?
Nikita Borisov, Ian Goldberg, and Eric Brewer devise a communication protocol in Off-the-Record Communication, or Why Not To Use PGP that provides all of the above mentioned properties, as we…
This is going to be another special one! We're working with the QCon New York team to put together a series of 4 PWL "mini" (15~20-minute) presentations by 4 wonderful speakers: John Langford, Matt Adereth (@adereth), Charity Majors (@mipsytipsy), and Gwen Shapira (@gwenshap). The talks will be announced very soon, and many of the speakers are also speaking at QCon!
John Langford on Making Contextual Decisions with Low Technical Debt:
Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual informa…
We're thrilled to be hosting Daniel Doubrovkine, CTO of Artsy, who'll be presenting on Simon Parson's Auctions and bidding: A guide for computer scientists.
In addition to Daniel's talk, Sophia Gold will be opening the event with a survey-oriented lightning talk on An Intellectual History of Automatic Differentiation.
• Daniel Doubrovkine on Auctions and bidding: A guide for computer scientists:
There is a veritable menagerie of auctions — single dimensional, multi-dimensional, single sided, double sided, first price, second price, English, Dutch, Japanese, sealed bid — and these have been extensively discussed and anal…
We're delighted to host William E. Byrd, Research Assistant Professor at the University of Utah's School of Computing. William is co-author of The Reasoned Schemer with Daniel P. Friedman and Oleg Kiselyov, co-designer of miniKanren and Barliman, a prototype interactive editor for exploring program synthesis. He'll be discussing what he considers to be the most beautiful program ever written and much of the research and work behind it.
We're terrifically excited to host Kiran Bhattaram (https://kiranbot.com), software engineer at Stripe (https://stripe.com) and writer of amazing posts and paper reports (https://kiranbot.com/tags/computers) about computing and sewing (https://kiranbot.com/tags/sewing/)! She'll be presenting on a survey of work related to failure detectors for distributed systems.
The problem of consensus is central to many distributed systems algorithms. Failure detectors are central to the way we think about consensus algorithms. In a fully asynchronous system, the FLP impossibility result (https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf) shows tha…
We're back in 2017 with Erich Ess, lead engineer on Jet.com's Core Platform team, presenting on foundational work to visualize fluid motion: Imaging Vector Fields Using Line Integral Convolution by Brian Cabral and Leith Leedom.
Line Integral Convolution is one of the most intuitive data visualization techniques around. It's dropping paint in a river to see how the current is flowing: to visualize a vector field simply take an image and have the vector field smear the colors. The result is a powerful alternative to using arrows or stream lines. And while the intuition is very straightforward, the actual mathematics that power the technique are very complex.
Erich Ess (
We're back with Wil Yegelwel, an engineer on Two Sigma's Compute team, presenting on 3 related, foundational graphics papers on the topic(s) of illumination and rendering: Continuous shading of curved surfaces, Illumination for computer generated pictures, and The Rendering Equation.
Illumination in computer graphics deals with calculating the color of each pixel on the screen when trying to render a photorealistic scene. The problem is that generating increasingly realistic renderings requires a lot of processing and so a tradeoff must be made between compute time and image quality. We will first look at two foun…
It's with great pleasure that we announce we'll be hosting Elizabeth Ramirez, a senior software engineer at The New York Times and M.S. candidate in Applied Mathematics at Columbia University. She'll be presenting on a classic: A New Approach to Linear Filtering and Prediction Problems by Rudolf Kálmán. Kálmán, who passed away in July, was a luminary who shaped the field of modern control theory.
It's been a long time coming, but we have the fantastic David Nolen, lead developer of ClojureScript (among other things), presenting at PWLNYC. He'll be presenting Parsing with Derivatives by Matthew Might, David Darais, & Daniel Spiewak. It's a heralded paper that has a super interesting follow-up (On the Complexity and Performance of Parsing with Derivatives), has influenced clojure.spec, and this helpful video!
We present a functional ap…
PWLNYC is extremely excited to be hosting Deniz Altınbüken, a Ph.D. candidate in Distributed Systems at Cornell University. We were first introduced to her work upon seeing her amazing Ricon 2015 talk and reading through Paxos: Made Moderately Complex. She's travelling from Cornell to join us in talking about the topic of Chain Replication (CR) and their new and improved protocol for it, which has a formally defined end-to-end specification.
This is going to be a special one! We're working with the QCon New York team to put together a series of 4 PWL "mini" (15~20-minute) presentations by 4 wonderful speakers, all of whom are also speaking at QCon! So, we're extremely happy to welcome Evelina Gabasova (@evelgab), Eric Brewer (@eric_brewer), Ines Sombra (@randommood), and Caitie McCaffrey (@caitie) to PWLNYC!
Also, QCon New York is offering $100 off their conference ticket for our group. Just use the code "paperswelove" when registering!
- Evelina Gabasova presenting on
The paper I'm going to discuss is the result of what happens when people with backgrounds in mathematics, psychology, and artificial intelligence (Feltovich and Bradshaw) get together to ask questions about how teams operate alongside the originators of modern decision-making and cognitive systems engineering research (Klein and Woods).
The concepts outlined in the paper have provided frames and directions in designing tools and environments where successful work requires multiple actors (whether they are people or software agents!) to succeed. This seminal paper takes a deep dive into not just people and teamwork, but what comprises the…
We're **crazy** excited to have Bryan Cantrill, CTO of Joyent, formerly of Sun Microsystems, presenting on Jails: Confining the omnipotent root. by Poul-Henning Kamp and Robert Watson and Solaris Zones: Operating System Support for Consolidating Commercial Workloads by Dan Price and Andy Tucker!
You can also catch Bryan presenting at the NYC Container Summit on February 10th, which is also hosting an advanced technical track that includes hands-on tutorials! Also, watch this amazing illumos presentation by Bryan in 2011!
Of all the ways to manipulate a 3D mesh, the “push/pull” technique popularized by SketchUp is one of the most approachable and fun. PushPull++ is a recent paper that elaborates on the technique, cleaning up a lot of edge cases and unlocking new features, using wonderfully straightforward math. The potential for 3D modeling tools or procedural mesh generation APIs built on these simple ideas is very exciting.
The paper presents the technique and the tool that the authors built. I will focus on the technique, as that’s the part I…
In 1984 Leslie Lamport began to observe that the glitch problem occurs in everyday life. Realizing this phenomenon had not been discussed by psychologists of the day, he set out to describe his observations using the classical formalization of Buridan's ass. Lamport initially failed to have this paper published in various scientific journals, being rejected on grounds of superficiality. It wasn't until 2011, when a reader suggested he resubmit to Foundations of Physics, that the paper was eventually published.
Those familiar with the works of Lamport know him as a logician; his life's work has pushed forward the state of…
It's with great pleasure that we announce we'll have Tomas Petricek, PhD student at the University of Cambridge, functional programmer, and F# enthusiast, presenting on the 1975 book Against Method: Outline of an Anarchist Theory of Knowledge by Paul Feyerabend.
This will be a little different! There's no open-access PDF version of the book around, but you can find it at bookstores, on Amazon, or on the Internet... or so I've heard.
How is computer science research done? What do we take for granted and what do we question? And how do theories in computer science tell us something about the real world? Those are some of the questions that may inspire computer scientists like me (and you!) to look into philosophy of science. I’ll prese…
We're elated to have Ryan Zezeski, kernel hacker and Baltimorean, presenting on The Slab Allocator: An Object-Caching Kernel Memory Allocator by Jeff Bonwick.
In 1994 Jeff Bonwick presented his Slab Allocator at the USENIX Summer Technical Conference. Over two decades later Google reports 35-thousand results for "slab allocator". CiteSeerX reports 93 citations. And many modern kernel allocators are based on his design, such as illumos, Linux, and FreeBSD. Jeff's design, along with the original paper, remains just as relevant today as it was 21 years ago. Join me as I tell the tale of the Slab Allocator: where it came from, what it is, why it's important, and where it's going.
It's going to be a special evening. After taking the month of July off, we're coming back with the original Papers We Love speaker, Michael R. Bernstein, co-host of Beats, Rye, & Types and Code Climate hustler, and he’s very excited to be back, presenting on Propositions as Types by Philip Wadler and showcasing some work from The Little Prover by Friedman and Eastlund!
I’ll (Michael) be talking about Philip Wadler’s paper "Propositions as Types," which starts ou…
We're happy to have Jason Ganetsky, tech lead of storage for Google Cloud Pub/Sub, presenting on Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-order Languages by Simon Marlow and Simon Peyton Jones.
Higher-order languages that encourage currying are typically implemented using one of two basic evaluation models: push/enter or eval/apply. Implementors use their intuition and qualitative judgements to choose one model or the other. Our goal in this paper is to provide, for the first time, a more substantial basis for this choice, based on our qualitative and quantitative experience of implementing both models in a state-of-the-art compiler for Haskell.
Our conclusion is simple, and contradicts our initial intuition: compiled implementations should use eval/apply.
We're thrilled to have Samy Al Bahra, co-founder of Backtrace and founder of Concurrency Kit, presenting on Making Lockless Synchronization Fast: Performance Implications of Memory Reclamation by Hart, McKenney, and Brown.
Multicore systems are ubiquitous but modern concurrent programming techniques still do not see wide-spread adoption. Most concurrent software (developed in low-level languages) still relies on error-prone and unscalable memory management techniques for correctness despite the introduction of superior methods over 30 years ago. Safe memory reclamation allows for performant and robust memory management that is also suitable for advanced concurrent programming techniques such as non-blocking synchronization. If properly used, safe memory reclamation techniques allow improved performance and simpli…
We're so excited to have Neha Narula, a PhD student in PDOS, the Parallel and Distributed Operating Systems group at MIT, and an amazing speaker and researcher, presenting on The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors by Clements, Kaashoek, Zeldovich, Morris, and Kohler.
Moore's law is over, or at least, we won't be making programs go faster by running on faster processors, but instead by parallelizing our code to use more of them. Reasoning about concurrent code is difficult; but it's also very hard to understand whether your design has latent scalability bottlenecks until you can actually run it on many cores. And what if the problem is in your interface, instead of just the implementa…
Strachey's lectures on "Fundamental Concepts in Programming Languages" provided an extremely broad survey of core issues in programming language design that provided much of the terminology we use today, including definitions of the kinds of polymorphism and the kinds of expressions we see in programming languages. Published as a paper many years later, Strachey's lectures provide an especially readable overview of programming languages concepts.
John Myles White (
We're riveted to have Andrew Turley, lead software engineer on the platform team at TheLadders, presenting Incremental Mature Garbage Collection Using the Train Algorithm by Jacob Seligmann & Steffen Grarup.
Automatic garbage collection has spared programmers from an entire class of programming errors related to memory leaks and attempting to access objects that were incorrectly freed. As programs have grown in size and complexity, so have the systems that manage garbage collection. Each algorithm makes a different set of tradeoffs between factors such as the space used by objects, the space used by bookkeeping, the number of unused objects (garbage) that remain uncollected, the time spent in allocation, and the time spent in
We're extremely thrilled to host Sam Tobin-Hochstadt, Assistant Professor in the School of Informatics and Computing at Indiana University, presenting Composable and Compilable Macros by Matthew Flatt of the University of Utah.
"Composable and Compilable Macros" introduces the Racket module system, which addresses the following problem: when you have macros that run programs at compile-time, how does this interact with separate compilation and ahead-of-time compilation? The paper introduces "phases", which enable Racket to behave the same regardless of when and how you compile your program. It also introduces the idea of writing different modules in different languages, which is now used for systems like Typed Racket.
A few related papers:
We're thrilled to have Jeff Larson, Data Editor at ProPublica, presenting On the resemblance and containment of documents by Andrei Z. Broder.
Increasingly, journalists are dealing with ever larger document dumps, and in order to find interesting stories in these troves, they have to cluster the documents to separate the wheat from the chaff. The size of these dumps often means that traditional algorithms either are too complex and take too long, or they rely on a priori constants like the number of clusters to search for.
Jeff Larson will present a novel algorithm called minhashing that was invented at AltaVista in order to loosely cluster similar documents. The paper "On the resemblance and containment of documents" relies on hash collisions to create document fingerprints and shows that documents can be clustered in linear time witho…
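For a feel of how fingerprints built from hash collisions can stand in for whole documents, here is a minimal minhash sketch. It assumes each document has already been turned into a non-empty set of string tokens (for example word shingles), and it uses salted BLAKE2b hashes as a stand-in for the random permutations described in the paper. The function names and the choice of 256 hashes are illustrative assumptions.

```python
import hashlib


def minhash_signature(tokens, num_hashes=256):
    """Return one minimum hash value per salted hash function.

    Assumes `tokens` is a non-empty collection of strings.
    """
    unique = {t.encode("utf-8") for t in tokens}
    signature = []
    for seed in range(num_hashes):
        # A different salt behaves like a different hash function.
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            hashlib.blake2b(t, digest_size=8, salt=salt).digest() for t in unique
        ))
    return signature


def estimate_resemblance(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Two hypothetical documents sharing 50 of 150 distinct shingles,
# so their true resemblance (Jaccard similarity) is 1/3.
doc_a = {f"shingle-{i}" for i in range(100)}
doc_b = {f"shingle-{i}" for i in range(50, 150)}
estimate = estimate_resemblance(minhash_signature(doc_a), minhash_signature(doc_b))
```

Two signatures agree at a given position with probability equal to the sets' resemblance, so comparing short fixed-size signatures approximates comparing the full documents. More hash functions tighten the estimate at the cost of larger fingerprints.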
Our lives now run on software. Bugs are becoming not just annoyances for software developers, but sources of potentially catastrophic failures. A careless programmer mistake could leak our social security numbers or crash our cars. While testing provides some assurance, it is difficult to test all possibilities in complex systems--and practically impossible in concurrent systems. For the critical systems in our lives, we should demand mathematical guarantees that the software behaves the way the programmer expected.
A single paper influenced much of the work towards providing these mathematical guarantees. C.A.R. Hoare’s seminal 1969 paper “An Axiomatic Basis for Computer Programming” introduces a method of reasoning about pr…
We're excited to have Camille Fournier, CTO at Rent the Runway, presenting on The Chubby lock service for loosely-coupled distributed systems by Mike Burrows.
Distributed consensus is often discussed in terms of algorithms: Paxos, ZAB, RAFT, etc. But while the algorithms may be more or less mind-bending, for me the more interesting aspect of distributed consensus is creating systems that support it for the general use case. This paper, on Google's Chubby lock service, is the story of what happens when a system stops being a polite theory, and starts getting real-world use.
To anyone who has worked in depth as a distributed systems engineer, Chubby is a beautiful paper. It is not a paper about algorithms and their limits, or a toy fringe system created by grad students to test a hypothesis. It i…
We're excited to have Peter Burka, member of the software research team at Two Sigma, presenting on Crossing the Gap from Imperative to Functional Programming through Refactoring by Alex Gyori, Lyle Franklin, Danny Dig, and Jan Lahoda.
The introduction of lambdas to Java 8 might be the most significant change to the Java language since Java 2 was released in 1998. Lambdas and the accompanying functional operations like map and filter promise to allow Java programmers to write clearer, simpler code, and to take better advantage of parallelism.
While developers of new code will be able to start using the features immediately, what should we do with the billions of lines of code that have already been written? This paper proposes that we can automatically translate the existing body of Java code to make use of the new features. This imp…
My favorite problems are always those with the highest ratio of difficulty in solving to difficulty in stating. The lowest common ancestor problem exemplifies this. It was first stated in 1973, and can be described to anyone in two sentences, or with one sentence and a picture. But it took 11 years before an optimal solution was discovered, and another 16 before an understandable and implementable solution with the same bounds was presented, in this paper, The LCA Problem Revisited. This problem is furthermore satisfying because its bounds are so tight: pre-processing takes as long as just reading…
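For readers who want something to tinker with beforehand, here is a minimal sketch of one well-known approach: walk the tree in an Euler tour, then answer each query as a range-minimum query over depths using a sparse table. This is the simpler variant with O(n log n) preprocessing and O(1) queries, not the linear-preprocessing bound mentioned above. The class name, the parent-array input format, and the sample tree are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: LCA via Euler tour + sparse-table range minimum.
class LcaSparseTable {
    private final List<List<Integer>> children = new ArrayList<>();
    private final int[] euler;   // nodes in Euler-tour order
    private final int[] depthAt; // depth of euler[i]
    private final int[] first;   // first tour index of each node
    private final int[][] table; // table[k][i] = index of min depth in [i, i + 2^k)
    private int pos = 0;

    // parent[v] is the parent of node v; the root has parent -1.
    public LcaSparseTable(int[] parent) {
        int n = parent.length;
        for (int i = 0; i < n; i++) children.add(new ArrayList<>());
        int root = -1;
        for (int v = 0; v < n; v++) {
            if (parent[v] < 0) root = v; else children.get(parent[v]).add(v);
        }
        euler = new int[2 * n - 1];
        depthAt = new int[2 * n - 1];
        first = new int[n];
        tour(root, 0);

        int m = euler.length;
        int levels = 32 - Integer.numberOfLeadingZeros(m); // floor(log2 m) + 1
        table = new int[levels][];
        table[0] = new int[m];
        for (int i = 0; i < m; i++) table[0][i] = i;
        for (int k = 1; k < levels; k++) {
            int len = m - (1 << k) + 1;
            table[k] = new int[len];
            for (int i = 0; i < len; i++) {
                int a = table[k - 1][i];
                int b = table[k - 1][i + (1 << (k - 1))];
                table[k][i] = depthAt[a] <= depthAt[b] ? a : b;
            }
        }
    }

    // Recursive Euler tour: record a node on entry and after each child.
    private void tour(int v, int d) {
        first[v] = pos;
        euler[pos] = v; depthAt[pos] = d; pos++;
        for (int c : children.get(v)) {
            tour(c, d + 1);
            euler[pos] = v; depthAt[pos] = d; pos++;
        }
    }

    // The shallowest node between the first visits of u and v is their LCA.
    public int lca(int u, int v) {
        int lo = Math.min(first[u], first[v]);
        int hi = Math.max(first[u], first[v]);
        int k = 31 - Integer.numberOfLeadingZeros(hi - lo + 1); // floor(log2 length)
        int a = table[k][lo];
        int b = table[k][hi - (1 << k) + 1];
        return euler[depthAt[a] <= depthAt[b] ? a : b];
    }

    public static void main(String[] args) {
        // Node 0 is the root; 1 and 2 are its children; 3, 4 under 1; 5, 6 under 2.
        LcaSparseTable t = new LcaSparseTable(new int[] {-1, 0, 0, 1, 1, 2, 2});
        System.out.println(t.lca(3, 4)); // 1
        System.out.println(t.lca(3, 5)); // 0
        System.out.println(t.lca(5, 6)); // 2
    }
}
```

The recursive tour keeps the sketch short; a very deep tree would need an iterative traversal to avoid overflowing the call stack.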
We're happy to have Erik Hinton, developer at the New York Times, presenting on The Derivative of a Regular Type is its Type of One-Hole Contexts by Conor McBride.
Papers are generally loved for one of two reasons. Either the paper is foundational, siring a lineage of important research, or the paper is useful, guiding readers toward clever optimizations, fault-tolerant solutions, and non-intuitive hacks. "The Derivative of a Regular Type is its Type of One-Hole Contexts" is neither. Only a few papers build on McBride's work, and the conclusions of the paper, though promising, haven't yet found any real use.
This paper is lovable, fun, and important because it is a radical thought experiment in the limits of abstraction. The paper poses the question: we call data types "algebraic", so can we "do calculus" on them? Surely, the…
In June, we have the pleasure of hearing Aysylu Greenberg, Software Engineer at Google and maintainer of the Clojure library Loom, speaking on the paper "One VM to Rule Them All" by Thomas Wuerthinger, Christian Wimmer, et al.
The paper explains how you can write an interpreter and get an optimizing just-in-time (JIT) compiler for free. This enables language designers to focus on features without worrying about the complexities of compiler optimizations and code generation. This paper presents a Java Virtual Machine (JVM) that allows the application to control the JIT compiler behavior at runtime. We'll…
We're ecstatic to have Chas Emerick presenting A comprehensive study of Convergent and Commutative Replicated Data Types by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski (2011).
Conflict-free Replicated Data Types (CRDTs) are a formalism for providing practical data and programming primitives for use in distributed systems applications without necessitating expensive (and sometimes impractical) consensus mechanisms. Their key characteristic is that they provide conflict-free "merging" of distributed concurrent updates given only the weak guarantees of eventual consistency.
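To make "conflict-free merging" concrete, here is a minimal sketch of a grow-only counter, one of the simplest state-based CRDTs. The class name, the string replica IDs, and the API shape are assumptions made for this example, not something prescribed by the paper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a grow-only counter (G-Counter).
class GCounter {
    private final String replicaId;
    // One monotonically increasing slot per replica.
    private final Map<String, Long> counts = new HashMap<>();

    public GCounter(String replicaId) {
        this.replicaId = replicaId;
    }

    // A replica only ever increments its own slot.
    public void increment() {
        counts.merge(replicaId, 1L, Long::sum);
    }

    // The counter's value is the sum over all replicas' slots.
    public long value() {
        long total = 0;
        for (long c : counts.values()) total += c;
        return total;
    }

    // Merge takes the per-replica maximum, which is commutative,
    // associative, and idempotent, so merge order and repeats don't matter.
    public void merge(GCounter other) {
        for (Map.Entry<String, Long> e : other.counts.entrySet()) {
            counts.merge(e.getKey(), e.getValue(), Long::max);
        }
    }

    public static void main(String[] args) {
        GCounter a = new GCounter("a");
        GCounter b = new GCounter("b");
        a.increment();
        a.increment();
        b.increment();
        a.merge(b);
        b.merge(a);
        b.merge(a); // merging again changes nothing
        System.out.println(a.value() + " " + b.value()); // 3 3
    }
}
```

Because replicas never need to agree on an order of updates, they can exchange state whenever convenient and still end up with the same value.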
While this paper did not coin the term 'CRDT', it was the first to provide a comprehensive treatment of their definit…
When you need to execute code on a cluster of machines, deciding which machine should run that code becomes a complex problem, known as scheduling. We're all familiar with routing problems, such as the recent RapGenius incident. It turns out that simple improvements to randomized routing can dramatically improve the performance! Sparrow is a distributed scheduling algorithm for low latency, high throughput workloads. We'll review the Sparrow algorithm, and learn the tricks that they used. Then, we'll discuss other applications of Sparrow, besides the big-data map-reduce application it was created for…
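One classic "simple improvement to randomized routing" is the power of two choices: probe two random workers and send the task to the less loaded one. The toy simulation below is a sketch of that idea only and does not model Sparrow's own refinements; the class name, parameters, and seed are assumptions made for this example.

```java
import java.util.Random;

// Hypothetical toy simulation of random placement with d probes per task.
class TwoChoices {
    // Places `tasks` tasks on `workers` workers. For each task, probes
    // `probes` random workers and enqueues on the least loaded one.
    // Returns the final queue length of every worker.
    public static int[] place(int workers, int tasks, int probes, long seed) {
        Random rng = new Random(seed);
        int[] load = new int[workers];
        for (int t = 0; t < tasks; t++) {
            int best = rng.nextInt(workers);
            for (int p = 1; p < probes; p++) {
                int candidate = rng.nextInt(workers);
                if (load[candidate] < load[best]) best = candidate;
            }
            load[best]++;
        }
        return load;
    }

    // Longest queue across all workers.
    public static int max(int[] xs) {
        int m = 0;
        for (int x : xs) m = Math.max(m, x);
        return m;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("1 probe:  longest queue = " + max(place(n, n, 1, 42L)));
        System.out.println("2 probes: longest queue = " + max(place(n, n, 2, 42L)));
    }
}
```

With one probe this is plain random placement; with two probes the longest queue is typically noticeably shorter, at the cost of one extra probe per task.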
Some great papers embody insights; others package up those insights into digestible bites. "Programming with Algebraic Effects and Handlers" is the latter sort of great paper. After two decades of fundamental research into the nature of computation, a lot of mysterious ideas in computer science such as continuations and exception handling finally made sense to a number of mathematically inclined geniuses. Bauer and Pretnar's Eff programming language cuts right to the heart of the theory in a way that makes sense to anybody who has ever written a functional program. This paper uses the Eff language to explore…
Doors open at 7 pm; the presentation will begin at 7:30 pm; and, yes, there will be beer and pizza.
After Michael presents the paper, we will open up the floor to discussion and questions.
We hope that you'll read the paper before the Meetup (and if you don't, no worries, be happy now!), and if you have any questions, thoughts, or related information, please visit our GitHub thread on the matter: https://github.com/papers-we-love/papers-we-love/issues/1.
Additionally, if you have any papers you want t…