READING LIST
Distributed Systems
- Service Fabric: A Distributed Platform for Building Microservices in the Cloud - Kakivaya et al., EuroSys 2018
- Gray Failure: The Achilles’ Heel of Cloud-Scale Systems - Huang et al., HotOS 2017
- Cache-aware load balancing of data center applications - Archer et al., VLDB 2019
- Slicer: Auto-Sharding for Datacenter Applications - Adya et al., OSDI 2016 [notes]

AVAILABILITY IN AWS' PHYSALIA
Brooker et al., NSDI 2020, “Physalia: Millions of Tiny Databases”
Some notes on AWS’ latest systems publication, which continues and expands their thinking about reducing the effect of failures in very large distributed systems (see shuffle sharding for an earlier and complementary technique aimed at the same kind of problem). Physalia is a configuration store for AWS’ Elastic Block Storage.

THE CAP FAQ
0. What is this document? No subject appears to be more controversial to distributed systems engineers than the oft-quoted, oft-misunderstood CAP theorem. The purpose of this FAQ is to explain what is known about CAP, to help those new to the theorem get up to speed quickly, and to settle some common misconceptions and points of disagreement.
PAPER NOTES: ANTI-CACHING
Anti-Caching: A New Approach to Database Management System Architecture - DeBrabant et al., VLDB 2013
The big idea: traditional databases typically rely on the OS page cache to bring hot tuples into memory and keep them there. This suffers from a number of problems, starting with no control over the granularity of caching or eviction (so keeping one tuple in memory might keep every other tuple in its page in memory as well).

THE ELEPHANT WAS A TROJAN HORSE: ON THE DEATH OF MAP-REDUCE AT GOOGLE
Note: this is a personal blog post, and doesn’t reflect the views of my employers at Cloudera. Map-Reduce is on its way out, but we shouldn’t measure its importance by the number of bytes it crunches so much as by the fundamental shift in data processing architectures it helped popularise. This morning, at their I/O conference, Google revealed that they’re not using Map-Reduce to process data

ÉTALE COHOMOLOGY
Étale Cohomology 3. To give a flavour of the issues involved, here is a bald statement: if X is a suitable variety over some finite field F_q, then the following zeta function on X can be defined:

BYTEARRAYOUTPUTSTREAM IS REALLY, REALLY SLOW SOMETIMES IN JDK6
TLDR: Yesterday I mentioned on Twitter that I’d found a bad performance problem when writing to a large ByteArrayOutputStream in Java. After some digging, it appears that there’s a bad bug in JDK6 that doesn’t affect correctness, but does cause performance to nosedive when a ByteArrayOutputStream gets large. This post explains why.
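The excerpt doesn’t say what the JDK6 bug actually is, but the general reason a buffer’s growth policy can make writes nosedive is easy to demonstrate: growing an array by just enough for each write copies O(n²) bytes in total, while doubling the capacity keeps total copying linear. A toy model of this (plain Python, not the JDK code):

```python
def total_bytes_copied(n_writes, write_size, grow):
    """Count bytes copied while appending under a given growth policy."""
    size, capacity, copied = 0, 1, 0
    for _ in range(n_writes):
        needed = size + write_size
        if needed > capacity:
            copied += size                  # existing bytes move to the new array
            capacity = grow(capacity, needed)
        size = needed
    return copied

# Grow-just-enough copies quadratically; doubling copies about 2x the final size.
exact = total_bytes_copied(10_000, 16, lambda cap, need: need)
double = total_bytes_copied(10_000, 16, lambda cap, need: max(cap * 2, need))
assert exact > 100 * double
```

The same final buffer is built either way; only the amount of intermediate copying differs, which is why the symptom only appears once the stream "gets large".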
THE PAPER TRAIL: DISTRIBUTED SYSTEMS AND DATABASE RESEARCH
2020
- Network Load Balancing with Maglev (June 22, 2020)
- Gray Failures (April 18, 2020)
- Availability in AWS' Physalia (April 6, 2020)
2018
- Beating hash tables with trees? The ART-ful radix trie (November 3, 2018)
- Outperforming hash-tables with MICA (September 26, 2018)

OVERVIEW OF ALL PAGES WITH CATEGORIES, ORDERED BY CATEGORY
- Exactly-once or not, atomic broadcast is still impossible in Kafka - or anywhere
- Make any algorithm lock-free with this one crazy trick
- The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google
- ByteArrayOutputStream is really, really slow sometimes in JDK6
- On Raft, briefly
- Links
- Something a bit different: translations of classic mathematical texts (!)
THE GOOGLE FILE SYSTEM
It’s been a little while since my last technically meaty update. One system that I’ve been looking at a fair bit recently is Hadoop, which is an open-source implementation of Google’s MapReduce. For me, the interesting part is the large-scale distributed filesystem it runs on, called HDFS. It’s well known that HDFS is based heavily on its Google equivalent.

CONSENSUS PROTOCOLS: THREE-PHASE COMMIT
Consensus Protocols: Three-phase Commit. Last time we looked extensively at two-phase commit, a consensus algorithm that has the benefit of low latency but which is offset by fragility in the face of participant machine crashes. In this short note, I’m going to explain how the addition of an extra phase to the protocol can shore things up a
CONSENSUS PROTOCOLS: TWO-PHASE COMMIT
For the next few articles here, I’m going to write about one of the most fundamental concepts in distributed computing - of equal importance to the theory and practice communities. The consensus problem is the problem of getting a set of nodes in a distributed system to agree on something - it might be a value, a course of action or a decision. Achieving consensus allows a
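The shape of two-phase commit - a vote-collection phase followed by a decision broadcast - can be sketched in a few lines. This is an in-process toy with hypothetical names, not a real implementation: real participants write their votes to stable storage and talk over a network.

```python
# Toy two-phase commit: phase 1 collects votes; phase 2 commits only if
# every participant voted yes, otherwise everyone aborts.
class Participant:
    def __init__(self, will_vote_yes=True):
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self):                       # phase 1: vote request
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def finish(self, commit):                # phase 2: coordinator's decision
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1
    decision = all(votes)                         # a single "no" aborts all
    for p in participants:
        p.finish(decision)                        # phase 2
    return decision

ps = [Participant(), Participant(will_vote_yes=False)]
assert two_phase_commit(ps) is False
assert all(p.state == "aborted" for p in ps)
```

The fragility mentioned above lives between the two phases: if the coordinator crashes after collecting "yes" votes but before broadcasting a decision, prepared participants are stuck, holding locks, with no way to decide on their own.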
ON RAFT, BRIEFLY
On Raft, briefly. Raft is a new-ish consensus implementation whose great benefit, to my mind, is its applicability to real systems. We briefly discussed it internally at Cloudera, and I thought I’d share what I contributed, below. There’s an underlying theme here regarding the role of distributed systems research in practitioners
CONSENSUS PROTOCOLS: PAXOS
Consensus on Transaction Commit. This short paper by Lamport and Jim Gray demonstrates that 2PC is a degenerate version of Paxos that tolerates zero failures. This is a readable introduction to Paxos and motivates, like I have done, the protocol as

COLUMNAR STORAGE
You’re going to hear a lot about columnar storage formats in the next few months, as a variety of distributed execution engines are beginning to consider them for their IO efficiency and the optimisations they open up for query execution. In this post, I’ll explain why we care so much about IO efficiency and show how columnar storage - which is a simple idea - can drastically improve

FLP AND CAP AREN'T THE SAME THING
An interesting question came up on Quora this last week. Roughly speaking, the question asked how, if at all, the FLP theorem and the CAP theorem are related. I’d thought idly about exactly the same question myself before. Both theorems concern the impossibility of solving fairly similar fundamental distributed systems problems in what appear to be fairly similar distributed systems
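Returning to the columnar-storage excerpt above: the IO-efficiency argument is that a scan touching one attribute need only read that attribute’s column, never the rest of each row. A toy sketch (hypothetical table; in-memory lists stand in for data read from disk):

```python
# Row layout: to read 'price' you still materialise each whole row,
# including the wide 'note' field.
rows = [{"id": i, "price": i * 1.5, "note": "x" * 100} for i in range(1000)]
row_total = sum(r["price"] for r in rows)

# Column layout: one list per attribute; the scan touches only the
# 'price' column and skips 'note' entirely.
columns = {"id":    [r["id"] for r in rows],
           "price": [r["price"] for r in rows],
           "note":  [r["note"] for r in rows]}
col_total = sum(columns["price"])

assert col_total == row_total   # same answer, far fewer bytes scanned
```

In a real engine the saving is in disk and cache traffic: the column layout reads a contiguous run of just the values the query needs.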
GRAY FAILURES
Huang et al., HotOS 2017, “Gray Failure: The Achilles Heel of Cloud-Scale Systems”
Detecting faults in a large system is a surprisingly hard problem. First you have to decide what kind of thing you want to measure, or ‘observe’. Then you have to decide what pattern in that observation constitutes a sufficiently worrying situation (or ‘failure’) to require mitigation.
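The observe-then-classify framing can be made concrete with a toy detector (names and thresholds are my own, not the paper’s). The hallmark of a gray failure is the differential between the system’s own health check and what clients actually observe:

```python
# Toy gray-failure classifier for one observation window of a component.
def classify(health_check_ok, latencies_ms, slo_ms=100):
    breaches = sum(1 for l in latencies_ms if l > slo_ms)
    degraded = breaches > len(latencies_ms) // 2   # majority of requests too slow
    if not health_check_ok:
        return "failed"          # the system's own detector caught it
    if degraded:
        return "gray failure"    # system says healthy; clients disagree
    return "healthy"

assert classify(True, [10, 20, 15]) == "healthy"
assert classify(True, [400, 350, 20]) == "gray failure"
assert classify(False, [400, 350, 20]) == "failed"
```

The hard parts the paper is pointing at are exactly the choices hard-coded here: which signal to observe (latency? error rate?) and what pattern in it should trigger mitigation.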
THE PAPER TRAIL: DISTRIBUTED SYSTEMS AND DATABASE RESEARCH
Writing about distributed systems, databases and research papers from SOSP, ATC, NSDI, OSDI, EuroSys and others.

DIST SYS SLACK
Want to chat about distributed systems and databases? Come join over 2,000 like-minded individuals at the dist-sys slack! Click here for an invite.
A BRIEF TOUR OF FLP IMPOSSIBILITY
A Brief Tour of FLP Impossibility. One of the most important results in distributed systems theory was published in April 1985 by Fischer, Lynch and Paterson. Their short paper ‘Impossibility of Distributed Consensus with One Faulty Process’ eventually won the Dijkstra award, given to the most influential papers in distributed
CONSENSUS PROTOCOLS: TWO-PHASE COMMIT For the next few articles here, I’m going to write about one of the most fundamental concepts in distributed computing - of equal importance to the theory and practice communities. The consensus problem is the problem of getting a set of nodes in a distributed system to agree on something - it might be a value, a course of action or a decision. Achieving consensus allows a FLP AND CAP AREN'T THE SAME THING An interesting question came up on Quora this last week. Roughly speaking, the question asked how, if at all, the FLP theorem and the CAP theorem were related. I’d thought idly about exactly the same question myself before. Both theorems concern the impossibility of solving fairly similar fundamental distributed systems problems in what appear to be fairly similar distributed systems THE PAPER TRAIL: DISTRIBUTED SYSTEMS AND DATABASE RESEARCHTHE PAPER TRAIL BOOKTHE PAPER TRAIL HADDONFIELDDELTARUNE PAPER TRAIL COMICPAPER TRAIL DELTARUNEPAPER TRAIL HADDONFIELD NJPAPER TRAIL RHINEBECK NY 2020 (June 22, 2020 ) Network Load Balancing with Maglev(April 18, 2020 ) Gray Failures(April 6, 2020 ) Availability in AWS' Physalia2018 (November 3, 2018 ) Beating hash tables with trees?The ART-ful radix trie (September 26, 2018 ) Outperforming hash-tables with MICAREADING LIST
Distributed Systems Service Fabric: A Distributed Platform for Building Microservices in the Cloud - Kakivaya et. al., EuroSys 2018 Gray Failure: The Achilles’ Heel of Cloud-Scale Systems - Huang et. al., HotOS 2017 Cache-aware load balancing of data center applications - Archer et. al., VLDB 2019 Slicer: Auto-Sharding for Datacenter Applications - Adya et. al., OSDI 2016 [notesGRAY FAILURES
Huang et. al., HotOS 2017 “Gray Failure: The Achilles Heel of Cloud-Scale Systems” Detecting faults in a large system is a surprisingly hard problem. First you have to decide what kind of thing you want to measure, or ‘observe’. Then you have to decide what pattern in that observation constitutes a sufficiently worrying situation (or ‘failure’) to require mitigation. AVAILABILITY IN AWS' PHYSALIA Brooker et. al., NSDI 2020 “Physalia: Millions of Tiny Databases” Some notes on AWS’ latest systems publication, which continues and expands their thinking about reducing the effect of failures in very large distributed systems (see shuffle sharding as an earlier and complementary technique for the same kind of problem). Physalia is a configuration store for AWS’ Elastic-Block StorageTHE CAP FAQ
0. What is this document? No subject appears to be more controversial to distributed systems engineers than the oft-quoted, oft-misunderstood CAP theorem. The purpose of this FAQ is to explain what is known about CAP, so as to help those new to the theorem get up to speed quickly, and to settle some common misconceptions or pointsof disagreement.
PAPER NOTES: ANTI-CACHING Anti-Caching: A New Approach to Database Management System Architecture DeBrabant et. al., VLDB 2013 The big idea Traditional databases typically rely on the OS page cache to bring hot tuples into memory and keep them there. This suffers from a number of problems: No control over granularity of caching or eviction (so keeping a tuple in memory might keep all the tuples in its page as THE GOOGLE FILE SYSTEM THE ELEPHANT WAS A TROJAN HORSE: ON THE DEATH OF MAP Note: this is a personal blog post, and doesn’t reflect the views of my employers at Cloudera Map-Reduce is on its way out. But we shouldn’t measure its importance in the number of bytes it crunches, but the fundamental shift in data processing architectures it helped popularise. This morning, at their I/O Conference, Google revealed that they’re not using Map-Reduce to process dataÉTALE COHOMOLOGY
Étale Cohomology 3 To give a flavour of the issues involved here is a bald statement: If Xis a suitable variety over some finite field F qthen the following zeta function on Xcan be defined: BYTEARRAYOUTPUTSTREAM IS REALLY, REALLY SLOW SOMETIMES IN TLDR: Yesterday I mentioned on Twitter that I’d found a bad performance problem when writing to a large ByteArrayOutputStream in Java. After some digging, it appears to be the case that there’s a bad bug in JDK6 that doesn’t affect correctness, but does cause performance to nosedive when a ByteArrayOutputStream gets large. Thispost explains why.
THE PAPER TRAIL: DISTRIBUTED SYSTEMS AND DATABASE RESEARCHTHE PAPER TRAIL BOOKTHE PAPER TRAIL HADDONFIELDDELTARUNE PAPER TRAIL COMICPAPER TRAIL DELTARUNEPAPER TRAIL HADDONFIELD NJPAPER TRAIL RHINEBECK NY 2020 (June 22, 2020 ) Network Load Balancing with Maglev(April 18, 2020 ) Gray Failures(April 6, 2020 ) Availability in AWS' Physalia2018 (November 3, 2018 ) Beating hash tables with trees?The ART-ful radix trie (September 26, 2018 ) Outperforming hash-tables with MICAREADING LIST
Distributed Systems Service Fabric: A Distributed Platform for Building Microservices in the Cloud - Kakivaya et. al., EuroSys 2018 Gray Failure: The Achilles’ Heel of Cloud-Scale Systems - Huang et. al., HotOS 2017 Cache-aware load balancing of data center applications - Archer et. al., VLDB 2019 Slicer: Auto-Sharding for Datacenter Applications - Adya et. al., OSDI 2016 [notesGRAY FAILURES
Huang et. al., HotOS 2017 “Gray Failure: The Achilles Heel of Cloud-Scale Systems” Detecting faults in a large system is a surprisingly hard problem. First you have to decide what kind of thing you want to measure, or ‘observe’. Then you have to decide what pattern in that observation constitutes a sufficiently worrying situation (or ‘failure’) to require mitigation. AVAILABILITY IN AWS' PHYSALIA Brooker et. al., NSDI 2020 “Physalia: Millions of Tiny Databases” Some notes on AWS’ latest systems publication, which continues and expands their thinking about reducing the effect of failures in very large distributed systems (see shuffle sharding as an earlier and complementary technique for the same kind of problem). Physalia is a configuration store for AWS’ Elastic-Block StorageTHE CAP FAQ
0. What is this document? No subject appears to be more controversial to distributed systems engineers than the oft-quoted, oft-misunderstood CAP theorem. The purpose of this FAQ is to explain what is known about CAP, so as to help those new to the theorem get up to speed quickly, and to settle some common misconceptions or pointsof disagreement.
PAPER NOTES: ANTI-CACHING Anti-Caching: A New Approach to Database Management System Architecture DeBrabant et. al., VLDB 2013 The big idea Traditional databases typically rely on the OS page cache to bring hot tuples into memory and keep them there. This suffers from a number of problems: No control over granularity of caching or eviction (so keeping a tuple in memory might keep all the tuples in its page as THE GOOGLE FILE SYSTEM THE ELEPHANT WAS A TROJAN HORSE: ON THE DEATH OF MAP Note: this is a personal blog post, and doesn’t reflect the views of my employers at Cloudera Map-Reduce is on its way out. But we shouldn’t measure its importance in the number of bytes it crunches, but the fundamental shift in data processing architectures it helped popularise. This morning, at their I/O Conference, Google revealed that they’re not using Map-Reduce to process dataÉTALE COHOMOLOGY
Étale Cohomology 3 To give a flavour of the issues involved here is a bald statement: If Xis a suitable variety over some finite field F qthen the following zeta function on Xcan be defined: BYTEARRAYOUTPUTSTREAM IS REALLY, REALLY SLOW SOMETIMES IN TLDR: Yesterday I mentioned on Twitter that I’d found a bad performance problem when writing to a large ByteArrayOutputStream in Java. After some digging, it appears to be the case that there’s a bad bug in JDK6 that doesn’t affect correctness, but does cause performance to nosedive when a ByteArrayOutputStream gets large. Thispost explains why.
THE PAPER TRAIL: DISTRIBUTED SYSTEMS AND DATABASE RESEARCH 2020 (June 22, 2020 ) Network Load Balancing with Maglev(April 18, 2020 ) Gray Failures(April 6, 2020 ) Availability in AWS' Physalia2018 (November 3, 2018 ) Beating hash tables with trees?The ART-ful radix trie (September 26, 2018 ) Outperforming hash-tables with MICA OVERVIEW OF ALL PAGES WITH CATEGORIES, ORDERED BY CATEGORY Exactly-once or not, atomic broadcast is still impossible in Kafka - or anywhere Make any algorithm lock-free with this one crazy trick The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google ByteArrayOutputStream is really, really slow sometimes in JDK6 On Raft, briefly Links Something a bit different: translations of classic mathematical texts (!)READING LIST
Distributed Systems Service Fabric: A Distributed Platform for Building Microservices in the Cloud - Kakivaya et. al., EuroSys 2018 Gray Failure: The Achilles’ Heel of Cloud-Scale Systems - Huang et. al., HotOS 2017 Cache-aware load balancing of data center applications - Archer et. al., VLDB 2019 Slicer: Auto-Sharding for Datacenter Applications - Adya et. al., OSDI 2016 [notes THE PAPER TRAIL: DISTRIBUTED SYSTEMS AND DATABASE RESEARCH Writing about distributed systems, databases and research papers from SOSP, ATC, NSDI, OSDI, EuroSys and others THE GOOGLE FILE SYSTEM It’s been a little while since my last technically meaty update. One system that I’ve been looking at a fair bit recently is Hadoop, which is an open-source implementation of Google’s MapReduce. For me, the interesting part is the large-scale distributed filesystem on which it runs called HDFS. It’s well known that HDFS is based heavily on its Google equivalent.DIST SYS SLACK
Want to chat about distributed systems and databases? Come join over 2000 like-minded individuals at the dist-sys slack!. Click here for aninvite.
CONSENSUS PROTOCOLS: THREE-PHASE COMMIT Consensus Protocols: Three-phase Commit. Last time we looked extensively at two-phase commit, a consensus algorithm that has the benefit of low latency but which is offset by fragility in the face of participant machine crashes. In this short note, I’m going to explain how the addition of an extra phase to the protocol can shorethings up a
A BRIEF TOUR OF FLP IMPOSSIBILITY A Brief Tour of FLP Impossibility. One of the most important results in distributed systems theory was published in April 1985 by Fischer, Lynch and Patterson. Their short paper ‘Impossibility of Distributed Consensus with One Faulty Process’, which eventually won the Dijkstra award given to the most influential papers in distributed CONSENSUS PROTOCOLS: TWO-PHASE COMMIT For the next few articles here, I’m going to write about one of the most fundamental concepts in distributed computing - of equal importance to the theory and practice communities. The consensus problem is the problem of getting a set of nodes in a distributed system to agree on something - it might be a value, a course of action or a decision. Achieving consensus allows a FLP AND CAP AREN'T THE SAME THING An interesting question came up on Quora this last week. Roughly speaking, the question asked how, if at all, the FLP theorem and the CAP theorem were related. I’d thought idly about exactly the same question myself before. Both theorems concern the impossibility of solving fairly similar fundamental distributed systems problems in what appear to be fairly similar distributed systems Toggle navigation The Paper Trail* Blog
THE PAPER TRAIL

Distributed systems and data processing

AVAILABILITY IN AWS' PHYSALIA

Posted on April 6, 2020 | 5 minutes (898 words)

_Brooker et al., NSDI 2020_, “Physalia: Millions of Tiny Databases”
Some notes on AWS’ latest systems publication, which continues and expands their thinking about reducing the effect of failures in very large distributed systems (see shuffle sharding as an earlier and complementary technique for the same kind of problem).
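As a refresher, shuffle sharding can be sketched in a few lines. This is my own illustration, not AWS’ implementation; the function name and parameters are made up:

```python
import random

def shuffle_shard(customer_id: str, workers: list, shard_size: int) -> list:
    """Deterministically assign each customer a small pseudo-random subset
    of workers. Two customers almost never share their *entire* shard, so
    one customer's poison request can't exhaust another's full capacity."""
    rng = random.Random(customer_id)   # seed by customer id for stability
    return rng.sample(workers, shard_size)
```

With 100 workers and a shard size of 5 there are C(100, 5) ≈ 75 million possible shards, so the blast radius of any one customer’s failure mode overlaps only partially with anyone else’s.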
Physalia is a configuration store for AWS’ Elastic Block Store (i.e. network-attached disks). EBS volumes are replicated using chain replication, but the configuration of the replication chain needs to be stored somewhere - enter Physalia.

BEATING HASH TABLES WITH TREES? THE ART-FUL RADIX TRIE

Posted on November 3, 2018 | 16 minutes (3377 words)

THE ADAPTIVE RADIX TREE: ARTFUL INDEXING FOR MAIN-MEMORY DATABASES

_Leis et al., ICDE 2013_

Tries are an unloved third data structure for building key-value stores and indexes, after search trees (like B-trees and red-black trees) and hash tables. Yet they have a number of very appealing properties that make them worthy of consideration - for example, the height of a trie is independent of the number of keys it contains, and a trie requires no rebalancing when updated. Weighing against those advantages is the heavy memory cost that vanilla radix tries can incur, because each node contains a pointer for every possible value of the ‘next’ character in the key. With an alphabet of single-byte characters, that’s 256 pointers for every node in the tree. But the astute reader will feel in their bones that this is naive - there must be more efficient ways to store a set of pointers, indexed by a fixed-size set of keys (the trie’s alphabet). Indeed, there are - several of them, in fact, distinguished by the number of children the node _actually_ has, not just how many it might _potentially_ have.
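Concretely, the per-fanout node representations look something like this toy Python sketch (my illustration, not the paper’s C implementation, which uses four sizes: Node4 → Node16 → Node48 → Node256):

```python
class Node4:
    """Sparse node: up to 4 children in parallel arrays, scanned linearly."""
    def __init__(self):
        self.keys, self.children = [], []

    def add(self, byte, child):
        self.keys.append(byte)
        self.children.append(child)

    def find(self, byte):
        for k, c in zip(self.keys, self.children):
            if k == byte:
                return c
        return None

class Node256:
    """Dense node: one slot per possible byte value, O(1) lookup."""
    def __init__(self):
        self.children = [None] * 256

    def add(self, byte, child):
        self.children[byte] = child

    def find(self, byte):
        return self.children[byte]

def add_child(node, byte, child):
    """Insert a child, copying the node into a larger representation
    when the current one is full."""
    if isinstance(node, Node4) and len(node.keys) == 4:
        bigger = Node256()
        for k, c in zip(node.keys, node.children):
            bigger.add(k, c)
        node = bigger
    node.add(byte, child)
    return node
```

A Node4 pays for only 4 key bytes and 4 pointers; a Node256 pays for 256 pointers regardless of fanout. Since low-fanout nodes dominate most tries, adapting the representation is where the memory savings come from.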
This is where the _Adaptive Radix Tree_ (ART) comes in. In this breezy, easy-to-read paper, the authors show how to reduce the memory cost of a regular radix trie by _adapting_ the data structure used for each node to the number of children that it needs to store. In doing so they show, perhaps surprisingly, that the amount of space consumed by a single key can be bounded no matter how long the key is.

OUTPERFORMING HASH-TABLES WITH MICA

Posted on September 26, 2018 | 17 minutes (3532 words)

MICA: A HOLISTIC APPROACH TO FAST IN-MEMORY KEY-VALUE STORAGE

_Lim et al., NSDI 2014_

In this installment we’re going to look at a system from NSDI 2014. MICA is another in-memory key-value store, but in contrast to Masstree it does not support range queries, and in much of the paper it keeps a fixed working set by evicting old items, like a cache. Indeed, the closest comparison system that you might think of when reading about MICA for the first time is a humble… hash table. Is there still room for improvement over such a fundamental data structure? Read on and find out (including benchmarks!).

MASSTREE: A CACHE-FRIENDLY MASHUP OF TRIES AND B-TREES

Posted on September 10, 2018 | 8 minutes (1669 words)

CACHE CRAFTINESS FOR FAST MULTICORE KEY-VALUE STORAGE

_Mao et al., EuroSys 2012_

THE BIG IDEA
Consider the problem of storing, in memory, millions of (key, value) pairs, where key is a variable-length string. If we just wanted to support point lookups, we’d use a hash table. But assuming we want to support range queries, some kind of tree structure is probably required. One candidate might be a traditional B+-tree. In such a B+-tree, the number of levels of the tree is kept small thanks to the fact that each node has a high fan-out. However, that means that a large number of keys are packed into a single node, and so there’s still a large number of key comparisons to perform when searching through the tree. This is further exacerbated by variable-length keys (e.g. strings), where the cost of key comparisons can be quite high. If the keys are really long they can each occupy multiple cache lines, and so comparing two of them can really mess up your cache locality.

This paper proposes an efficient tree data structure that relies on splitting variable-length keys into a variable number of fixed-length keys called _slices_. As you go down the tree, you compare the first slice of each key, then the second, then the third and so on, but each comparison has _constant cost_. For example, think about the string the quick brown fox jumps over the lazy dog. This string consists of the following 8-byte slices: the quic, k brown_, fox jump, s over t, he lazy_ and finally dog. To find a string in a tree, you look for all strings that match the first slice, then look for the second slice only among the strings that matched the first slice, and so on - only comparing a _fixed-size_ subset of the key at any time. This is much more efficient than comparing long strings to one another over and over again. The trick is to design a structure that takes advantage of the cache benefits of doing these fixed-size comparisons, without losing a tradeoff based on the large cardinality of the slice ‘alphabet’. Enter the _MASSTREE_.
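The slicing step above is a one-liner; here is a sketch of it (my own illustration; in Masstree proper each 8-byte slice is compared as a single 64-bit integer):

```python
def slices(key: bytes, width: int = 8) -> list:
    """Split a variable-length key into fixed-width slices; every
    comparison during tree descent then touches a constant-size chunk."""
    return [key[i:i + width] for i in range(0, len(key), width)]

slices(b"the quick brown fox jumps over the lazy dog")
# -> [b'the quic', b'k brown ', b'fox jump', b's over t', b'he lazy ', b'dog']
```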
EXACTLY-ONCE OR NOT, ATOMIC BROADCAST IS STILL IMPOSSIBLE IN KAFKA - OR ANYWHERE

Posted on July 28, 2017 | 6 minutes (1072 words)

_UPDATE: Jay responded on Twitter, which you can read here._
INTRO
I read an article recently by Jay Kreps about a feature for delivering messages ‘exactly-once’ within the Kafka framework. Everyone’s excited, and for good reason. But there’s been a bit of a side story about what exactly ‘exactly-once’ means, and what Kafka can actually do. In the article, Jay identifies the safety and liveness properties of atomic broadcast as a pretty good definition for the set of properties that Kafka is going after with its new exactly-once feature, and then starts to address claims by naysayers that atomic broadcast is impossible. For this note, I’m _not_ going to address whether or not exactly-once is an implementation of atomic broadcast. I also believe that exactly-once is a powerful feature that’s been impressively realised by Confluent and the Kafka community; nothing here is a criticism of that effort or the feature itself. But the article makes some claims about impossibility that are, at best, a bit shaky - and, well, impossibility’s kind of my jam. Jay posted his article with a tweet saying he couldn’t ‘resist a good argument’. I’m responding in that spirit.
In particular, the article makes the claim that atomic broadcast is ‘solvable’ (and later that consensus is as well…), which is wrong. What follows is why, and why that matters. I have since left the pub. So let’s begin.

MAKE ANY ALGORITHM LOCK-FREE WITH THIS ONE CRAZY TRICK

Posted on May 25, 2016 | 4 minutes (677 words)

Lock-free algorithms often operate by having several versions of a data structure in use at one time. The general pattern is that you can prepare an update to a data structure, and then use a machine primitive to atomically install the update by changing a pointer. This means that all subsequent readers will follow the pointer to its new location - for example, to a new node in a linked list - but this pattern can’t do anything about readers that have already followed the old pointer value, and are traversing the previous version of the data structure.
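A minimal Python sketch of the prepare-then-publish pattern (names are mine; a plain reference assignment stands in here for the compare-and-swap a C or C++ implementation would use):

```python
import threading

class VersionedMap:
    """Copy-on-write map: a writer builds a complete new version, then
    publishes it with a single reference swap; readers snapshot whatever
    version is current and never block."""
    def __init__(self):
        self._current = {}                   # published version, never mutated
        self._write_lock = threading.Lock()  # serialises writers only

    def get(self, key):
        snapshot = self._current   # one atomic read of the reference
        return snapshot.get(key)   # safe even if a writer publishes right now

    def put(self, key, value):
        with self._write_lock:
            new_version = dict(self._current)  # prepare the update off to the side...
            new_version[key] = value
            self._current = new_version        # ...then atomically install it
```

Note the problem the post describes: a reader that grabbed `snapshot` before a `put` keeps traversing the old version, which is exactly why reclaiming old versions safely (hazard pointers, epochs, RCU) is the hard part.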
DISTRIBUTED SYSTEMS THEORY FOR THE DISTRIBUTED SYSTEMS ENGINEER

Posted on August 9, 2014 | 7 minutes (1325 words)

_Updated June 2018 with content on atomic broadcast, gossip, chain replication and more_

Gwen Shapira, who at the time was an engineer at Cloudera and is now spreading the Kafka gospel, asked a question on Twitter that got me thinking.
My response of old might have been “well, here’s the FLP paper, and here’s the Paxos paper, and here’s the Byzantine generals paper…”, and I’d have prescribed a laundry list of primary source material which would have taken at least six months to get through if you rushed. But I’ve come to think that recommending a ton of theoretical papers is often precisely the wrong way to go about learning distributed systems theory (unless you are in a PhD program). Papers are usually deep and complex, and require both serious study and usually _significant experience_ to glean their important contributions and to place them in context. What good is requiring that level of expertise of engineers? And yet, unfortunately, there’s a paucity of good ‘bridge’ material that summarises, distills and contextualises the important results and ideas in distributed systems theory; particularly material that does so without condescending. Considering that gap led me to another interesting question: _What distributed systems theory should a distributed systems engineer know?_
A little theory is, in this case, not such a dangerous thing. So I tried to come up with a list of what I consider the basic concepts that are applicable to my every-day job as a distributed systems engineer. Let me know what you think I missed!

THE ELEPHANT WAS A TROJAN HORSE: ON THE DEATH OF MAP-REDUCE AT GOOGLE

Posted on June 25, 2014 | 4 minutes (706 words)

_Note: this is a personal blog post, and doesn’t reflect the views of my employers at Cloudera_

MAP-REDUCE IS ON ITS WAY OUT. WE SHOULDN’T MEASURE ITS IMPORTANCE IN THE NUMBER OF BYTES IT CRUNCHES, BUT IN THE FUNDAMENTAL SHIFT IN DATA PROCESSING ARCHITECTURES IT HELPED POPULARISE.

This morning, at their I/O conference, Google revealed that they’re not using Map-Reduce to process data internally at all any more.
We shouldn’t be surprised. The writing has been on the wall for Map-Reduce for some time. The truth is that Map-Reduce as a processing paradigm continues to be severely restrictive, and is no more than a subset of richer processing systems.

PAPER NOTES: MEMC3, A BETTER MEMCACHED

Posted on June 18, 2014 | 3 minutes (573 words)

MEMC3: COMPACT AND CONCURRENT MEMCACHE WITH DUMBER CACHING AND SMARTER HASHING
_Fan and Andersen, NSDI 2013_
THE BIG IDEA
This is a paper about choosing your data structures and algorithms carefully. By paying careful attention to the workload and functional requirements, the authors reimplement memcached to achieve a) better concurrency and b) better space efficiency. Specifically, they introduce a variant of cuckoo hashing that is highly amenable to concurrent workloads, and integrate the venerable CLOCK cache-eviction algorithm with the hash table for space-efficient approximate LRU.

PAPER NOTES: ANTI-CACHING

Posted on June 6, 2014 | 3 minutes (607 words)

ANTI-CACHING: A NEW APPROACH TO DATABASE MANAGEMENT SYSTEM ARCHITECTURE
_DeBrabant et al., VLDB 2013_

THE BIG IDEA
Traditional databases typically rely on the OS page cache to bring hot tuples into memory and keep them there. This suffers from a number of problems:

* No control over the granularity of caching or eviction (so keeping a tuple in memory might keep all the tuples in its page as well, even though there’s not necessarily a usage correlation between them)
* No control over when fetches are performed (fetches are typically slow, and transactions may hold onto locks or latches while the access is being made)
* Duplication of resources - tuples can occupy both disk blocks and memory pages.
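Anti-caching inverts this: the DBMS itself evicts individual cold tuples to an on-disk store, at tuple granularity rather than page granularity. A toy sketch of that shape (my own illustration, not H-Store’s implementation; a dict stands in for the on-disk anti-cache):

```python
from collections import OrderedDict

class AntiCache:
    """Tuple-level eviction: the DBMS moves individual cold tuples to an
    on-disk 'anti-cache', instead of letting the OS page out whole pages
    that may mix hot and cold tuples."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.hot = OrderedDict()   # in-memory tuples, kept in LRU order
        self.cold = {}             # stand-in for the on-disk anti-cache

    def put(self, tid, tup):
        self.hot[tid] = tup
        self.hot.move_to_end(tid)
        while len(self.hot) > self.capacity:
            victim, v = self.hot.popitem(last=False)  # evict coldest tuple
            self.cold[victim] = v

    def get(self, tid):
        if tid in self.hot:
            self.hot.move_to_end(tid)
            return self.hot[tid]
        if tid in self.cold:               # "un-evict" the tuple on access
            self.put(tid, self.cold.pop(tid))
            return self.hot[tid]
        return None
```

Because eviction happens per tuple, a hot tuple never pins its cold page-mates in memory, and a tuple lives in exactly one of the two stores at a time - addressing the granularity and duplication problems from the list above.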
Henry Robinson • 2020 • The Paper Trail Hugo v0.41 powered • Theme by Beautiful Jekyll adapted to BeautifulHugo