Work With Kyle

  • Archive
  • RSS
  • Ask me anything

Sharding (continued)

A month ago I ended a post with a teaser about how we had to quickly scale up to handle MySpace’s traffic, so today I want to talk about the architectural changes that made that possible. In particular, I want to dive a little deeper into one of the fundamental questions we had to answer: how to horizontally partition (ie shard) our data. Broadly, there are two approaches to sharding…

Fixed Schemes:
A fixed partitioning strategy is one where a fixed characteristic of the item you’re sharding determines which partition it’s on, or, more concretely, it’s like an old school encyclopedia (the physical kind we had before Wikipedia). Those encyclopedias contained too much information for one book, so the articles were “partitioned” across multiple volumes - one book had all the As; another had the Bs and Cs, and so on. To find the right volume you used a fixed characteristic of the article, namely the first letter in the title.

There are many ways to incorporate fixed schemes into distributed systems. One naive approach is to partition users based on their last names, with all the “As” on one server, “Bs” on another, and so on. But this scheme has a couple drawbacks, the most important of which is that not all the partitions are same size (the “J” block is much bigger than the “X” one)*, and this disparity adds complexity. Fortunately, computer science has provided us with a tool for skirting exactly this obstacle - hash functions. To get an idea how hashing could help, consider the following simple, though imperfect, formula**:

node index = (hash(user name)) mod (number of nodes)

With a little hashing and modular arithmetic you can still used a fixed characteristic (the user name) to determine which node a user’s data is on, and you can also evenly distribute the data. Unfortunately, though this is a really powerful and widely used approach, it’s not perfect. Notably, whenever you add new nodes some of your old data has to be moved between nodes**. In some systems migration is manageable (see many of the NoSQL dbs), but with most relational databases, MySQL included, it’s a huge chore. So, here at Bunchball, to avoid the complexity of building and maintaining migration scripts, we turned to the alternative, dynamic schemes.

Dynamic Schemes:
In dynamic partitioning strategies the application cannot determine which node contains an item solely from the properties of the item itself. Instead, the app determines which shards contains the data by looking it up in an (often large) lookup table.

This scheme has several advantages, the obvious one being that any single piece of data can be moved to a new shard simply by updating the lookup table (and copying over the data). But although that flexibility is nice, it still leaves us with the same core problem as fixed schemes - copying MySQL data can be a pain. What we really want is to be able to add new nodes without moving data, and (surprise, surprise) with dynamic schemes we can do this. The trick is that we now control which shard new users are assigned to, so whenever a shard gets “full” we can stop assigning it new users. That way its current users never need to be re-assigned to a new shard.

At first glance, one might object that all the nodes won’t have the same number of users and/or the same amount of activity - the exact same problem we worried about with fixed schemes. But, what I left out in my discussions of fixed schemes is that though this makes life harder, with good preparation it’s not unmanageable***. To understand how this could work you first need to know two things about MySQL. First, you can put more than one logical MySQL database (ie shard) on a single physical server. Second, unlike moving individuals rows, migrating entire MySQL databases is relatively simple. Thus, using these two properties, we can maintain fairly equal database load by doing two things:

1. Keeping each of our logical databases small relative to the size our physical servers 

2. Mixing and matching databases appropriately.

One downside of using dynamic lookup tables is that they are a single point of failure and a potential bottleneck, but though these concerns are legitimate we ultimately decided that we could live with them because we thought that with memcached, database replication, and SSDs we could make it work in practice. Also, we knew that if at some point, our dynamic scheme didn’t scale, it would be pretty easy to build a fixed scheme on top of the lookup table. 

In practice, we’ve implemented a dynamic scheme that stores data for over 100M users over a couple hundred logical shards, and though we’ve spent some time moving shards around to balance out database server load and adding extra caching around our global lookup table, for the most part it’s just worked. Also, since each of our biggest customers can have their own lookup tables, we feel pretty comfortable we’ll be able to scale with the current system until, inevitably, Facebook becomes a customer.

Future topics: querying across shards, hashing rings, hybrid schemes

Notes:
* We will see solutions to this later in the article

**  If, for a given user, the hash mod (old # of nodes) doesn’t equal mod (new # of nodes) then their data has to be moved from the old node to the new node. No matter what you do, some data has to be moved. That said, techniques to reduce the amount of data moved have been developed, notably consistent hash rings (here’s a simple explanation), a common technique in NoSQL solutions, that I hope to explore in further detail in later entries.

For the mathematically inclined reader, you can show that if you add one new server at a time, that formula forces you to move almost all the data every time. In fact if the number of nodes is p, then only (1/(p + 1)) of the data stays on the same node; the rest must be moved. You can show this by proving the following two statements, which I leave to the intrepid reader (with a hint for #2 below)
1. Show that p and p + 1 are relatively prime for all integers, p
2. Prove that following two statements are equivalent (ie, one is true if and only if the other is)
   a. p and q are relatively prime
   b. x = (c % p) = (c % q) implies x = c % (pq)
3. Conclude that approximately (1 / (p + 1)) of the data stays on the same node

<
Hint: Use a consequence of the Chinese Remainder Theorem: if a and b are relatively prime then there exist integers x and y such that a * x + b * y = 1

*** Dynamic schemes present a slightly, but critically different problem from fixed schemes with regard to this issue. Also, from a didactic perspective, I still think equal size nodes are a good introduction into hashing schemes.

 

  • 4 months ago
  • 1
  • Permalink
  • Share
    Tweet

Reading

So far all the entries I’ve posted have been about the architectural and technological challenges we’ve had to hurdle, so today I want to change it up at bit and talk about something I think is a defining characteristic of our engineering team: a commitment to improvement. Now, clearly personal development is an enormous topic, so for this post I want to focus on an aspect I’m particularly passionate about: reading.

In today’s world, reading is often maligned. People wonder why they should spend their time reading a book when Google will answer most any question instantly. And to a certain degree they’re right. If you want to know the syntax for exception handling in Python, Google is your best bet, but, as I’ve come to appreciate more and more, reading isn’t just about learning facts. Great books change your perspective and expose you to ideas you didn’t know you didn’t know. The problem is that most books aren’t great and most technology books focus on answering how rather than why. So, I want to take a second to share some of the books that have had a major impact on the way I work:

Effective Java:

As far as I’m aware, this is the definitive book on Java best practices, but what makes it so special is that, though it focuses specifically on Java, it’s really a book about object oriented programming and programming languages in general. So, even if you’re not using Java every day, I would recommend giving it a look cause it spends a significant amount of time talking about why the language was designed the way it was - knowledge that will carry over to many software projects (as a side note, it does assume that you are already at least reasonably familiar with Java).

High Performance MySQL:

Like Effective Java, though this is a book about a specific technology, it really transcends that classification. It’s about how to think about databases and distributed computing in general (it assumes an understanding of basic SQL)

Code Complete 2:

This is a huge book, and the most comprehensive book on code construction I’ve seen. The recurring theme throughout the book is that software development is about “managing complexity”, a maxim went over my head in college, but I’m coming to realize is the best way to approach big software projects.

  • 4 months ago
  • 1
  • Permalink
  • Share
    Tweet

Sharding

I’ve spent the last two posts talking about some of the performance challenges we’ve faced over the years, so today I want to shift the focus a bit and talk about performance’s cousin, scalability*.  Scalability is a broad topic, so for this post I’m going to focus on scaling databases, a particularly acute challenge for many web applications.

To provide some background for the uninitiated: databases have to be “scaled” because, if you have enough users, no matter how much you optimize performance, at some point you won’t be able to process everything on one db server. You’re going to be forced to split your data. But, in dividing your data onto multiple servers you end up creating a new challenge for yourself - you have to be able to figure out which data is on which server. The solution: sharding**.

At Bunchball, we generally try to avoid premature optimization, but this is one case where we eschewed that maxim and planned for scale from almost the beginning. We’d heard horror stories about retro-fitting sharding solutions onto mature web apps and we knew it was an inevitable hurdle we’d have to face, so we built it into our architecture early on.

Broadly, there are two types of sharding, vertical and horizontal:

Vertical (Functional) Sharding:

Vertical sharding is the partitioning of databases by feature (“vertical”). What this means is that instead of having one server do everything, you break up responsibilities so that one server is in change of, say, leaderboards while another is in charge of virtual items, and so on. The primary virtue of this approach is that it’s simple. Your code is (hopefully) already broken up by feature, so you can just migrate your data to new servers, point different parts of the code to different databases, and voila, you have a sharding solution (your results may vary :) ).

Sadly, vertical sharding is not a panacea - it doesn’t scale forever. No matter how finely you partition functionality, someday, some component won’t fit on one server. So, despite its utility and simplicity, at some point you have to turn to the alternative, horizontal sharding (though the two are often used in conjunction).

At Bunchball, due to its limitations, we use vertical sharding for only a few things: in our platform we separate configuration data (each of our customers has a custom configuration) and user data. And, like almost every web service on the planet, we separate our data warehouse from our main platform. In these cases the different access patterns on the databases justified the division:
  •      Site configuration: frequent reads, infrequent writes
  •      User data: frequents reads and writes
  •      Data warehouse: rare bulks reads and writes
Horizontal Sharding:

Horizontal sharding partitions by user or customer instead of by feature. So, in theory, it’s infinitely scalable because as you add more users, you can just spin up additional servers, ad infinitum. But with the promise of infinite scalability comes additional complexity because unlike the hard-coded db connections from vertical sharding, horizontal sharding requires you to figure out which database to connect to dynamically, based on a partitioning key.

The first step in designing a horizontal sharding solution is figuring out what to partition on. Often the best choice is userId, but in our case, we actually first partitioned by customer website. This meant that each of our customers had their own database (though we co-locate many databases on a single physical server). Though this wasn’t a perfect solution because we couldn’t handle individual, super-high traffic sites, it did help us scale out quickly.

That all changed when we signed MySpace, a site that at the time had 50M+ monthly uniques. We knew there was no way we could handle that level of traffic without more granular sharding, but that story will have to wait until another post…


Notes:
* The difference between performance and scalability trips up many people. At some point I’ll write a post explaining my understanding of the distinction.
** Many projects in the NoSQL movements attempt to shield their users from this complexity by making sharding transparent… more on NoSQL later… though I do think, even given the success of NoSQL, everyone should understand the basics of sharding

  • 6 months ago
  • Permalink
  • Share
    Tweet

Queuing

When we left off last time I was describing our philosophy of performing pre-computation at write time to improve our read performance. At first we applied this approach only to leader boards, but as we added new features we continuously incorporated them into this framework, and it wasn’t long before we were doing quite a bit of work at write time - updating leader boards, news feeds, the data warehouse, etc… So, in solving our read response time problem, we were gradually creating a write response time problem. We needed to figure out a way to do the pre-computation without sacrificing write performance:

Parallelization:

One option was parallelization. Updating leader boards, adding to the news feed, etc… could all be done independently. So, we could modularize the work into a handful of self-contained “operations”, allowing us to realize significant performance improvements simply by kicking off a thread for each operation. This solution had a lot of nice properties: 

1. It would be relatively easy to implement 
2. It would reduce our response time to the response time of the longest operation. 

Still, we knew that parallelization is never a panacea, and we would have to balance its benefits with the downsides - the possibility of bizarre synchronization bugs and increased code complexity. Luckily, since we chose independent units of work and relied, primarily, on the underlying database for synchronization, we were able to sidestep most of those risks. So, after a little internal discussion, we decided to throw together a simple multi-threading implementation.

This solution reaped huge benefits, but it didn’t handle a couple scenarios very well, notably:

1. Traffic spikes would cause us to spin off a ton of threads, and could overwhelm our servers
2. If any of the individual operations failed, we had to roll them all back
3. Though we sidestepped most of the new synchronization issues, lock contention increased

So, we decided to complement parallelization with another classic software paradigm, queuing.

Queuing:

Queuing, as I’m sure you know, is a simple concept: instead of doing work immediately you put in a queue and do it “when you can”. And though it isn’t applicable in all situations, it’s widely used for both time-insensitive and bulk operations.

There are a fair number of open source queuing solutions available, but after evaluating the options, we decided to build our own in house because we we wanted something simple that we had complete control over. We’re currently improving our model to make it more robust and flexible so I’ll hold off on the details until we’ve finished, but for now it suffices to say that I view it as probably the most important component in our back-end architecture - the paradigm that powers many of our most complex features.

  • 7 months ago
  • Permalink
  • Share
    Tweet

Summary Tables

I want to kick-off the blog with a series of posts detailing some of the challenges the Bunchball engineering team has faced over the past few years. I hope they’ll provide insight into the way we think about architecture, performance, reliability, etc… This first post will likely be remedial for those of you who’ve built scalable websites, but even if you’re an expert I’d still recommend skimming through the post cause it’ll frame the rest of the series. Now, on to the story…

When our back-end was young, life was simple.  When a user watched a video, we just logged the activity and rewarded the user’s good behavior with some points. When a user wanted to see a leader board with the site’s ten most frequent videos viewers, we just ran a simple SQL query (modified for readability):

select user_id, count(*) from video_watches group by user_id order by count(*) desc limit 10

Unfortunately, though simple is nice, in this case, it didn’t scale.  As our user base and the number of videos they watched grew, the query took longer and longer. We needed a solution where the query time didn’t increase with the number of users. We brainstormed a couple options: 

One was the reflex response to any database performance issue: to cover it up with a caching layer. The logic being that if you run a query only once every fifteen minutes then it isn’t so bad if it takes thirty seconds each time.  We evaluated this approach and agreed it would solve the immediate pain point, but would also harm the user experience because the user’s scores wouldn’t show up in the leader board immediately.  We also realized that even if we compromised and accepted this as a necessary evil, at some point we were still going to have to address the underlying problem.  We couldn’t just let the query steadily take longer and longer.  Taking all that into account, it wasn’t a tough decision to eschew the caching solution and implement our other idea, “summary tables”, instead.  

Summary Tables:

At their core, leader board summary tables are simple.  They’re just a database table that stores a user’s score in the leader board:

Schema: (leaderboard_id, user_id, score), indexes: (user_id), (score)

Their implementation is also straight forward: every time a user watches a video we update their score in the table.  Summary tables have a lot of nice properties, most notably, their “score” index allows us to get a list of the top users quite quickly. And though they cause extra overhead from the extra db writes needed to keep them up-to-date, compared to caching or running huge select queries, they’re more than worth it.

So, I’m sure a lot of you are wondering why I began this series with summary tables.  They’re certainly not a revolutionary idea.  The answer is that I think they clearly illustrate our philosophy of doing extra computation at write time (ie when the user watches the video) to improve performance at read time (ie when the user looks at the leader board).  Now, that’s hardly a unique philosophy (it’s shared by most teams building web services), but I think it’s a critical one that greatly informs our decision making.  

Also, the motivation behind summary tables leads nicely into my next topic…

  • 7 months ago
  • Permalink
  • Share
    Tweet

Why the Blog…

Now that you know a bit about me, let me take a second to talk about Bunchball and this blog.

Bunchball is a gamification (more to come on what this means) startup in the bay area.  We’ve built a web platform and suite of widgets that help our customers incorporate game dynamics into their digital experiences.  We have a team of six great engineers, but with surging interest in gamification we need to continue bringing on top talent.  The challenge, as anyone who’s worked at a startup before knows, is that finding amazing people is difficult, so we’ve decided to start this blog to educate the world about engineering life at Bunchball and, hopefully, convince our readers that they want to join us on our journey.

Recruiting:

Simplifying to the essential, I believe that if you want to recruit successfully as a startup you must do two things:

1.  You must expand awareness.  You’ve got to get top candidates to send you their resumes

2.  Once you’ve got the attention of great people, you’ve got to sell them on the position.  A non-exhaustive list of what they might be looking for includes:

     a. Interesting, challenging problems
     b. Other great people
     c. A hot industry
     d. A company that strives to make the world a better place


Not surprisingly, I think Bunchball embodies all these characteristics, and I hope to use this blog to convey that.

Content:

Towards this end, I’m planning on writing about the interesting and challenging problems I face.  From ten thousand feet, my responsibilities can be divided into two groups:

   1. Improving our back end platform 
   2. Developing our analytics offering

Most of what I write about will fall under those two umbrellas - topics ranging from performance, scalability, and code architecture to data warehouses and data mining. That said, I don’t want this to be a carbon copy of every other engineering blog, so I’ll also occasionally veer off course and ramble about random topics for no reason other than that I think they’re fascinating.

  • 8 months ago
  • Permalink
  • Share
    Tweet

On Me

Welcome to my blog!  Seeing as this is my inaugural post, I figure I should take a second to tell the story of how I became a software developer at Bunchball…


When I began my freshman year of college I didn’t know what I wanted to do with my life.  I only knew that my interests leaned towards math, so I spent my first two years taking all the prescribed introductory physics, math, computer science, and economics classes, but nothing captured my imagination.  I did love CS theory (I’ve always been a sucker for abstract mathematics), but couldn’t bring myself to declare the major cause I found the coding tedious and worried about becoming the stereotypical “code monkey”. 

It wasn’t until the end of my sophomore year, when I needed to decide on some industry for my summer job, that I was forced to take a stand.  So, I flipped through some job postings and stumbled across an opening at a small startup, Robot Genius, and decided why not.  It’s only a summer, it couldn’t hurt to give it a shot… And To make a long story short, it turned out to be the best summer of my tlife. Beyond all the fun I had, I learned three lessons that fundamentally changed the way I looked at my career:


1. Building software can be a lot of fun

2. A really good engineer isn’t a code monkey.

3. Working with a talented, motivated team can be extremely rewarding.

The next summer I worked at Robot Genius, again, and loved it, again.  Still, I wasn’t 100% certain that startups were the place for me.  Maybe I had just gotten really lucky and found a great one.  Maybe working for a big company would be even better.  So, after college, I tried something completely different and moved to Redmond to work for Microsoft on the Windows file system.  And well… I don’t want to give the impression that working for Microsoft is terrible - it certainly isn’t - but the passion and the fire just weren’t there.  Most Microsoft employees get pigeonholed into one small part of a much larger system and don’t get a chance to do the experimentation and exploration that happens every day at a startup.  So, after a year, I decided it was time to return to the bay area and get back to what I knew I loved, building software at a startup.


That brings us to three years ago when I started working at Bunchball… 

  • 8 months ago
  • Permalink
  • Share
    Tweet

About

My name is Kyle Vigen, and I lead back end development at Bunchball, a gamification startup. I started this blog to share the cool stuff I work on with the world and to try to convince other talented engineers that they would love working with me and my co-workers at Bunchball.

I have a BS in computer science from Stanford, and in my free time I enjoy playing basketball, reading anything I can get my hands on, and talking about how I'm going to start training for a triathlon.

Please check out our careers page: http://bunchball.com/careers


Also check out:
http://WorkWithKasey.com
http://WorkWithMolly.com
http://WorkForSteve.com
  • RSS
  • Random
  • Archive
  • Ask me anything
  • Mobile

Effector Theme by Carlo Franco.

Powered by Tumblr