“Any sufficiently advanced technology is indistinguishable from magic” — Arthur C. Clarke
Some times I find myself so used to the world of Cloud technology I forget there was ever any other way of doing things. It’s not that I take it for granted I just — actually no, that’s pretty on point I definitely take it for granted.
So when I came across Shuffle sharding I was reminded of the pretty cool stuff going on under the services we use daily.
Shuffle sharding — to the uninitiated, which was me but now it isn’t so I can hold it over you, is a way of minimising the blast radius of DDoS attacks. This became a critical issue when AWS were looking at hosting customer DNS records so that their customers could use particular AWS services at the root of their domain. Services like S3, CloudFront etc.
The root domain, zone apex or naked domain name is essentially the web address without the www part. When you type in “www.iamawesome.com" (which is a legit website, no affiliation), your web browser has no clue where that web page is — unless you’ve been there recently and it is still in your cache. But, assuming this is your first time, your browser reaches out to its local mate the DNS server to enquire about the top level domain: com.
URLs are actually written the opposite way around to how we would naturally expect them to be resolved. Even though www appears to be first, it is only a subdomain of the root — iamawesome.com. Starting at the top level domain, our resolver looks for the address to another server that has the entries for domains that have com as their top level. Then it navigates to that server and looks for entries for the domain — iamawesome. You could stop there but we didn’t ask for iamawesome.com, no we’re annoingly specific, we wanted www.iamawesome.com. So our resolver makes one last trip looking for the sub domain of www for the root domain of iamawesome.com. And finally that will give you the IP address that your browser understands and can navigate to.
Think of your file directory, it would go like: /root/user/Hamzah/MyDocuments.
Well, if this was a URL it’d probably go something like: MyDocuments/Hamzah/user/root/
And to get to where you want to go you start at root, which in the URL is on the other side.
Someone asked me to tell them a URL joke, I said I’m sorry, that’s not really my domain.
Distributed Denial of Service attacks are generally from a small group in control of a large army of botnets that are likely operating from infected hosts. You could be a host and not know it, and your computer could be participating on the DDoS attack you’re probably reading about on the news!
There are three main types of DDos attacks:
- Application Layer — making simple non-malicious requests that are computationally expensive for the server
- Protocol Layer — spoofing IP’s to keep connections open between the fake client and real server
- Volumetric attacks — this one is my favourite, you don’t need as many botnets but this is ideal for taking down large companies — when I say favourite I mean I am impressed by the method not the application 😆
Essentially all three boil down to overloading the server enough so that it impacts service as the legitimate customers must compete with the bots making fake requests leading to degradation of service. Or overloading the server to the point it comes crashing down 💥 (accurate representation of a server coming down)
Going back to Shuffle sharding, because what is life without it? With AWS now offering DNS hosting to their customers they had to ensure failures for one of their customers didn’t affect or only affected a very small part of their other customers. And this meant dealing with a quite common occurrence: DDoS attacks.
The big problem statement here was if a customers domain was attacked by an outsider, or even a rogue poisonous requests from the customer themselves, how could they limit the scope of this attack.
For starters, let’s look at basic horizontal scaling to deal with the increased load. This is the most basic way to deal with constraints to your resources… provision more of them — it is the cloud after-all. But whilst it is the most basic it is also the most expensive option. A single server could cost a business a lot of cash-ola but to add more bots costs the attacker nigh on a few pennies. So unless your business model involves reporting quarterly losses this is a lose-lose.
Still a very popular architecture amongst businesses, this is a perfectly normal and fine way of dealing with load using auto scaling, health checks and load balancers.
Below we have a set of instances, fronted by an imaginary Load Balancer, that sends customer requests evenly across the instances, again the most basic approach, works fine in normal settings and no biggie.
Let’s now introduce a biggie… or a bug… a buggie? No that doesn’t sound right.
Say one of the customers is sending a poisonous request that is triggering a bug and even taking down resources. Will they realise this and just stop? Probably not. More likely their application will retry… and retry…. and retry. Leading to our first instance going down from an unintentional internal DDoS attack.
But it’s ok, no impact really one instance goes down and the Load Balancer does what it’s gotta do and begins routing the requests through to the remaining healthy instances… routing new requests but also the poison request. A chain reaction ensues and now we’ve lost all of our remaining instances and the outage has impacted every customer. The biggest fear for anyone trying to implement a multi-tenant system, one customer taking down everyone else. A blast radius of everyone.
Our next option seems like a pretty simple solve, how do you limit the blast radius? Simple Sharding could do the trick! You split the 8 nodes into shards or groups and each groups traffic is isolated from the next.
Customers are assigned to a shard, in our case we can say two customers per shard, with 2 worker nodes servicing the requests. So should a shard be taken down due to an attack, at most we will lose 2 customers as apposed to 8. That is a 25% impact, markedly better than a 100%!
The impact from using Sharding is lessened as the more instances you have the lesser the impact.
This is great and you could add monitoring to this that will detect the problems and isolate them from the rest of the customer base, so they don’t impact anyone else. But that still leaves a bit to be desired, the monitoring approach is great at mitigation after the effect, what if we can lessen the impact itself even further?
This is where we turn towards Shuffle sharding. Similar to shuffling a deck of cards, customers are shuffled and randomly assigned (well they are using hash functions to assign them so kind of randomly) to a shard. Only this time it isn’t the regular, ordinary, run of the mill shard we know and tolerate, this is a virtual shard. Customers are, in this example, assigned to virtual shards and two worker nodes, each of the worker nodes shared with one other customer.
That means customers in the same Shuffle shard can share 1 worker node with one customer and share another with a completely different customer.
Below we see Customer 1 (C1) sharing a shard with Customer 8 in one shard and Customer 7 in another shard.
Let’s replay our favourite scenario of a poisonous request and this time we’ll say Customer 1 is being naughty and keeps retrying and taking down it’s worker nodes. By doing this we end up losing both worker nodes in the Customer 1 shard, thereby impacting service for Customer 1.
This means we won’t fully impact any other shard or customer. And in the worst case scenario, we will interrupt shards that shared worker nodes that were taken down (Customer 7 & 8 in this case). But assuming that they are fault tolerant and retry the request they will be served by the up and running healthy worker nodes in instances 3 & 4 so it’s all cushty 👍
Using this method AWS have managed to bring the blast radius of an attack on a customer on a multi-tenant system down to the customer itself. A pretty menial victory if you are your own customer running your own services, but an incredible feat to pull off on a multi-tenant model. Every cloud providers dream of operating a multi-tenant model so well that to their consumer thinks they’re single tenant.
In our example, we’re looking at 8 customers sharing 8 workers nodes, with 2 shared instances per virtual Shuffle shard — the 8 instances are shared across the virtual Shuffle shards. This means that our small example gives us a possible 56 potential Shuffle shard combinations.
And when you remember back to the customers sharing worker nodes with the impacted customer (C7 & C8 in the diagram above sharing worker nodes with the impacted C1) you see that the impact is constrained to 1/56th of the customer population (as long as retry logic is applied and the clients aren’t shy to retry against the shard until it finds a healthy endpoint).
Much improved on the 1/4 we had for our Simple Sharding solution.
Some real world math
In the real world you can go further since we can use more instances, you can shard to as many as you would like, well as many as your client can retry to. Let’s assume we have a client that can do up to 3 retires, and a shard with 4 worker nodes, the numbers start to look even better.
Now if you have a potential customer impact taking down workers nodes associated with that customer, you will only be affecting 1/1680th of your client base. That’s a nice 0.0595%.
And the best part? As you’d expect in most systems that grow you struggle to scale, with Shuffle Sharding the more customers and workers you have the better the numbers.
Route 53 uses 2048 virtual name servers — so it’s easier to move them around for capacity reasons between physical servers. And every customer domain is assigned to a Shuffle shard of four virtual name servers giving them a possible 730 billion Shuffle shards. What the shard… that’s a lot!
That’s enough Shuffle shards to assign every customer domain a unique Shuffle shard.
The Route Infirma Library
Amazon have released a library called The Route Infirma library and open sourced it so others can make use of their findings. In here you will find 2 types of Shuffle Sharding: Stateless and Stateful.
In Stateless Shuffle sharding, hashing is used to create a placement pattern. With this method there is still some possibility of overlap between shards but it is easy to use.
In Stateful Shuffle sharding, again it is random assignment BUT it has the added ability to check the newly created shard against previously generated ones to ensure the state of overlap. So you can choose if you want no more than two shuffle shards to share 2 particular endpoints. Go crazy!
Note that partial overlap is good as it allows for an exponential increase in the number of shards a system can support.
And finally, both methods are what is called compartmentalisation aware — a fancy way of saying if you wanted to, for example, ensure your shards were Availability Zone resilient, you could split up your worker nodes between AZ’s. A little bit here and a little bit there. That way if one AZ decides it no longer wants to work, you still have worker nodes in another AZ ready to serve traffic 👌
- Shuffle sharding can be used for many different types of resources, including servers, queues, storage etc.
- the clients need to be fault tolerant and able to retry requests
- there needs to be a routing mechanism — so you either give each customer a DNS name (like AWS do for S3, CloudFront, Route53) or you will need a content-aware router that is capable of doing Shuffle Sharding
- Fixed assignments in route53 — you’re stuck with the hand your dealt . If your nodes fail you won’t be moved to another shard as part of the system as that takes us back to the problem infecting a whole system
And don’t worry, as much as I’ve discussed isolating failures to the single customer, that doesn’t mean AWS just sit there like wow buddy, tough being you huh? 🤔 Customers experiencing DDoS attacks can be assigned dedicated attack capacity after being re-sharded into their own single isolated shard, make use of the AWS Shield capabilities for data scrubbing against attacks and more. It’s as simple as Uno, DDoS, Tres.