Networking in the Cloud Fundamentals, Part 5

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Transcript

Corey: As the world spins faster, it heats up because of friction. Therefore, for the good of humanity, the AWS Global Accelerator must be turned off. Welcome once again to Networking in the Cloud, a 12-week special on the AWS Morning Brief, sponsored by ThousandEyes. Think of ThousandEyes as the Google Maps of the internet without the creepy privacy implications. Just like you wouldn't necessarily go from one place to another without checking which route was less congested during rush hour, businesses rely on ThousandEyes to see the end-to-end paths that their applications and services are taking, from their servers to their end users, or between other servers, to identify where the slowdowns are, where the pileups live, and what's causing various issues. They use ThousandEyes to see what's breaking where, and then, of course, depend upon ThousandEyes to share that data directly with the offending providers, to shame them into accountability and get them to fix the issue. Learn more at thousandeyes.com.

So, today we talk about the Global Accelerator, which is an offering from AWS that they announced at re:Invent last year. What is it? Well, when traffic passes through the internet from your computer en route to a cloud provider, or from your data center to a cloud provider, the provider has choices as to how to route that traffic in. Remember, there's no cloud provider that we're going to be talking about that doesn't have a global presence.
So, they have a number of different choices.

Some, such as GCP and Azure, will route that traffic directly into their networks right away, as close to the end user as possible. Others, like AWS and, interestingly, Alibaba, will have that traffic ride the public internet as long as possible, until it gets to the region that traffic is aimed at, and only then ingest it into the provider's network. And IBM has an interesting hybrid approach between the two that doesn't actually matter, because it's IBM Cloud.

Now, Global Accelerator offers a slightly different option here. This matters because, by default, traffic bound to AWS will ride the public internet until it hits the region at the end. That means the traffic is subject to latency caused by public internet congestion. It's also subject to non-deterministic latency: some packets will get there faster than others as they take different routes, so jitter becomes a concern.

Global Accelerator flips that behavior on its head. Instead of traveling across the entire internet until it smacks into a region, traffic now lands on AWS's network far sooner and then rides along AWS's backbone to where it needs to go. Then it smacks into one of a number of different endpoints. Today, at the time of this recording, it supports Application Load Balancers, either internal or external, Network Load Balancers, Elastic IPs and whatever you can tie those to, and of course EC2 instances, public or private. There's a caveat about that last one, which we'll get to a little later on.

On the other side, facing the internet, Global Accelerator gives out two IP addresses that are Anycast. What that means is that, using BGP, those addresses are generally routed to the closest supported region to the customer. As a result, changes to the network architecture can happen in ways that are completely invisible to the end user. It supports, for example, shifting traffic to different regions or endpoints.
It can shape how that traffic manifests on the fly.

Compared with other ways of managing this, such as DNS, this means you no longer have to worry about high TTLs on the client side, which mean traffic doesn't shift as quickly as you'd like, or about clients caching an IP address once the DNS record is resolved. You see this approach all over the place with, for example, public DNS resolvers. People around the globe use the same IP addresses to talk to well-known DNS resolvers, but strangely it's always super quick and not traveling across the entire internet. Imagine that.

This is similar in some ways to AWS's CloudFront service. CloudFront is, as mentioned, a CDN with somewhat similar performance characteristics. It generally ends up being a slightly better answer when you're using a protocol like HTTP or HTTPS, which the entire CDN service has been designed around. They have a whole bunch of locations scattered across the globe, and sure, it takes a year and a day to update a distribution or deploy a new one in CloudFront, but that's not really the point of this comparison.

Where Global Accelerator shines is where you have non-HTTP traffic, or where you need that super responsive failover behavior. You have a lot more control with Global Accelerator as well. So if, for example, data processing location is super important for you due to regulatory requirements, it's worth highlighting that Global Accelerator grants additional flexibility here. But it's not all sunshine and roses.

There are some performance metrics that shine an interesting light on this. Where do those performance metrics come from, you might wonder? Well, I'm glad you asked. They come from the ThousandEyes state of the cloud performance benchmark report.
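Since latency and jitter keep coming up, here is a minimal sketch of how the two can be summarized from a series of round-trip samples. It is an illustration only, not the report's methodology: the function name `summarize_latency` is hypothetical, jitter is computed as the mean absolute difference between consecutive samples (one common definition among several), and the sample values are made up rather than taken from any benchmark.

```python
import statistics


def summarize_latency(samples_ms: list[float]) -> dict[str, float]:
    """Summarize round-trip samples as mean latency plus jitter.

    Jitter here is the mean absolute difference between consecutive
    samples; this is an assumed definition, and benchmarks may use others.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to measure jitter")
    # Differences between each sample and the one before it.
    diffs = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return {
        "mean_ms": statistics.mean(samples_ms),
        "jitter_ms": statistics.mean(diffs),
    }


# Hypothetical samples, NOT measurements from the ThousandEyes report.
public_internet = [180.0, 210.0, 175.0, 240.0, 190.0]  # packets take varying routes
backbone = [160.0, 162.0, 161.0, 163.0, 159.0]  # steadier path on a private network

print(summarize_latency(public_internet))
print(summarize_latency(backbone))
```

The point of the two made-up series is that their averages differ by less than their consistency does: a path can be only somewhat faster on average while being far more predictable, which is the kind of difference jitter captures.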
As mentioned previously, they ran a whole series of tests across a variety of cloud providers from different networks, which showcase where certain cloud providers shine, where certain cloud providers don't work as well in some contexts as others do, and, for lack of a better term, let you race the clouds. It's one of the fun things they're able to do because they serve the role of global observer. They have a whole bunch of locations they can monitor from, and they see customer traffic, so they understand what those use cases look like in real life.

Feel free to get your copy of the report today. They race GCP, Azure, AWS, Alibaba, and IBM Cloud. As mentioned on previous episodes, Oracle Cloud was not included because they use real clouds. Get your copy today at snark.cloud/realclouds, that's snark.cloud/realclouds, and thanks again to ThousandEyes for their continuing support of this ridiculous miniseries.

Now, what did ThousandEyes learn? Well, this should be blindingly obvious, but in case it's not, the Global Accelerator is not super useful if you and your customers aren't far apart.

An example that came up in the report was North America, which by and large has decent internet connectivity, provided you're not somewhere rural, due to a variety of terrible things we'll get to in a future episode. If you're there, it's not going to be super useful for you. As far as the internet is concerned, you're generally relatively close to an awful lot of AWS regions in North America. We're talking tens of milliseconds in most cases.

So if your customers are right next to an AWS region, you're not really going to see a whole lot of benefit from a tool like the AWS Global Accelerator. Now, not everyone lives in San Francisco, it turns out.
So, if you have users, customers, et cetera, scattered around the world in far-flung places, it turns out that something like the Global Accelerator can absolutely add some benefits.

It can meaningfully improve latency and consistency metrics, and the effect grows the further your customers are from your regions, out across the unstable internet. Now, in a couple of edge cases, notably one ISP in India, and this is contested of course, the Global Accelerator performed actively worse than the general internet did in a series of tests. There is some nuance to this, and I understand why people are saying, "Wow, hold on there," but the methodology is largely sound.

There are always going to be concerns with various networks and how they peer with other networks. In practice, though, there really is only one solid takeaway from this: if you're going to be using the AWS Global Accelerator for actual customers, rather than black-box benchmarking that you really don't want to tell the provider about in advance, you're going to want to reach out to AWS and let them know what you're up to before you turn it on. They do have knobs and dials on their side that they can adjust to control things, and of course to figure out what their actual customers are up to. Most cloud providers worth talking to tend to optimize for customer satisfaction, not benchmark satisfaction. That said, as mentioned, it's not all sunshine and roses here.

So when is the AWS Global Accelerator not going to work out super well? Let's talk about some caveats. The first is an edge case, but it's worth highlighting. As I mentioned earlier, the Global Accelerator can be used to talk from the internet to an EC2 instance that lives in a private subnet, provided there's an internet gateway hooked up to that VPC.
Now, that's a big deal, because almost everyone's security policies assume that's never going to happen. Well, welcome to reality, because that just changed.

If you're deploying Global Accelerator, make very sure that your security policies align with that. It's also worth pointing out that between the time they ran these tests, a month and a half ago or so, and now, significant regional availability expansions have been announced for Global Accelerator. It's always a moving target trying to do any kind of review or analysis of an AWS offering, particularly something as broad as a globally distributed networking approach, but it's worth at least paying attention to the fact that they are evolving rapidly. So understand that today's limitations may not apply tomorrow.

Pay attention. The things we learn and know for a fact about computers, or anything really, we don't tend to go back and reevaluate later in life. Once we learn something, we stop reevaluating it, and we all fall victim to that. AWS does not hold still, for better or worse.

One of my personal pet peeves about this is that pricing is non-deterministic. Now, what do I mean by that? How much is Global Accelerator going to cost you? Well, first there's a fixed fee per hour that it runs. Fine, whatever. Great. We're used to that. In this case, virtually no one cares, because it's two and a half cents an hour. That's not what I'm complaining about or particularly concerned by. The problem is that there's also a data transfer premium fee, charged per gigabyte transferred over the AWS network.

Now, how is that determined? Glad you asked. You're not going to like the answer. The DT premium rate depends on the AWS Region that serves the request and the AWS edge location where the responses are directed.
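The pricing structure described in this episode, a fixed hourly fee plus a per-gigabyte premium that sits on top of normal data transfer and applies only in the dominant direction, can be roughed out as below. This is a hedged sketch, not AWS's billing logic: `estimate_accelerator_cost` is a hypothetical helper, it assumes a single blended premium rate that you look up yourself for your region and edge location pairing, and it applies the dominant-direction rule to whole-period totals, which may not match how AWS actually meters it.

```python
from decimal import Decimal

# Fixed fee per accelerator-hour, as quoted in this episode.
HOURLY_FEE = Decimal("0.025")


def estimate_accelerator_cost(
    hours: int,
    gb_in: Decimal,
    gb_out: Decimal,
    premium_per_gb: Decimal,
    standard_transfer_cost: Decimal,
) -> Decimal:
    """Rough cost estimate for one accelerator (hypothetical helper).

    Assumptions, not AWS's actual metering: the premium applies to the
    larger of the inbound/outbound totals for the whole period, and
    premium_per_gb is one blended rate supplied by the caller.
    """
    dominant_gb = max(gb_in, gb_out)  # premium is charged in the dominant direction only
    return (
        HOURLY_FEE * hours
        + premium_per_gb * dominant_gb
        + standard_transfer_cost  # the premium is on top of normal data transfer
    )


# Low and high ends of the retail range mentioned in the episode.
for rate in (Decimal("0.015"), Decimal("0.105")):
    print(rate, estimate_accelerator_cost(730, Decimal("100"), Decimal("2000"), rate, Decimal("0")))
```

With 730 hours, 100 GB in, and 2,000 GB out, the low end of the quoted range works out to about $48 before standard transfer charges, while the high end comes to about $228. Which end you land on depends on exactly the region and edge location pairing you can't pin down until you test on your own traffic.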
You're only going to be charged that premium fee in the dominant data transfer direction, but note that the fee is on top of the existing data transfer pricing as well. At retail, the premium goes as low as one and a half cents per gigabyte, but in some regions and between others, you can see rates approaching seven cents per gigabyte, eight cents in a couple of them, and 10 and a half cents in others, where there are significant costs to driving this.

But my problem isn't that it's expensive, and my problem isn't that this pricing is inherently unfair. My concern is that it's effectively impossible to make any reasonable estimate of what this is going to cost you until you test it on your own traffic patterns and find out. This is one of the big problems I have as a cloud economist whenever I make fun of the cloud: it's almost impossible to figure things out in advance. Pricing-wise, test it, see it, and if it's that big of a surprise, cry, beg for mercy, and hope you get a refund.

Other caveats include that CloudFormation doesn't support Global Accelerator particularly well, even at the time of this recording, because why would it? You're not allowed to talk to each other if you work on different AWS service teams.

And of course, my last real caveat, and it's just an annoyance: as I mentioned at the beginning of this episode, you get some very similar behavior from GCP and Azure for free, just by using GCP or Azure. AWS is charging you a premium for this because AWS... And in return for that, I would expect to see significant functionality delivered. For an awful lot of use cases, I don't think it's quite there yet. For your use case, it might be. So don't take this as a condemnation of the service; take it as you would almost anything else AWS or other providers release: investigate further.

So, the takeaway here fundamentally is that your results are going to vary wildly. And the list of variables isn't long.
Which regions you're in, what networks your customers are coming from, and what your traffic looks like: that's what is going to drive the cost. And the only way you're going to get answers is to test it and see how it performs for your use case.

Now, the Global Accelerator is not a panacea, but it could very well help with some specific use cases. It might also cost a king's ransom and, in a few edge cases, make things actively worse, but that's why we test, and that's why we talk to AWS when we're doing things like this. That said, it's worth keeping an eye on the Global Accelerator as it continues to evolve. To learn more about the product, you can type AWS Global Accelerator into your search engine of choice, and then go slowly mad with frustration as it takes forever to return a result because of the slow internet.

This has been another episode of what I'm calling Networking in the Cloud. Thanks again to ThousandEyes for making it possible. We will not be having a Networking in the Cloud episode next week, because we will be busy with other things in the wake of the disaster that is re:Invent. If you're looking for more content, there are plenty of other places to go, just not here for one week. I'm cloud economist Corey Quinn. Thank you for listening, and I will talk to you in two weeks.

Announcer: This has been a HumblePod production. Stay humble.
