62. Practical Istio (A DockerCon 2019 Recap)
Jon Christensen and Chris Hickman of Kelsus and Rich Staats of Secret Stache recap Zack Butcher’s DockerCon 2019 session titled, Practical Istio.
Some of the highlights of the show include:
- Scalability Spectrum: Level of complexity that requires service mesh or Istio
- Istio: What it’s all about and whether it makes sense for your business
- Service mesh is communication backbone for software components; typically used for microservice and container architectures
- Service mesh offers traffic management, security, and observability
- Istio: Quick acceptance, adoption, and growth of open source, platform-agnostic project
- Difference between proof of concept, evaluation, small-scale vs. viable necessity and in production implementation
- Five main components of Istio’s architecture: Envoy, Galley, Pilot, Mixer, and Citadel
- Security is a multi-layer cake; emphasis is put on routing and security, but doesn’t mean it’s secure
- Telemetry: Any kind of data point going on in your system
- Demos of deploying and integrating Istio with applications to meet requirements
- Successful Adoption Traits: Focus on single pain point, create group of champions, invest time in learning, only expose subset of configurations, and be in it for the long haul
Links and Resources
Just what is a “service mesh,” and if I get one, will it make everything OK?
Rich: In episode 62 of Mobycast, we recap another DockerCon 2019 session. This one focuses on Practical Istio by Zack Butcher. Welcome to Mobycast, a weekly conversation about cloud-native development, AWS, and building distributed systems. Let’s jump right in.
Jon: Welcome, Chris and Rich. It’s another episode of Mobycast.
Rich: Hey, guys. It’s good to be back.
Jon: Rich, we’ve missed you for a couple of weeks. What have you been up to?
Rich: We are in full on hiring mode. We’re hiring an account manager from the agency world. He’ll be part time, hopefully, full time eventually. At least one but probably two more developers. It’s been a crazy sort of a rolling for me. I’m not really sure what I’m doing. I just feel like I’m bouncing around from thing to thing just trying to keep everything afloat. It’s been a challenging couple of weeks.
Jon: Well, if you’re listening and if you want to work with Rich, which I think would be a good idea, even though he says he doesn’t know what he’s doing. You know where to reach us. You can just go to mobycast.fm and you can get through us that way. How about you, Chris? We talked last week. Anything new?
Chris: Yeah. After really busy Spring travelling, it’s kind of nice that I got a bit of a breather here. I got to stay at the office for a few weeks before the next trip, so just hunkering down and getting some stuff done.
Jon: Nice. We have been talking on the show quite a bit about how I got back into development, did this and that with serverless and AWS. Now, I’ve switched from building that proof of concept to doing some wireframing based on that proof of concept. I’ve got to tell you, I don’t honestly know if one thing is more difficult than the other but for me, wireframing is harder because I just can’t stay focused on it. I can get like 10 good minutes out of myself and then I’m finding myself on Twitter. People that are good at wireframing and can do it for eight hours straight, wow. I’m impressed. But it’s fun. It’s still fun. I’ve forced myself back into it. I’m using Adobe XD which is interesting. I think I like Sketch better but I wanted to use something that wasn’t tied to the dying world of Mac. Just throwing in a little controversial statement.
Rich: I’m not going to bite.
Jon: Last week, we talked about service meshes. I argued and argued with Chris because I was just trying to get my head around it. Sometimes that’s how I get my head around things, it’s by questioning their very existence. This week we’re going to stay on the topic of service meshes and get more specifically into Istio which we just kind of hinted at last week. I think, this comes from another talk from DockerCon 2019. Would you mind saying who this is from, Chris?
Chris: Yes. This was a talk titled Practical Istio by Zack Butcher who is a founding engineer at Tetrate.
Jon: Cool. This is kind of a departure for us. As we’ve mentioned last week, I don’t really see us getting a type of project at Kelsus that would need a service mesh. I don’t even know that we would want it because we kind of have a sweet spot in the middle to lower scalability spectrum. We’re not trying to get to 10, 20, 30, or 100 million users for the projects that we’re doing. I think you actually even could without Istio or without a service mesh. We’re just not building stuff at the level of complexity where we need a service mesh at this point and we’re not trying to because it would be hard for us to scale the company.
Even though we’re not trying to do this, it’s a topic that a lot of people are interested in. I think, just as developers, it’s really interesting to know about what the super high scale companies contend with and what they work with. Then, I think for a lot of listeners, they might think that they’re missing out by not using a service mesh. It’s part of the DevOps journey to go to microservices, infrastructure as code, and then eventually, a service mesh. Let’s talk about what Istio is all about. Then, people can make the decision for themselves whether it makes sense for their organizations. How about an overview, Chris?
Chris: Yeah. With that, it’s good to revisit a bit from the last episode and just what a service mesh is and the benefits of it. Again, TLDR, it’s a communication backplane for your services. […] it didn’t have to necessarily be container-specific or microservice-specific. But for the most part, it definitely is associated with microservice architectures and container architectures just because its value increases with the number of moving parts that you have in your system.
Again, the three main things of the service mesh are centered around traffic management, security, and observability. Keeping those things in mind as we go and talk about now an actual implementation of a service mesh which is Istio. Istio is an open source project. It’s platform agnostic. It first hit the scene, the version 0.1, if you will, came out in 2017. An entire two years ago.
Jon: Quick interruption, Chris. It was announced at GlueCon, which I’ll be attending. I’ll be at GlueCon while you, dear listeners, are listening to this. I will be at GlueCon next week but you’ll be listening to it while I’m there. Little shout out to GlueCon, one of my favorite conferences.
Chris: Cool. After a year at 0.1, 1.0 the following year. Very quick adoption in the community. Lots of […] around it. A lot of people were attracted to this. We talked about the three main benefits: the traffic management, the security, and just dealing with observability as well. All those really resonate with folks in microservice architecture. I think there is a lot of activity here. A lot want to contribute to it. Very quick growth. It is optimal. Istio is definitely optimized for Kubernetes. It has grown in complexity as it evolved.
Jon: Yeah. Sorry to interrupt, Chris. Couple things you just mentioned there just struck me because as we’re starting here and saying, “Oh, we probably won’t do this.” You can build systems without these that support many, many users. It’s interesting just that how quickly it’s been adopted and how it goes hand-in-hand with Kubernetes. I don’t know what that means but my guess is that it either means, “Hey, there are a lot of big companies with really complex systems that were just needing some solution to this because they just have people building the security, observability, and other pieces of their architecture over and over again. They’re just tired of it and they need one place to manage all that.”
This is a little more controversial, there’s dev ops, and operations team with time on their hands, and desire to build something cool and big, and they’re doing it. They’re just like, “Hey, this is where we’re going whether we really need it or not.” Maybe that’s happening. I’m sure it’s happening at least a little bit but my goodness. For something as big as this, you get such quick adoption. It’s kind of interesting to think about why.
Chris: We’ve kind of touched on this before. At the end of the day, developers in general are attracted by shiny things. Some things new and hot are definitely something to go look at, evaluate, and start using and whatnot.
There’s definitely a difference between doing a proof of concept, or evaluation, or small scale implementation type thing versus this is something that you need and it’s in production and really being used 100%. Same is true with Docker. I’m always blown away by going to these DockerCon conferences and seeing one, a number of people that are new to Docker which is always a high percentage. And then two, the relatively few amount of folks that are using it in production. There is definitely a bit of a gap there between, kind of saying you’re using it, playing around with it, even contributing to it versus actually having a very viable need for it, and using it as part of what you need to run your production environment.
Chris: Again, we’ve talked about how service meshes in general, as you have much larger systems, they become almost a necessity. That doesn’t necessarily mean you can’t get value out of them with smaller systems. Some folks may very well get value out of the smaller systems. It’s just that they’ve decided to make a trade-off of putting in the time and effort to deal with the learning curve. Maybe they do have a separate ops team that helps out and takes this thing on full time. Different subsets.
Jon: Cool. Alright.
Chris: Maybe just again, I’ll point out, Istio is one of the implementations of the service mesh. There are others out there, so things like AWS App Mesh, you have things like Consul from HashiCorp, Linkerd. There are other implementations out there but Istio by far is the most […], the one that you’ll hear about. I think the big part about that too is it’s based around the envoy proxy. The envoy proxy was, I believe, the first component that was there. It wasn’t necessarily Istio specific, I don’t think. People were using that, I think that’s kind of a bit of a gravity well that led to the dominance, if you will, of Istio. With that, maybe we can talk a little bit about the components of Istio and just how it’s architected.
In this talk by Zack, this is all about Istio. Kind of the practical knowledge that you need, how it’s architected and some use case scenarios. As part of that, he talked about the five main components of Istio. The first one is envoy. We’ve talked about this a bit. This is the proxy that sits beside your app. This is that sidecar deployment. As we talked about last week with service meshes, your software is not talking directly to the rest of the other software. Instead, it’s talking to a proxy, and then the proxies are talking to other corresponding proxies which then forward it on to the actual software. It’s all proxy to proxy communication and that’s where the mesh is basically implemented. Envoy is that proxy. It’s a key component of Istio.
The next component is galley. This is, I think, you made reference last week, “Weird nautical name.” We’re going to get into it here. Galley is the admin, the UI tool to configure the control plane. This is basically the management UI. The third is pilot, which is used to configure the sidecar proxies. This is what’s responsible for propagating the configuration that’s defined in your system via galley and getting it out to all the proxies that are in your system. That’s pilot.
The fourth component is mixer. Mixer’s responsible for enforcing policy as well as capturing telemetry. When your envoy proxy receives a request, it’s going to consult with mixer to do the policy checks, to verify whether or not that request is allowed to then be sent on to the actual underlying piece of software that it’s proxying for. That’s mixer.
The fifth is called citadel. Citadel is basically responsible for security. It’s doing things like assigning X.509 identities. It’s responsible for enabling secure communication and specifically it is built on top of the SPIFFE standard. Those are the five major components of an Istio deployment. Again, not lightweight at all. It’s a pretty complicated system.
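To make the SPIFFE identity idea a little more concrete, here is a minimal Python sketch that parses a SPIFFE ID into its parts. It assumes the `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>` layout that Istio conventionally uses for workload identities; the `WorkloadIdentity` class, the `parse_spiffe_id` helper, and the example values are illustrative names, not part of Istio or of the talk.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WorkloadIdentity:
    """Hypothetical container for the pieces of a SPIFFE ID."""
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> WorkloadIdentity:
    """Parse an ID shaped like spiffe://<trust-domain>/ns/<ns>/sa/<sa>.

    The ns/sa path layout is an assumption based on Istio's convention;
    the SPIFFE standard itself allows other path layouts.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path layout: {parsed.path!r}")
    return WorkloadIdentity(parsed.netloc, parts[1], parts[3])
```

In a real mesh this URI would be carried inside the workload's X.509 certificate, so the receiving side can verify who is calling rather than trusting an IP address.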
Jon: Yeah. It’s interesting. One sort of comment about all these components, it does seem like some of them are all about authentication, authorization, and security. Yet, it’s pretty well established that if you are running Kubernetes, you’re best off making sure that the workloads that Kubernetes is running, it wouldn’t be a horrible thing if there got to be some cross data or some data. Basically, keep your customers separated. Don’t put two different customers that are not allowed to see or two different workloads that are not allowed to know about each other off the same Kubernetes. It’s just kind of interesting to me from that perspective. It’s like a lot of work around security yet the underlying thing that it’s running on can’t really be trusted.
Chris: Yeah. Things like that can’t handle things like placement policies when running containers and whatnot.
Jon: Yeah. I guess the consensus right now is that if you’re running containers and you’re running them all, they can talk to each other, they were all on the same network, they should be, I don’t know the word for it, the workloads that are sort of complementary, but that’s what they should be. They should not be workloads that are, I wouldn’t want to run Apple’s workloads on the same Kubernetes cluster as Microsoft’s at this point.
Chris: Yeah. This definitely kind of touches on security, containers, and how all of that works. Definitely, segregating workloads of your containers to kind of break them up into categories. You might have some very sensitive applications where security is paramount and they don’t need access to the open internet. You may run those on a certain subset of your cluster. You might not run them alongside like a frontend application that’s talking to the world with imports.
Jon: Yup. I guess, just to kind of put a fine point on what I’m saying is like, a lot of work around routing and security, but that doesn’t mean it’s secure. It doesn’t mean it’s like, “Oh, yeah. This is like a platform as a service. Just go and run whatever you want on it.”
Chris: Yeah. That’s a good point. Security is a multilayer cake, for sure. This is more for the infrastructure. The communication level of making sure that you have encrypted communication channels and you can cryptographically verify the identities of the players but you have many layers of that. Just dealing with, who’s allowed to talk to who, do you have multi-tenant issues, placement of workloads. We talked about […] just like CVEs, OS hardening, and everything else. That’s all definitely a part of it. This is just a piece, for sure.
Jon: Sorry. Remind me what CVEs stands for?
Chris: It refers to a vulnerability list. I forgot exactly what the acronym stands for. CVEs are when publicly disclosed security violations or vulnerabilities are found in software products. There’s a CVE database out there. “Oh, there’s something found in the Fedora distribution of Linux in this particular package.” Or, “There’s an issue with TLS 1.0.” There’s a CVE for that type of thing.
Chris: Cool. After going over the major components and architecture of Istio, Zack then went into a little bit about how this whole thing works. We’ve kind of touched on this, but just to make it a bit clearer. The envoy proxy is responsible for intercepting these requests that are destined for the software that it’s proxying, and it’s doing that either via iptables or BPF configuration, playing around with namespaces. Once it intercepts a request, it’s going to look at that request, and determine what the new destination should be. That’s going to come from a routing table that was pushed by pilot. That’s the component that’s responsible for propagating the configuration to the sidecars.
The receiving proxy is going to consult the policy that’s been defined in the system by making a call to mixer. If that policy check passes then it’s going to send it to the destination application. As part of this, both those proxies will then report telemetry to mixer. Now, we can have insight into both the caller and the callee. Get information from them. The default behavior now is actually only to report the server-side telemetry but you can configure it, so you can see both sides.
Jon: I think that I and our listeners will benefit from just telling us what telemetry means because you know what, that is the word that gets thrown around conferences by smart people and nobody ever asks what it means.
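The request path just described can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how envoy, pilot, or mixer are actually implemented: `Mixer`, `SidecarProxy`, and `push_routes` are hypothetical stand-ins, the policy is a plain allow-list, and both proxies report telemetry (the talk notes that the default is server-side only).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Mixer:
    """Toy stand-in for mixer: answers policy checks and collects telemetry."""
    allowed: Set[Tuple[str, str]]  # (source, destination) pairs that may talk
    reports: List[dict] = field(default_factory=list)

    def check(self, source: str, destination: str) -> bool:
        return (source, destination) in self.allowed

    def report(self, reporter: str, source: str, destination: str, ok: bool) -> None:
        self.reports.append(
            {"reporter": reporter, "source": source, "destination": destination, "ok": ok}
        )


@dataclass
class SidecarProxy:
    """Toy stand-in for an envoy sidecar sitting beside one service."""
    service: str
    mixer: Mixer
    routes: Dict[str, "SidecarProxy"] = field(default_factory=dict)

    def send(self, destination: str, payload: str) -> str:
        # Outbound: look up the peer proxy in the routing table, never the app itself.
        peer = self.routes[destination]
        response, ok = peer.receive(self.service, payload)
        self.mixer.report("client", self.service, destination, ok)
        return response

    def receive(self, source: str, payload: str) -> Tuple[str, bool]:
        # Inbound: run the policy check before handing the request to the app.
        ok = self.mixer.check(source, self.service)
        self.mixer.report("server", source, self.service, ok)
        if not ok:
            return "403 denied", False
        return f"{self.service} handled {payload!r}", True


def push_routes(proxies: List[SidecarProxy]) -> None:
    """Toy stand-in for pilot: give every proxy a routing table of its peers."""
    for proxy in proxies:
        proxy.routes = {p.service: p for p in proxies if p is not proxy}
```

With an allow-list containing only `("frontend", "orders")`, a call from the frontend proxy to orders is handled, while a call in the other direction is denied, and each call leaves one server-side and one client-side report behind.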
Chris: It’s just data about what’s going on in your system. Telemetry is just any kind of data point. In this particular case, telemetry and those examples just means the fact that like, “Oh, this particular call was made between these two components.” There’s a source, there’s a destination. There’s going to be an information there about how long it took. There’s information about whether it’s successful or not. All that information, just telemetry. It’s a stream of events. I think it comes from, even more common like a satellite communication. You’re getting telemetry from satellites. It’s just a feed of data. The data packets are coming through and you’re getting information about what’s going on.
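As a rough illustration of telemetry as "just a stream of data points", here is a small Python sketch. The `CallEvent` fields mirror the things Chris lists (source, destination, how long it took, whether it succeeded); the class and the `summarize` helper are hypothetical names chosen for this example, not an Istio API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass(frozen=True)
class CallEvent:
    """One telemetry data point about a single service-to-service call."""
    source: str
    destination: str
    duration_ms: float
    success: bool


def summarize(events: Iterable[CallEvent]) -> dict:
    """Roll a stream of call events up into a few simple metrics."""
    events = list(events)
    if not events:
        return {"calls": 0, "error_rate": 0.0, "mean_duration_ms": 0.0}
    return {
        "calls": len(events),
        "error_rate": sum(not e.success for e in events) / len(events),
        "mean_duration_ms": mean(e.duration_ms for e in events),
    }
```

The same raw events could just as easily feed logs, metrics, or traces, which is part of why the word ends up being a catchall.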
Jon: That’s probably what’s intimidating about that word. That’s complicated, telling me your position in the sky through some sort of long feed. That’s hard stuff. It’s just like, whatever data you got, let me just send it to you. That’s telemetry. I just figured, it’s kind of funny that such a hardcore concept in terms of understanding satellite positioning has been turned into, in the computer science world, it’s like logging.
Chris: Yeah. Telemetry could be partially logging. It could be metrics. It could be everything. It’s probably one of the reasons why we use it because it’s a catchall.
Jon: I figured it also kind of gets back to something that we just have been touching on these episodes a little bit and just keep coming back to which is this is hardcore fancy computer programming. It uses hardcore fancy words like telemetry and control plane. Words that, until somebody says, “Oh, well. It’s just a simple thing,” will make conversations about it opaque, and that is a magnet for certain people. “Let me get into that. Let me understand that opaque thing, so that I can speak that same language.”
Chris: Yeah. There’s a whole lingo to it. It’s constantly evolving. Things like single pane of glass. We talked about last time how I hadn’t heard that term before of brownfield. It’s constantly evolving. Someone comes up with that name, they do a talk, or write a blog post, or write a paper, it kind of catches on, the Twitter world takes it and runs with it or whatnot. It’s just something you have to have in your dictionary. It’s not the see, […], run; […] type of lingo.
Jon: This is not a slight on Zack at all because I think he’s doing a service by not only understanding this and giving this talk, but helping people get over the hump. Explaining how Istio works is sort of drudge work for people that already know how Istio works. We appreciate that.
Chris: Yeah. By the way, I think this talk was a black belt level talk as well. Definitely, not an introduction. That’s basically, at the end of the day, how Istio works. It’s the envoy proxies are doing the bulk of the work. They’re consulting the configuration that’s coming from pilot to figure out how they should do their routing and whatnot. The mixer’s responsible for making sure that the rules for what’s allowed to talk to what is followed. Kind of more on the authorization side. Citadel is for […], the authentication in the system.
Jon: Very cool.
Chris: Again, as we talked about last time, spinning up Kubernetes cluster and putting Istio on that. We did that in the previous talk. It was something like 59 containers were spun up just to get Istio up and running. Pretty complicated system.
Rich: Hey, there. This is Rich. Please pardon this quick interruption. We recently passed an internal milestone of 30,000 listeners. I wanted to take a moment to thank you for the support. I was also hoping to encourage you to head on over to iTunes to leave us a review. Positive feedback and constructive criticisms are both incredibly important to us. Give us an idea of how we’re doing and we’ll promise to keep publishing new episodes every week. Okay, let’s dive back in.
Chris: After kind of talking about the components and how it worked then kind of went through a couple of demos. Pretty fast paced, just typing at the laptop and showing what you can do with Istio. These demos demonstrated two main things.
The first demo was, “Hey I have two clusters. Now, I want to do load balancing across these clusters and selectively, I can route all traffic to just one cluster instead of to both and vice versa.” You can do all that via Istio and CoreDNS extensions that Istio supports. All of these were done without things like VPNs or peered networks. It all actually used public IP addresses. Each one of these clusters had public IP addresses as the egress point.
Jon: I was thinking, you might mean ingress?
Chris: Yeah, it’s actually ingress. There is a corresponding egress proxy as well. Ingress means inbound traffic. Egress is basically outbound. For both these clusters, for ingress traffic, they had a public IP address and that’s how you talk to a particular cluster. After that, the service mesh and envoy manage everything else. You could have cluster to cluster traffic, cross-cluster traffic, over the internet via these public IP addresses which is kind of nice in that now you don’t have to worry about things like CIDR range conflicts unless you’re building that in your backend systems.
That was the first demo. Again, all you really have to do is go in and change some configuration around the naming in the system.
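The traffic-shifting idea from the first demo can be sketched as weighted selection between two clusters. This is a simplified Python model, assuming per-cluster integer weights that an operator changes in configuration; `pick_cluster` and the cluster names are hypothetical, and the real demo did this through Istio configuration rather than application code.

```python
import random
from typing import Dict


def pick_cluster(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick a target cluster for one request according to configured weights.

    A weight of 0 takes a cluster out of rotation entirely.
    """
    candidates = {name: w for name, w in weights.items() if w > 0}
    if not candidates:
        raise ValueError("no cluster has a positive weight")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]
```

Changing the weights from `{"cluster-a": 50, "cluster-b": 50}` to `{"cluster-a": 100, "cluster-b": 0}` sends everything to one cluster, which is the "change some configuration" step in miniature.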
Jon: I realized, CIDR range conflicts, what that means is if you have two different clusters and say there’s a computer in each of them, they both have their IP address in that 0-5, that’s fine. It wouldn’t normally be fine if you were actually connecting those two networks together, because then you would have an IP address conflict. But if you put Istio between them, sounds like no problem.
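A CIDR range conflict is easy to check for with Python's standard `ipaddress` module. The sketch below is only an illustration of the concept; the `cidrs_conflict` helper and the address ranges are made-up examples, not values from the demo.

```python
import ipaddress


def cidrs_conflict(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Two clusters that both use 10.0.0.0/16 internally can each contain a host
# at 10.0.0.5. That only becomes a problem if the networks are joined directly.
cluster_a = "10.0.0.0/16"
cluster_b = "10.0.0.0/16"
host = ipaddress.ip_address("10.0.0.5")
same_host_possible = (
    host in ipaddress.ip_network(cluster_a) and host in ipaddress.ip_network(cluster_b)
)
```

When clusters only reach each other through public ingress addresses, as in the demo, their internal ranges never meet, so an overlap like this does not matter.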
Chris: Yup, indeed. The second demo was kind of showing how you can use something like Istio to help you do a common task of decomposing a monolith with a facade type of architecture. Basically, the idea is, you’re going to decompose your monolith by slicing off functionality into new microservices. You can do that by having this routing facade that can do host or path based routing, so that when requests come in, the callers don’t need to know, “Oh, is it the monolith or is it the microservice A or B or C?” All that is part of this routing intelligence. It can handle that.
That’s what this demo was about. This is really similar to the host and path based routing that AWS application load balancers give us. Super similar if not identical in functionality. But that’s just one of the things that something like Istio can give you. Pretty powerful.
Jon: What’s also been speaking to me, through a lot of this talk, is that a lot of the features that you get are sort of features that we rely on AWS for, for a lot of the work we do, like ALB, path based balancing. It seems like Istio, at some level, is just sort of like a super smart load balancer. AWS is also building pretty darn smart load balancers. If you don’t need one that can sort of know your whole system and know how to route things based on really intricate rules, and you’re allowed to use AWS, then it’ll take you a long way. But if you can’t use AWS fully, if you’re onprem or if you’re across clouds or just basically have a CIO that won’t allow you to do things that would cause you to get locked in, then Istio starts to look really nice.
Chris: Keep in mind, these service meshes, they’re pretty comprehensive and the features kind of break out into those three categories: traffic management, security, and visibility. Traffic management, if that’s really what you’re using it for, you can just sub out alternatives for that. Whether it be load balancers from AWS then fine, that might be what you need.
Chris: Almost all of this stuff has analogies and analogous services that you can go and use. You can kind of build your own equivalent service mesh à la carte by going and picking other services as well.
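The routing facade for decomposing a monolith boils down to "most specific host and path match wins, and everything else falls through to the monolith". Here is a minimal Python sketch of that idea; the hostnames, paths, and backend names in `ROUTES` are hypothetical, and a real mesh or load balancer would express the same rules in configuration.

```python
from typing import List, Tuple

# (host, path prefix, backend). Paths sliced off the monolith get their own
# entry; the catch-all "/" keeps everything else on the monolith.
ROUTES: List[Tuple[str, str, str]] = [
    ("shop.example.com", "/orders", "orders-service"),
    ("shop.example.com", "/users", "users-service"),
    ("shop.example.com", "/", "monolith"),
]


def route(host: str, path: str, routes: List[Tuple[str, str, str]] = ROUTES) -> str:
    """Return the backend for a request, preferring the longest matching prefix."""
    matches = []
    for route_host, prefix, backend in routes:
        if route_host != host:
            continue
        # Match whole path segments so "/ordersfoo" does not hit "/orders".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            matches.append((prefix, backend))
    if not matches:
        raise LookupError(f"no route for {host}{path}")
    return max(matches, key=lambda m: len(m[0]))[1]
```

Callers keep using the same host and paths throughout the migration; moving another slice of functionality out of the monolith is just one more entry in the table.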
Jon: But any other ones that you choose, I feel they’re more locked into this. This is totally open source, covers every single thing you can possibly think of and works anywhere—anywhere where you can have computers talking to each other. I get that that is pretty enticing.
Chris: Yeah. That’s a whole nother world out there. There are people that definitely have that as part of their requirements. They need to run on open source software and they need to be able to run in various public cloud providers as well as private like onprem. If something like Kubernetes and Istio ends up being almost […] that’s what they have to do.
Jon: Yeah. This conversation also caused me to think of another conversation we had about AWS taking on big open source projects and run the best managed services. The very nature of what Istio is, is sort of, I can’t remember the word that you used at the beginning but it’s essentially like a controller of your entire deployment, a thing to manage your entire deployment with. It’s sort of like antithetical for AWS to do that because that’s what they are, why would they do that? Sort of like, “Oh, don’t use any of our tools. Use this.” That does not make any sense at all. AWS also has their own service mesh. Istio does seem to be an open source project that may be a little bit well-positioned to not get the AWS treatment.
Chris: Yeah. AWS has App Mesh and that’s basically what they’re giving away is the envoy proxy. It is for their container systems ECS, EKS, and whatnot. There’s no pilot. There’s no citadel type thing. It’s really kind of using it for that communication control plane and doing the service to service. Communication is really focusing on things like service discovery. When you have lots of containers, service discovery is actually a very big pain point in AWS. They have several different initiatives there in that space. This is one of those things that they’re offering as a service for folks.
Jon: Cool. You want to run through the conclusion around the traits of successful mesh adopters?
Chris: Yes. After giving these two demos, that kind of is the conclusion which is pointing out what are the traits of people that are successful adopting service meshes. Given the folks that Zack’s worked with, these are the commonalities that they’ve seen. The first point that he made was that they focus on a single pain point.
Jon: Not to try to squeeze every feature out of Istio from the get-go.
Chris: Exactly. It’s a very big huge complicated system. Pick one specific thing that hurts the most and use the mesh to only solve that problem. If it is service discovery, that is your pain point then just really focus on that. If it’s, “Hey, I really want to have mutual TLS communication between all my containers.” Focus on that. Don’t go and try to swallow the whole thing from day one. That’s the first tip.
The second one was have a small group that are the champions of the service mesh in the organization. Have some small team. It’s their responsibility to become the experts in service mesh. They’re the advocates. They should also be the ones that are experiencing that pain. Have a group of champions in your organization.
The third tip was just make sure you invest time in the learning curve. There is a lot here. It could be pretty easy just to spin stuff up and not really understand what’s going on. You definitely need to invest the time in and just learn it. Just make that commitment to it.
Fourth tip was only expose a small subset of configuration to the developers that are using it. We’ve touched on this before where service meshes are kind of anti-DevOps. They’re so complicated. There’s so much configuration. There’s such a big learning curve that you need dedicated teams to use and to manage these things. It’s kind of splitting dev and ops back up. You really have a dedicated ops team that’s responsible for it. The developers are just not able to spend the cycles and make the investment to learn about that. Zack’s point here was don’t overwhelm your developers with the service mesh. Instead, err on the side of giving them less information to swallow and give it to them in spoonfuls at a time, so that there’s less pushback from the developers.
An example he gave is, if you’re just trying to solve traffic management in the system and that’s what you’re using the mesh for, then only tell the developers the one file where they need to go change the virtual service configuration to affect that traffic management for an application they’re building. Their view of the service mesh is like, “Oh, there’s this config file, this YAML file, I go in and this is where I put my service entry,” so it’s not intimidating and a big learning curve for them.
The last trait here is the concept that you’re in it for the long haul. Once you make this move to the service mesh and you start to incorporate it into your overall system architecture, it’s a pretty big component of it. One, it’s not easy to integrate it into your system and then conversely, it’s not going to be easy to pull it out. You’ll realize that this is a long haul investment that you’re going to have in your system.
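For a sense of what that "one file" from the fourth tip might contain, here is a hedged Python sketch that builds a minimal Istio VirtualService manifest as a dictionary and renders it as JSON (Kubernetes tooling generally accepts JSON manifests as well as YAML). The `virtual_service` helper and the service names are hypothetical, and the `networking.istio.io/v1alpha3` field layout is an assumption that should be checked against the Istio reference for the version in use.

```python
import json
from typing import Dict, List, Tuple


def virtual_service(name: str, host: str, routes: List[Tuple[str, str]]) -> Dict:
    """Build a minimal VirtualService manifest.

    routes is a list of (uri_prefix, destination_host) pairs, evaluated in order.
    """
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    "match": [{"uri": {"prefix": prefix}}],
                    "route": [{"destination": {"host": destination}}],
                }
                for prefix, destination in routes
            ],
        },
    }


manifest = virtual_service(
    "shop-routes",
    "shop.example.com",
    [("/orders", "orders-service"), ("/", "monolith")],
)
rendered = json.dumps(manifest, indent=2)
```

The point of the tip is that a developer only ever touches the `routes` list here, while the rest of the mesh configuration stays with the team that owns it.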
Jon: Yeah. It’s the Istio lock-in.
Chris: Yeah. Definitely service mesh lock-in. Yeah, most likely Istio lock-in as well. Just like any of these systems like Kubernetes, if you’re using Kubernetes to run your container, it’s really going to be real hard to switch to something else.
Jon: Yeah. I was like, is it really? Imagine you have Kubernetes, it’s sort of overbuilt, and you just have a few containers. Then, you’re running in Kubernetes, and those are all in Docker. If they’re not that complex but you’re just running them in Kubernetes, it may not be that hard to switch them over to, say, ECS, but yeah. If you’ve got a big system, for sure.
Chris: Yeah. We can probably talk about this quite a bit. It just depends on what folks are used to. I’m sure […] has been pretty simple and straightforward with doing it but someone from the Kubernetes world is like, “Wait a minute, what’s a launch configuration and an auto scaling group? How did those get tied into it? What’s a task definition file?” It’s a whole other just lingo and way of doing things. Even if you have a simple system, it still might be a lot of work.
Jon: Yeah. Thank you so much. At this point, the cool thing is, all of our listeners have the same thing hopefully. It’s like, “While I may not be poised to use Istio or a service mesh in the near future, as they progress and new things happen to them, I won’t be left out in the cold. I’ll be able to speak the language a little bit.” That’s very cool. Thank you.
Jon: Thank you to Zack Butcher as well for this talk at DockerCon. Alright. Chris, Rich, we’ll catch you next week.
Chris: Alright, see you.
Rich: Well, dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with show notes and other valuable resources is available at mobycast.fm/62. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.