October 2, 2019

80. Are You Well Architected? The Well-Architected Framework – Part 3

Summary

Previously on Mobycast…

In episodes #78 and #79, we broke down the AWS Well-Architected Framework, and covered the first three pillars of excellence: “Operational Excellence”, “Security” and “Reliability”.

This week on Mobycast, Jon and Chris wrap up their three-part series and discuss the last two pillars of excellence, “Performance Efficiency” and “Cost Optimization”. We then bring it all together by explaining how to perform a Well-Architected Review. Spoiler alert: the Well-Architected Tool is a fabulous resource that you need to have in your toolkit.

Show Details

In this episode, we cover the following topics:

Pillars in depth

- Performance Efficiency
  - “Ability to use resources efficiently to meet system requirements and maintain that efficiency as demand changes and technology evolves”
  - Design principles
    - Easy to try new advanced technologies (by letting AWS manage them, instead of standing them up yourself)
    - Go global in minutes
    - Use serverless architectures
    - Experiment more often
    - Mechanical sympathy (use the technology approach that aligns best with what you are trying to achieve)
  - Key service: CloudWatch
  - Focus areas
    - Selection
      - Services: EC2, EBS, RDS, DynamoDB, Auto Scaling, S3, VPC, Route 53, Direct Connect
    - Review
      - Services: AWS Blog, AWS What’s New
    - Monitoring
      - Services: CloudWatch, Lambda, Kinesis, SQS
    - Trade-offs
      - Services: CloudFront, ElastiCache, Snowball, RDS (read replicas)
  - Best practices
    - Selection
      - Choose appropriate resource types
        - Compute, storage, database, networking
    - Trade-offs
      - Proximity and caching
- Cost Optimization
  - “Ability to run systems to deliver business value at the lowest price point”
  - Design principles
    - Adopt a consumption model (only pay for what you use)
    - Measure overall efficiency
    - Stop spending money on data center operations
    - Analyze and attribute expenditures
    - Use managed services to reduce TCO
  - Key service: AWS Cost Explorer (with cost allocation tags)
  - Focus areas
    - Expenditure awareness
      - Services: Cost Explorer, AWS Budgets, CloudWatch, SNS
    - Cost-effective resources
      - Services: Reserved Instances, Spot Instances, Cost Explorer
    - Matching supply and demand
      - Services: Auto Scaling
    - Optimizing over time
      - Services: AWS Blog, AWS What’s New, Trusted Advisor
  - Key points
    - Use Trusted Advisor to find ways to save $$$

The Well-Architected Review

- Centered around the question “Are you well architected?”
- The Well-Architected Review provides a consistent approach to reviewing a workload against current AWS best practices and gives advice on how to architect for the cloud
- Benefits of the review
  - Build and deploy faster
  - Lower or mitigate risks
  - Make informed decisions
  - Learn AWS best practices

The AWS Well-Architected Tool

- Cloud-based service available from the AWS console
- Provides a consistent process for you to review and measure your architecture using the AWS Well-Architected Framework
- Helps you:
  - Learn
  - Measure
  - Improve
- Improvement plan
  - Based on identified high and medium risk topics
  - Canned list of suggested action items to address each risk topic
- Milestones
  - Makes a read-only snapshot of completed questions and answers
- Best practices
  - Save a milestone after initially completing the workload review
  - Then, whenever you make large changes to your workload architecture, perform a subsequent review and save it as a new milestone

End Song:

The Shadow Gallery by Roy England

We’d love to hear from you! You can reach us at:

Web: https://mobycast.fm
Voicemail: 844-818-0993
Email: ask@mobycast.fm
Twitter: https://twitter.com/hashtag/mobycast
Reddit: https://reddit.com/r/mobycast

Voiceover: In episodes 78 and 79 of Mobycast, we introduced the AWS Well-Architected Framework, an indispensable resource of best practices when running workloads in the cloud. We explained that the framework defines five pillars of excellence, and we dug deep on the first three: operational excellence, security, and reliability. If you missed those episodes, you can hit pause and we’ll wait here while you catch up.
Now in this episode of Mobycast, Jon and Chris wrap up their three-part series on the AWS Well-Architected Framework and discuss the last two pillars of excellence, performance efficiency and cost optimization.
Welcome to Mobycast, a show about the techniques and technologies used by the best cloud native software teams. Each week, your hosts Jon Christensen and Chris Hickman, pick a software concept and dive deep to figure it out.

Jon: Before we get started today, I’m so happy and so proud to be able to announce that we have a sponsor. So Mobycast is no longer ad-free. But our sponsor is one that we really do care about. We use CircleCI, and we’ve talked about CircleCI in a previous episode; I’ll link to it in the show notes.
So I just have this to say. This episode is brought to you by CircleCI, the continuous integration and delivery service used by companies like Twilio, Intuit, and Tinder. CI/CD is so important for keeping teams building that it’s all CircleCI does. They focus on creating powerful, flexible CI/CD pipelines so that you and your team can focus on doing what you do best. Whether you’re a company of 5 or 500, CircleCI can build, test, and deploy your Linux, Windows, and macOS projects from GitHub and Bitbucket, in their cloud or installed on your servers. And anyone can sign up and start building for free, since CircleCI gives both private and public projects 1,000 free build minutes per month. Sign up and start building for free at circleci.com.
Welcome Chris, it’s another episode of Mobycast.

Chris: Hey Jon, good to be back.

Jon: Yeah, good to have you back.
So where are we? We’re deep, deep, deep in the land of the Well-Architected Framework from AWS.
It’s episode three of this Well-Architected Framework series, right?

Chris: It is.

Jon: It is.

Chris: We’re deep, we’re now climbing back out. So the… [crosstalk 00:02:32].

Jon: Yes, yes, that’s true. If it’s a canyon, we’re past the middle point. So this should be the last episode about the Well-Architected Framework, and we really hope that this has been helpful for everyone listening, because it’s kind of hard stuff and there’s a lot of detail to it. But it covers the spectrum of all the decisions that you need to make and everything you need to think of to write 100% reliable software.
And as I say reliable, I’m also thinking about the other pillars, and I’m not going to mention them all right now, but just 100% great software, and maybe even better software than many companies have the budgets for, as we’ve said. So this week we’re going to talk about the final two pillars, and we’re going to talk about the Well-Architected Review and the Well-Architected Tool. The final two pillars are performance efficiency and cost optimization. I like that cost optimization one, because it goes back to what we talked about a couple of weeks ago about how the Well-Architected Framework is more business focused, or at least keeps the business in mind while making technology suggestions. So yeah, Chris, maybe take us in.

Chris: Yeah. So let’s dive right in. Pillar four, performance efficiency. The definition is that it’s the ability to use resources efficiently to meet system requirements and to maintain that efficiency as your demand changes and technology evolves. So again, pretty easy to wrap our heads around, pretty straightforward.

Jon: Yeah. That one’s less of a mouthful than some of the other ones. You’re just using resources efficiently. So that must be computers, networks, disks, databases. Using them efficiently to meet system requirements, I guess, means your business requirements, or the overall requirements of your system to meet your business needs. And then maintain that efficiency as demand changes, so maybe more or fewer users, or more things that you’re processing if you’re not a user-based system. And as technology evolves, that’s a big challenge, right? We’re not going to stay on the mainframe with COBOL.

Chris: And it is a challenge, right? We’ve talked about this so many times on episodes of Mobycast, how the rate of change is increasing, right? There is so much going on, and it really is a struggle just to keep up. The same thing goes in this space: it seems like on a regular basis there are new instance types, right? That are more efficient, with better performance out of them. Same thing goes with just everything across the board. Networking, storage, CPU, all of that. And also different technologies to use, so it may be that instead of using MongoDB, you should be looking at DocumentDB, right? Or instead of using Postgres, you should be looking at DynamoDB, perhaps. So that constant learning and evolution becomes really, really important in this pillar as well.

Jon: Right on. So as usual, each of these pillars has a few things that we talk about: design principles, key services, focus areas, and best practices. Are you going to take us into design?

Chris: Yes. So let’s talk about the design principles. Again, these are some of the core design principles as outlined by the Well-Architected Framework. For the performance efficiency pillar, the design principles include, one, it’s easy to try new advanced technologies. And really what this is about is that as these new technologies come out, there’s a good likelihood that AWS will offer them as a managed service. And that just makes it so much easier for you to then consume them, right?
So you don’t have to go figure out how to stand up some complicated piece of new software. That’s just, again, the undifferentiated heavy lifting; instead, let AWS do that for you. And then you can just consume it as a managed service, which makes it so easy to try these new advanced technologies. And again, that feeds into that continual improvement and evolving and not staying static. Keeping up, keeping ahead of the curve to be as efficient as you can.

Jon: Right. I guess examples of that would be AWS pushing into AI tools. Even simple ones like AWS Transcribe, but also more generalized ones like AWS SageMaker. But also AWS is like, “We’re not sure what’s going to happen with this blockchain thing, but let’s throw out a couple of tools. Let’s be the picks and shovels of this blockchain movement. So here’s a couple of tools you can use.” And that was very evident at re:Invent as well. They were like, “We really looked at this and we’re not sure, but these are the things that seem to be resonating with people.”
And then what else? I feel like there’s a few other… Oh, just new database technologies in general. They keep adding those to the managed service fleet. So yeah, those are some of the new things that they make it easy for you to use without having to stand up a machine yourself and install it.

Chris: Yeah, I think DocumentDB is a great example too, right? I mean, there’s Mongo, Mongo’s out there and it’s a great product, but unless you go with their managed service, Atlas, if you want to run it inside AWS, then you’re standing it up yourself and it’s a lot of work. So to have someone else do that for you, to let AWS do that for you, makes it so much easier just to get going. You don’t have to worry about, “Okay, how am I going to monitor these things, and what’s my backup and restore?” I’ve got to put all that in place, versus those just being checkboxes in AWS.

Jon: Right. Exactly.

Chris: All right, so moving on. Another design principle is realizing that you can go global in minutes, right? What this means is that AWS is so robust and full-featured that it has multiple regions, right? There are regions all over the world. So you literally can have your resources all over the world within minutes, and that opens up new possibilities. And again, from a performance standpoint, right? If you have users spread all over the globe, you can now provide them a much better experience by being physically closer to them by leveraging the AWS platform.

Jon: You know, that is so cool that that’s true. And I kind of have this weird regret that I’ve never had a chance to really take full advantage of that. I mean, all the way back to the beginning of AWS, I thought, “Oh man, you can start these things up anywhere so fast. You can have hundreds or thousands of computers going all over the world in minutes. What would I do where I can do that?” Like, what can I build where I need that? And nothing has really crossed my path where I actually need to do that. I’m not making a denial-of-service system. I’m not building a botnet, and I’m not building any software that has hundreds of millions of users all over the world either. So, as cool as that is, it’s kind of a bummer; I just haven’t really had an opportunity to take advantage of it. How about you, Chris?

Chris: No, I mean, again, it goes with what your user base looks like. If you have a global user base, then chances are you’re going to need to be multi-region. So yeah, not a lot of experience there with that, but I think, you know, we have [zoo pits 00:00:11:03] right? So I think there’s a plan there to branch out internationally, right? And to go into Europe, [crosstalk 00:00:11:11].

Jon: Yeah, there it.

Chris: So that’d be really…

Jon: In fact, in fact it might be happening really, really soon. So I think that’ll maybe be something to talk about in a future episode, because we’re going to have to figure it out. First of all, we’re going to just, unfortunately, put some people that are in other places on the longer-latency system; they’re just going to have to wait. But eventually we’d like to move to be closer to them so they can have a better experience, and how to migrate from single region to multi-region will be a fun challenge.

Chris: Yes. And that’s kind of the point of this whole design principle and what AWS gives you: the tools are there, and doing something very sophisticated doesn’t end up being the tremendous amount of effort it could be if you were trying to do this all on your own, right? There’s a tool there even for just multi-region and DNS, right? Route 53 and the various routing policies it has: you can do geoproximity routing, right? Or you can do latency-based routing. So, pretty powerful.
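To make the latency-based idea concrete, here is a minimal sketch of the decision rule, not of how Route 53 is actually configured. The latency table and user locations are hypothetical and invented for illustration (the region codes are real AWS region names). Among the regions that host the workload and pass health checks, the rule answers with the one closest to the user.

```python
from typing import Dict, Set

# Hypothetical round-trip latencies (ms) from two user locations to three AWS regions.
# Illustrative numbers only; Route 53 relies on its own latency data, not a table like this.
LATENCY_MS: Dict[str, Dict[str, float]] = {
    "denver": {"us-west-2": 35.0, "us-east-1": 45.0, "eu-west-1": 120.0},
    "amsterdam": {"us-west-2": 150.0, "us-east-1": 85.0, "eu-west-1": 12.0},
}


def pick_region(user_location: str, healthy_regions: Set[str]) -> str:
    """Return the healthy region with the lowest latency for this user.

    Mirrors the idea behind latency-based routing: among the regions that host
    the workload and pass health checks, answer with the closest one.
    """
    candidates = {
        region: ms
        for region, ms in LATENCY_MS[user_location].items()
        if region in healthy_regions
    }
    if not candidates:
        raise ValueError("no healthy region available")
    return min(candidates, key=candidates.__getitem__)


if __name__ == "__main__":
    all_regions = {"us-west-2", "us-east-1", "eu-west-1"}
    print(pick_region("amsterdam", all_regions))
    # If eu-west-1 fails its health check, fall back to the next-closest region.
    print(pick_region("amsterdam", {"us-west-2", "us-east-1"}))
```

With all three regions healthy, the Amsterdam user gets `eu-west-1`. If that region is excluded, the user falls back to `us-east-1`, which is the kind of graceful degradation a multi-region setup buys you.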

Jon: Well, I have a friend who worked at Google, and still works at Google. That was his job: he would specifically go around Southeast Asian countries and work with real estate people, leasing experts, and the governments of those countries in order to get data centers built out in Southeast Asia. A hundred-plus million dollars a pop. So yeah, you don’t have to do that stuff. You can just use the work that folks like Andy already did.

Chris: Right. Indeed. All right. Yeah. So moving on, another design principle is use serverless architectures, and so…
AWS wants you to do that. Do it.
Yeah. So again, we’ve talked about this too, the benefits of serverless: a big value add is that you’re not managing infrastructure. You’re letting someone else do it. You’re also just paying for what you use, so you don’t have to worry about overprovisioned capacity and whatnot. But again, it’s a big open topic in architecture, and you just have to figure out what works best for you, what’s the right mix.
But it should definitely always be top of mind: where does it make sense to use a serverless architecture? And so another design principle is experiment more often. So this one feels like… We’ve heard this one before, right? Like this feels like a pretty…

Jon: Does feel familiar.

Chris: And so again all of these pillars involve like continuous improvement, right? The feedback loops, evolving, knowing that technology is always changing and you have to stay current. So the ability…

Jon: Is another design principle going to be to automate something?

Chris: No. Yeah, no. We do have one more, but yeah, experiment more often is definitely pretty… AWS gives you the capability… The cloud gives you the capability to make this so much easier, so definitely leverage it. And then the last design principle is, and this is a weird one, mechanical sympathy. And I almost didn’t even want to include this one because I’m…

Jon: Like take it out? Kelsus has decided to remove this from the AWS Well-Architected Framework!

Chris: Just because… Well, I’m confused by it, right? Like mechanical sympathy. Like what do the… What do they mean by this? And…

Jon: I’m thinking about Mechanical Turk for whatever reason. Is it related to that?

Chris: No, I mean, I don’t think so. Reading into it a little bit more, it feels like it’s centered around using the technology approach that aligns best with what you’re trying to achieve. So it feels like use the right tool for the right job, right? Instead of square peg, round hole. But mechanical sympathy, I just… That does not align with that. It may be more of an Amazon term, or I don’t know. Or maybe other folks have used this term as well? But for me it was a little bit of a head scratcher.

Jon: So mechanical sympathy, I guess you could imagine machines working in harmony. They’re well lubricated and the levers and arms and gears like fit perfectly together, and they do the job they’re doing… It resonates with the shapes of the machines themselves like that but for code… Yeah?

Chris: Yeah. No, no, no. I hear sympathy and I’m just like, “Aw, I feel so bad for you machines. You’re working so hard.” You know? So…

Jon: Well, and that’s why I was thinking Mechanical Turk. I was like, “That’s kind of yikes.” Mechanical Turk is a bunch of people clicking on images to identify them; millions and millions of images a day get identified by Mechanical Turk workers. So maybe it’s like feeling bad for them.

Chris: So this principle is be glad you don’t have to do their job and be thankful that they’re doing it. You know, not you.

Jon: Yes.

Chris: Have that sympathy. Yeah. All right. So…

Jon: What a weird one, weird one. Okay. Continuing on.

Chris: So those are the design principles. So key service here in this pillar for performance efficiency is CloudWatch. I mean…

Jon: Again?

Chris: We hear it again and again, right? CloudWatch is so key. It’s so critical. It’s so instrumental in giving you insight into everything that’s going on in your system. So definitely something to take advantage of.

Jon: There’s one design principle that I want to go back to, the experiment more often one. I just had this realization that the pillar we’re talking about here is performance efficiency. So is the assumption that by doing experiments you can learn how to be more efficient? Because I guess there’s a balance, right? If you’re spending all your time experimenting, then maybe you’re not getting your job done very well. You’re not being very efficient.

Chris: Yeah. I mean, it’s that trade-off, right, of what you are spending on evolving versus operating and maintaining and whatnot. So yeah, you have to have the right mix. You can’t stay static and say, “I’m just going to focus on continuing to add features and keep the status quo,” right? Because that’s how tech debt builds up. And if you get too much tech debt, then you’re going to be stuck, right? You’re going to be in a really, really bad place. You’ve got to find that right mix where you pop your head up and take a survey of the landscape, and you’re measuring your system, right? You’re understanding where some of the bottlenecks are, right? Where are some of the things that need improvement?

Jon: But that’s what an experiment is, though, Chris. An experiment means I have this hypothesis that if we made this change, this or that would be the outcome. And so rather than… And it’s saying do those. And I guess, okay. Okay. But I would caution people not to get too wrapped up in all kinds of experimentation, because experimentation is expensive in its own right. And unless you typically get good results out of your experiments, like pretty solid performance efficiency gains, then you’re spending a lot of money doing experiments without seeing much return on their purpose, which is performance efficiency gains.

Chris: Yeah-

Jon: That’s the trade-off I’m talking about.

Chris: Well, it depends on the scope of your experiment, right? Your experiment can be like, “Oh, I’m going to go from RDS Postgres to DynamoDB.” Right? And that’s a-

Jon: Mm-hmm (affirmative).

Chris: … that’s a big experiment, right? That’s going to be a lot of work. But you could also say an experiment is, “I’m going to go from Amazon Linux 1 to Amazon Linux 2 and see what the gains are there,” or “I’m going to switch from M4 instances to M5.” Right?

Jon: Mm-hmm (affirmative).

Chris: So just again, use the right level of pragmatism for what works for you. And again, kind of looking to see where you’re going to get the best bang for the buck. What are the areas of your system that could use improvement and then how you might do it so it’s the right mix.

Jon: Cool. All right. And we should probably burn through the focus areas pretty quickly.

Chris: Sure. So there are four focus areas for this pillar: selection, review, monitoring, and trade-offs. Selection is all geared around making sure that you’re choosing the appropriate resource types for the functionality of your workload, right? This comes down to things like compute, storage, database, and networking, and just understanding all the various levels and types that are offered there and making sure that you’re picking the appropriate ones.
So again, if you’re using EC2 instances, choose the right instance family, right? If you have a workload that’s CPU intensive, pick the instance types that are geared towards that, versus a memory-intensive application or whatnot. Same thing goes with storage: if you’re using EBS volumes for block-level storage, what are your I/O rates?
Do you need Provisioned IOPS volumes? Or maybe you don’t need that kind of throughput, so you don’t need to go to a more expensive volume type.
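The selection reasoning above can be written down as a simple rule of thumb. This is a minimal sketch only: the family letters follow EC2’s naming (C for compute optimized, R for memory optimized, M for general purpose), and `gp2` and `io1` are the EBS General Purpose SSD and Provisioned IOPS SSD volume types. The `Workload` fields and every threshold are hypothetical, invented for illustration; real choices should come from measuring your own workload.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    vcpu_utilization: float     # average CPU busy fraction, 0.0-1.0
    memory_gib_per_vcpu: float  # memory the application needs per vCPU
    required_iops: int          # sustained disk IOPS the application needs


# Hypothetical cutoffs for illustration only, not AWS guidance.
CPU_HEAVY = 0.7
MEMORY_HEAVY_GIB_PER_VCPU = 6.0
GP2_COMFORTABLE_IOPS = 10_000


def suggest_instance_family(w: Workload) -> str:
    """Map a workload profile to an EC2 family letter (r, c, or m)."""
    if w.memory_gib_per_vcpu >= MEMORY_HEAVY_GIB_PER_VCPU:
        return "r"  # memory optimized
    if w.vcpu_utilization >= CPU_HEAVY:
        return "c"  # compute optimized
    return "m"      # general purpose


def suggest_ebs_volume_type(w: Workload) -> str:
    """Only reach for Provisioned IOPS (io1) when gp2 is unlikely to keep up."""
    return "io1" if w.required_iops > GP2_COMFORTABLE_IOPS else "gp2"


if __name__ == "__main__":
    api_server = Workload(vcpu_utilization=0.85, memory_gib_per_vcpu=2.0, required_iops=3_000)
    print(suggest_instance_family(api_server), suggest_ebs_volume_type(api_server))
```

The point is less the specific numbers than the habit: profile the workload first, then pick the family and volume type that match it, instead of defaulting to the most expensive option.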

Jon: Right, right.

Chris: So just really understanding all the various options out there, what they’re built for, and then knowing your workload so that you pick the right one. That’s selection.
Some key AWS services that are aligned with that particular focus area: EC2, EBS, RDS, DynamoDB, S3, Route 53, Direct Connect. I mean, these are all … it’s basically-

Jon: Anything that can be configured?

Chris: Yeah. Just about everything, right? Yeah. Anything that’s not an actual application service type thing. So, moving on, the next focus area is review. This is part of that continual learning and staying up to date and just knowing what’s out there, right? Review just means you continue to review your workload architecture. Keep apprised of what’s new and what the latest developments are, and then incorporate that into your plans.
Here they say good ways to keep up are the AWS Blog, and also the AWS What’s New page and the email newsletter that goes out. And of course, going to things like re:Invent and the various AWS Summits that are happening all the time.

Jon: Excellent.

Chris: The next focus area is monitoring. Pretty straightforward here as well. The key service for monitoring is CloudWatch.

Jon: Right.

Chris: And then Lambda, Kinesis, and SQS. These are all about where events are routed once they’re generated, and how you act upon those events, right? We can send them to Lambda and have some code do something with them, or they can be sent to a stream in Kinesis, or we can send them to a queue to be processed later, like with SQS.
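As one concrete instance of “acting on events,” a CloudWatch alarm can publish to an SNS topic that invokes a Lambda function. The alarm notification arrives as a JSON string in each record’s `Sns.Message` field, with keys such as `AlarmName` and `NewStateValue`. The sketch below parses that payload. The reaction it takes (collecting alarms that just entered `ALARM`) and the alarm name in the sample event are hypothetical; a real function might page someone, scale a resource, or push work onto an SQS queue instead.

```python
import json
from typing import Any, Dict, List


def parse_alarm_notifications(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Extract alarm name and new state from an SNS-triggered Lambda event.

    Assumes the alarm publishes to an SNS topic that invokes this function;
    each SNS record carries the alarm notification as a JSON string.
    """
    alarms = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        alarms.append({
            "name": message["AlarmName"],
            "state": message["NewStateValue"],  # "OK", "ALARM", or "INSUFFICIENT_DATA"
        })
    return alarms


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Hypothetical reaction: report every alarm that has just entered ALARM."""
    firing = [a["name"] for a in parse_alarm_notifications(event) if a["state"] == "ALARM"]
    # A real function might page someone, scale a resource, or enqueue work in SQS here.
    return {"firing": firing}


if __name__ == "__main__":
    # Trimmed-down sample event; real SNS records carry many more fields.
    sample = {"Records": [{"Sns": {"Message": json.dumps(
        {"AlarmName": "api-p99-latency-high", "NewStateValue": "ALARM"})}}]}
    print(lambda_handler(sample, None))
```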

Jon: Given that monitoring has appeared in every single pillar so far, maybe it should have been its own pillar?

Chris: Mm-hmm (affirmative). I mean, if you asked like, “Look at the five pillars, where’s monitoring?” It’s operational excellence, right? I mean, it just really jumps out. But again, this is so core-

Jon: But security too. Are you monitoring your systems for security? Like all of them. Reliability, are you making sure they’re not falling over? Performance efficiency, are you making sure that they’re not dragging? And then finally cost optimization. Are you making sure that you’re not paying out the wazoo? Like, all of them need monitoring.

Chris: Mm-hmm (affirmative). Yeah, it almost feels like … I mean, we have the general design principles that we kicked off with. The Well-Architected Framework has three things: the design principles, the pillars, and then the actual review process itself. Right?
So with the general design principles we talked about: “Hey, you don’t want to guess your capacity needs. You’re testing at scale. You automate all the things. You’re allowing your architectures to evolve and improve over time. You’re driving architecture through data, and you’re improving through game days.” Right? Those are the general design principles.
It feels like we could have talked about monitoring and instrumentation there, as one of those key things, and maybe just-

Jon: And then saved ourselves having to run it five times down in the pillars.

Chris: Yeah.

Jon: Yeah.

Chris: Yep, indeed.

Jon: Cool.

Chris: And the fourth focus area is trade-offs, right? And so this is again kind of a weird name for it.

Jon: It is, yeah. My focus area is trade-offs.

Chris: Yeah. But really what it means is trade-offs with respect to performance efficiency: what trade-offs can you make to increase your performance? At the end of the day, they kind of talk about caching. So, proximity and caching.
So things like CloudFront: taking advantage of CloudFront and edge locations to put the content in closer proximity to the actual users. That cuts down latency, so users get whatever they’re requesting much more quickly. Or using things like ElastiCache for caching, so that you aren’t constantly hitting your database, your relational database, for information, or whatever it is that you want to cache, right?
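The ElastiCache point usually takes the shape of the cache-aside (lazy-loading) pattern: check the cache first, and only go to the database on a miss or an expired entry. Below is a minimal sketch of that read path. A plain dict stands in for ElastiCache, and the `CacheAside` class, the TTL value, and the fake database are hypothetical stand-ins, not a Redis or Memcached client API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside (lazy loading): serve from cache, fall back to the database on a miss.

    A dict stands in for ElastiCache here; with a real Redis or Memcached client
    the read path has the same shape.
    """

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.db_reads = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self._load_from_db(key)  # cache miss or expired entry
        self.db_reads += 1
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    fake_db = {"user:42": {"name": "Ada"}}
    cache = CacheAside(fake_db.__getitem__, ttl_seconds=60)
    for _ in range(1000):
        cache.get("user:42")
    print(cache.db_reads)  # only the first call reaches the "database"
```

The TTL is the trade-off knob: a longer TTL means fewer database reads but a longer window in which readers may see stale data.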

Jon: Both of those are read efficiency versus write efficiency trade-offs.

Chris: Mm-hmm (affirmative). And one of the things they list is Snowball, which, okay. But really the trade-off there is, okay, you could use a dedicated network connection like Direct Connect, or the open internet, or whatever it is, to transfer a terabyte of data into S3. Or you could order a Snowball, load it locally, and then ship it off to Amazon. Right? So that’s the trade-off. One is going to be much more efficient than the other. And knowing what those-

Jon: Yeah, for anyone living here in Eagle, Colorado it’s always going to be more efficient to do Snowball because our internet is horrible.

Chris: DSL.

Jon: Yeah. Okay.
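The Snowball decision comes down to simple arithmetic: how long would the transfer take over the link you actually have? A minimal sketch follows, assuming decimal units (1 TB = 10^12 bytes) and a hypothetical 80% sustained utilization of the nominal bandwidth. It does not model Snowball’s own turnaround (shipping and loading time), which you would compare against the result.

```python
def transfer_days(num_bytes: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days needed to push num_bytes over a link of link_mbps megabits per second.

    utilization is an assumed fraction of the nominal bandwidth actually
    sustained; the 80% default is hypothetical, not a measured figure.
    """
    bits = num_bytes * 8
    effective_bps = link_mbps * 1_000_000 * utilization
    return bits / effective_bps / 86_400  # 86,400 seconds per day


ONE_TB = 1_000_000_000_000  # decimal terabyte, in bytes

if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        print(f"{mbps:>5} Mbps: {transfer_days(ONE_TB, mbps):.1f} days")
```

Under these assumptions, 1 TB takes 8×10^12 bits ÷ 8×10^6 bit/s = 10^6 seconds, or about 11.6 days, on a 10 Mbps line. On a 1 Gbps link it takes a few hours. That gap is why a slow rural connection tips the balance toward shipping a device.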

Chris: And then another example of trade-off would be just using read replicas with RDS. And we’ve talked about this in the past as well, how you can just add read replicas to your database and that now allows you to optimize on your read capacity versus writes. And most applications do follow that pattern where they’re more read intensive than they are write intensive.

Jon: Mm-hmm (affirmative). Or they can be split into parts, right? Certain users need really efficient writing, and other, typically larger, sets of users need really efficient reading. If you think about who needs what, you can architect around that.
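Architecting around a read-heavy pattern with RDS read replicas usually means sending writes to the primary and spreading reads across the replicas. Below is a minimal sketch of that routing decision. The endpoint names are hypothetical placeholders. Classifying statements by a leading `SELECT` is a deliberate simplification (for example, it would misroute `SELECT ... FOR UPDATE`). RDS read replicas are updated asynchronously, so a read issued right after a write may not see that write yet.

```python
import itertools
from typing import Iterable


class ReadWriteRouter:
    """Send writes to the primary and round-robin reads across read replicas.

    Endpoint names are hypothetical placeholders for RDS instance endpoints.
    Replicas apply changes asynchronously, so reads can lag recent writes.
    """

    def __init__(self, primary: str, replicas: Iterable[str]) -> None:
        self.primary = primary
        replica_list = list(replicas)
        # Fall back to the primary for reads if no replicas are configured.
        self._replicas = itertools.cycle(replica_list or [primary])

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: statements starting with SELECT are treated as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


if __name__ == "__main__":
    router = ReadWriteRouter(
        "db-primary.example.internal",
        ["db-replica-1.example.internal", "db-replica-2.example.internal"],
    )
    for statement in ("SELECT * FROM orders", "INSERT INTO orders VALUES (1)", "SELECT 1"):
        print(router.endpoint_for(statement))
```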

Chris: Yep, yep. And again, I think that’s maybe why they call this trade-offs is just kind of understanding that. Because a lot of times you will be … You’re trading off performance for cost, right? And it could be more expensive in order to get better performance. So that’s the trade-off there. Or sometimes you’re trading off CPU for memory or vice versa.

Jon: Yep, yep.

Chris: And then as far as best practices go, we kind of already covered this. An example in the selection focus area is just making sure you’re choosing those appropriate resource types, whether it be compute, storage, database, or networking. Understand all the families, understand all the options there, the different types. Understanding what it means to … What’s enhanced networking? What’s an EBS-optimized instance? Things like S3 Transfer Acceleration, VPC endpoints.
I mean, all of these come into play. And so really understanding all those options and whether or not it makes sense for you and your workload is something you need to be on top of.

Jon: And that goes back to the Well-Architected Framework in general, and how well balanced it is for companies that have smaller workloads, or startup-type companies, versus big behemoth companies. I think it’s worth saying here that the trade-off around getting this right for a smaller workload is that you’re not going to hurt yourself that badly by not getting it perfect. So go ahead and choose something.
It might not be the best instance type if you need EC2, or it might not be the best IOPS selection if you’re using EBS. But then after you … Instead of getting wrapped around the axle, thinking, “Oh my God, there are too many choices here, and I’ve got to go learn AWS in its entirety before I can even start and do stuff.” Like, that would be bad. Right?
So for smaller workloads or if you’re just getting started, you can make mistakes and they’re not that expensive, but at larger scales they can get very expensive and you definitely need to know this stuff.

Chris: Yeah. I mean, it goes back to that constant evolution and the feedback loops and just always you’re looking at your system, you’re measuring, you’re reviewing it, and then continuing to make improvements to it too.

Jon: Right. Yeah. If I were to have that slider that we talked about, it would be like this stuff would be kind of on the … If you’re on the lower end of this slider, it would be like, okay, this is less important. Like all of these perfect selections and perfect trade-offs and everything. It’s like, this is where you can … This is the part of the well-architected framework that you can dial back a little bit.

Chris: Mm-hmm (affirmative). Yeah. I think a lot of this stuff is pretty much more than … Again, take things like implementing VPC endpoints so that you keep traffic inside the Amazon network and within your VPC. It’s going to be faster, but it’s perhaps milliseconds faster. Right?

Jon: Yeah, exactly.

Chris: So it’s not going to make a big difference if you don’t have a tremendous amount of load or a tremendous number of users to begin with. Your time and resources can be better spent in other places.

Jon: Exactly.

Chris: All right. Well, that is the performance efficiency pillar. So moving on, we can cover the last one. Cost optimization.

Jon: Cost optimization.

Chris: Which is so nice of Amazon, right? Because at the end of the day, you’d think it would be, “Hey, we want you to spend more money. We don’t want to help you save money.”

Jon: Well, it depends on who you’re optimizing for.

Chris: Yeah.

Jon: Optimizing for AWS: spend, spend, spend.

Chris: Yep. But, all jesting aside, they have a very real reason for wanting to save people money: it provides the impetus, the business reason, for getting onto the Cloud to begin with, to get off [crosstalk 00:31:58] onto the Cloud, and to keep them there. And to-

Jon: Yeah. And it turns out a lot of the ways to optimize cost have you getting deeper and deeper in, right?

Chris: Yep. So the tendrils get tighter around you and it becomes much more difficult to extricate yourself for sure.

Jon: Yeah. Like reserved instances. For a couple of years, you’ve got a machine that’s yours.

Chris: Well, go with three years for the best price, right?

Jon: Exactly.

Chris: Indeed. So, as for the description of this pillar, it’s the ability to run systems to deliver business value at the lowest price point. Pretty straightforward there, right? We just want to run our workloads at the lowest price possible while still meeting all of our requirements and making sure that they’re actually providing value.

Jon: Yep. Easy peasy.

Chris: So, design principles, let’s see. Adopt a consumption model, which basically just says: only pay for what you use. This is one of the great things about the Cloud, and we’ve talked about how there are so many different ways to achieve this, whether it’s things like auto scaling or managed serverless offerings. You don’t need to over-provision, and you don’t need to under-provision. Right? Just only pay for what you use.
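To make “only pay for what you use” concrete, here is a minimal sketch comparing instance-hours for a fleet fixed at peak capacity against one that tracks demand hour by hour, which is roughly what auto scaling aims for. The hourly demand profile, the per-instance capacity, and the function names are hypothetical, and the sketch assumes perfect hour-by-hour tracking, which real auto scaling only approximates.

```python
import math


def instances_needed(requests_per_hour: int, capacity_per_instance: int) -> int:
    """Smallest number of instances that can serve the load (always at least one)."""
    return max(1, math.ceil(requests_per_hour / capacity_per_instance))


def instance_hours(demand: list[int], capacity_per_instance: int) -> tuple[int, int]:
    """Return (fixed_at_peak, demand_tracking) instance-hours for an hourly demand profile."""
    peak = instances_needed(max(demand), capacity_per_instance)
    fixed = peak * len(demand)  # over-provisioned: peak capacity every single hour
    tracking = sum(instances_needed(d, capacity_per_instance) for d in demand)
    return fixed, tracking


# Hypothetical day: quiet overnight, busy during business hours, moderate in the evening.
hourly_demand = [200] * 8 + [1800] * 10 + [600] * 6
fixed, tracking = instance_hours(hourly_demand, capacity_per_instance=500)
print(f"fixed at peak: {fixed} instance-hours, demand-tracking: {tracking} instance-hours")
# prints "fixed at peak: 96 instance-hours, demand-tracking: 60 instance-hours"
```

With these made-up numbers, the fixed fleet pays for 36 instance-hours a day that serve no traffic.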
Another design principle: measure overall efficiency. Again, kind of a constant theme here. You can’t really do anything without data to back it up, so make sure you’re measuring. Whatever facets you want to look at and be able to prove things about, you need to be able to measure them and have data points that inform the decisions you make. So make sure you’re doing that.
Another design principle: stop spending money on data center operations. This one is totally geared toward the question, am I on-prem or am I in the Cloud? Don’t spend your money paying people to rack machines and run cables, and don’t spend money on cooling systems and HVAC and whatnot. Let AWS do that.

Jon: Right.

Chris: Another core design principle is to analyze and attribute your expenditures, and this really applies to larger organizations especially. Imagine you’re a company with many different departments: you have marketing, you have sales, you have engineering, you have human resources. You may want to track costs on a per-department basis and be able to analyze that.
So how do you do that inside AWS? That’s important from a cost optimization standpoint: knowing where the money’s being spent and how you can attribute it to those different business functions.

Jon: It’s really not hard to start doing that, though, and even a company as small as Kelsus can benefit from it. As our AWS bill grows, it’s useful to know how much of that bill is us running experiments for our clients versus us doing our own internal stuff, like our internal skunkworks projects.
And all we have to do to get there is, every time we spin up a new service, tag it with the project it’s for, and then we can get that report. If we don’t do that tagging, it becomes harder.
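The tag-then-report idea boils down to grouping costs by a tag value. This is a minimal local sketch, not a call to any AWS API: the line items, resource IDs, amounts, and the `project` tag key are hypothetical, and resources missing the tag fall into a single “(untagged)” bucket so the gap stays visible.

```python
from collections import defaultdict

# Hypothetical billing line items: (resource id, tags, cost in USD).
line_items = [
    ("i-client-a-web", {"project": "client-a"}, 412.50),
    ("db-client-a", {"project": "client-a"}, 230.00),
    ("i-skunkworks", {"project": "internal"}, 95.25),
    ("bucket-logs", {}, 18.75),  # someone forgot to tag this one
]


def cost_by_tag(items, tag_key: str, untagged_label: str = "(untagged)") -> dict[str, float]:
    """Sum costs per value of tag_key; anything missing the tag lands in one bucket."""
    totals: dict[str, float] = defaultdict(float)
    for _resource_id, tags, cost in items:
        totals[tags.get(tag_key, untagged_label)] += cost
    return dict(totals)


print(cost_by_tag(line_items, "project"))
# prints {'client-a': 642.5, 'internal': 95.25, '(untagged)': 18.75}
```

The size of the “(untagged)” bucket is a quick measure of how consistently the tagging habit is being followed.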

Chris: Yeah, indeed.

Jon: Cool.

Chris: And then a final design principle is to use managed services to reduce your total cost of ownership. We’ve talked about this; it keeps coming up, right? A great benefit you get from being in the Cloud is letting AWS do that undifferentiated heavy lifting.
Don’t spend your money and your resources on those kinds of things. It’s not your forte, it shouldn’t be, and it’s not going to distinguish you from your competitors. So focus on your core competency and leverage AWS to do everything else for you.

Jon: Cool. Undifferentiated heavy lifting. Are we going to start calling that UHL, and everyone who hasn’t listened to the first 80 episodes of Mobycast will have to know what UHL-

Chris: Can you imagine? I mean, there are so many things we could have acronyms for. We could just end up speaking in code the whole time. We already kind of do that with the various … I mean, think about RDS, TCO-

Jon: I think it’s hard not to. Yeah.

Chris: … TCO.

Jon: Yeah, exactly.

Chris: So we’ll try to find the right mix instead of just speaking in code. The key service here for the cost optimization pillar is AWS Cost Explorer. That’s a tool AWS provides that lets you go and see where you’re spending your money, right? And drill down into it. If you couple that with cost allocation tags, it becomes even more powerful.
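Cost Explorer data can also be pulled programmatically through its GetCostAndUsage API. As a hedged sketch, the first helper builds the keyword arguments that boto3’s `get_cost_and_usage` accepts for a monthly cost breakdown grouped by a tag, and the second sums the grouped amounts from a response. The tag key `project`, the dates, and both helper names are illustrative assumptions; the response shape (group keys such as `project$client-a`) reflects our reading of the API documentation and is worth checking against a real response. A tag key only appears this way once it has been activated as a cost allocation tag.

```python
from datetime import date


def monthly_cost_by_tag_request(tag_key: str, start: date, end: date) -> dict:
    """Keyword arguments for Cost Explorer GetCostAndUsage, grouped by one tag key.

    Cost Explorer treats the End date as exclusive.
    """
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }


def totals_from_response(response: dict) -> dict[str, float]:
    """Sum UnblendedCost per group key across every time period in a response."""
    totals: dict[str, float] = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]  # assumed format: "<tag key>$<tag value>"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[key] = totals.get(key, 0.0) + amount
    return totals


params = monthly_cost_by_tag_request("project", date(2019, 9, 1), date(2019, 10, 1))
print(params)
# With AWS credentials configured, these would be sent with:
#   boto3.client("ce").get_cost_and_usage(**params)
```

Keeping request-building and response-parsing as plain functions makes them easy to test without touching a real account.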

Jon: Exactly.

Chris: So focus areas.

Jon: And that was what I was … I’m sorry, that was what I was just talking about. Tag the stuff that you make; everything can be tagged. And this is worth saying: I think for a couple of years of using AWS, I would always make something new, and there’d be the part where you could tag it.
And I was like, “Oh yeah, that’s for hardcore people who have some sort of super-automated big system.” It wasn’t until I started doing A Cloud Guru courses that I was like, “Oh, that’s for cost. That’s pretty handy. That’s useful.” Yeah, that’s what that’s-

Chris: Yeah. I mean, there is a distinction between just tags and cost allocation tags. Cost allocation tags are a certain type of tag that feed into things like Cost Explorer to really integrate with those expense reports to [inaudible 00:38:01].

Jon: Uh-oh. I may be learning …

Chris: Expense reports to see.

Jon: Uh-oh, I may be learning something here. So, I thought that if you go make a new EC2 instance and tag it, you can use that tag for billing purposes to figure out where your spend is going. Is that not true?

Chris: No, you need to … So, the tag will be there, but it’s not necessarily going to be integrated with things like Cost Explorer. You have to tell Cost Explorer, “These are actually my cost allocation tags,” and define them.

Jon: Oh, so it is using those same tags. There’s not a separate secret tag thing that I didn’t know about. For a second, I was like, “Oh my God, I thought the tags I put into the console were the ones, but you’re telling me they’re not.” You’re just saying that Cost Allocation Explorer needs to know about the tags you’re using. Is that what you’re saying?

Chris: You’d actually define them as cost allocation tags. You can go one way, but not the other. I don’t think you could say, “Oh, I’m just going to start tagging my EC2s with these particular tags, whatever I want to call these name-value pairs,” and then have them show up in Cost Explorer. I think you have to first say, “Cost Explorer, here are my cost allocation tags,” define them, and then I can start using those tags on things in my system.

Jon: What an absolute bummer. I would imagine you’d at least be able to say, okay, I have a tag called project, and tell Cost Allocation Explorer, “I want to have a project tag.” Then maybe it wouldn’t be able to go back through all of history and say, “Oh, this is what you spent on this project,” but at least from now on, it would start collecting data on anything with this tag. I would hope it’d be at least that good, and not just, can’t even use it, doesn’t even see those little tags. That would be such a … I’m a product person, and I just hate it when I hear product stuff like this. It really bums me out about AWS. They build something nice, and then sometimes they just don’t take it all the way there. They just don’t finish it, it feels like.

Chris: Yeah. So, full disclosure, I have not had the opportunity to work with cost allocation tags in depth, so there’s probably still some homework to be done on exactly what the mechanics are, but it is the case that cost allocation tags are different from regular tags. There’s something you actually have to do to make the … Because otherwise you could have an infinite number of tags.

Jon: No, I get it. You want to be able to tag stuff? Yeah, I get it.

Chris: So, what is it going to do? You don’t want it to bring in all that stuff.

Jon: Or, why not, though? Why not just build it? Like, “Oh, here are all the tags you could possibly have. Is there some sort of filter you want to apply to these?” That would be a cool product.

Chris: Kind of, although now you see hundreds of tags from everyone who’s done stuff, and it becomes kind of hard, right? Some people-

Jon: Could be, could be.

Chris: Think about it: for departments, one person calls it marketing, all spelled out, one does MKTG, someone else does [marcomm 00:41:26] or something.
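One common way to tame the marketing/MKTG/marcomm problem is to normalize free-form tag values against an agreed alias table before reporting. This is a minimal sketch under that assumption; the alias table and the `unknown:` prefix are hypothetical conventions, not anything AWS provides.

```python
# Hypothetical alias table: every spelling seen in the wild, mapped to one canonical value.
DEPARTMENT_ALIASES = {
    "marketing": "marketing",
    "mktg": "marketing",
    "marcomm": "marketing",
    "engineering": "engineering",
    "eng": "engineering",
}


def normalize_department(raw_value: str) -> str:
    """Map a free-form tag value to a canonical department, or flag it for review."""
    key = raw_value.strip().lower()
    return DEPARTMENT_ALIASES.get(key, f"unknown:{raw_value}")


for value in ["Marketing", "MKTG", "marcomm ", "Sales"]:
    print(repr(value), "->", normalize_department(value))
```

Flagging unknown values instead of guessing keeps the alias table as the single place where spellings get reconciled.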

Jon: But going the other way doesn’t really help you either, right? Say you decided on marketing and spelled it MKTG. Imagine you’re using the console to create a new EC2 instance. The console doesn’t say, “Here are the cost allocation tags for this one. Do you want to use one of them?” You still have to know it, right? It doesn’t help you at all.

Chris: No, this is where things like CloudFormation come into play, right?

Jon: Yeah. So, the more we talk about it, the more I feel like this could be a well-done, easy-to-use thing that helps you along, and it’s not. It just simply isn’t. You’ve got to go figure out what your tags are going to be. You have to communicate them to the company. You have to make sure that you use them. You have to make sure they’re built into your CloudFormation templates. I mean, come on. Who really has time for that? Nobody, right? Nobody.
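Building tags into CloudFormation templates is easier to keep honest with an automated check. This is a minimal sketch, assuming the template has already been parsed from JSON or YAML into plain dicts; the required tag keys and the list of resource types we choose to check are hypothetical policy choices, and intrinsic functions in tag values are not resolved.

```python
# Hypothetical company tag policy, plus the resource types this check covers.
REQUIRED_TAG_KEYS = {"project", "department"}
TAGGABLE_TYPES = {"AWS::EC2::Instance", "AWS::S3::Bucket", "AWS::RDS::DBInstance"}


def missing_tags(template: dict) -> dict[str, set[str]]:
    """Return {logical_id: missing tag keys} for covered resources in a parsed template."""
    problems: dict[str, set[str]] = {}
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in TAGGABLE_TYPES:
            continue  # not every resource type supports a Tags property
        tags = resource.get("Properties", {}).get("Tags", [])
        present = {tag["Key"] for tag in tags}
        missing = REQUIRED_TAG_KEYS - present
        if missing:
            problems[logical_id] = missing
    return problems


# A trimmed template: real resources need more properties (an instance needs an ImageId, for example).
template = {
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {"Tags": [
                {"Key": "project", "Value": "client-a"},
                {"Key": "department", "Value": "engineering"},
            ]},
        },
        "LogBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"Tags": [{"Key": "project", "Value": "client-a"}]},
        },
    }
}
print(missing_tags(template))  # prints {'LogBucket': {'department'}}
```

A check like this could run before every deploy, so a missing or misspelled tag key is caught in review rather than in next month’s bill.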

Chris: Well, except for the people who … There are people whose only job is this. You have a company-

Jon: Yeah, but those people … Is there even such a thing as a tech CFO, right? CFOs are like, “Help me understand, say, the AWS bill.” Somebody is like, “Oh, we’ve got to tag this stuff. Tell us what the tags should be.” Can you just imagine how awful that conversation is? I mean, it is the worst. I’m sorry, but this is a shit show. It really is not okay. All right, all right, I’ll move on, but I was excited to hear how AWS was going to help you with cost optimization, and now I’m let down.

Chris: Well, again, to be continued. It’s definitely something we can dive a little deeper into and maybe report back on later. All right, so let’s see. There are four focus areas in the cost optimization pillar. The first is expenditure awareness, which I think we’ve now beaten to death talking about tags and Cost Explorer; you can also use AWS Budgets to keep track of what you’re spending and to emit CloudWatch events when you approach thresholds. The second focus area is cost-effective resources. This is really about understanding things like, with EC2, when to use on-demand versus spot versus reserved instances. And the third focus area is matching supply and demand. We’ve talked about this before: this is the consumption model, only paying for what you’re using.
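For the expenditure-awareness focus area, AWS Budgets lets you set alert thresholds on actual or forecasted spend. To illustrate the idea, here is a minimal local model of that threshold logic; it does not call AWS, and the function name, thresholds, and dollar figures are hypothetical. In the real service, budgets are configured in the console or through the API, and alerts go out as notifications (for example, by email or through Amazon SNS).

```python
def budget_alerts(actual: float, forecast: float, budget: float,
                  thresholds: tuple[float, ...] = (0.8, 1.0)) -> list[str]:
    """Return one message per threshold crossed, preferring actual spend over the forecast."""
    alerts = []
    for threshold in thresholds:
        limit = budget * threshold
        if actual >= limit:
            alerts.append(f"ACTUAL spend ${actual:.2f} reached {threshold:.0%} of budget (${limit:.2f})")
        elif forecast >= limit:
            alerts.append(f"FORECAST spend ${forecast:.2f} reached {threshold:.0%} of budget (${limit:.2f})")
    return alerts


# Hypothetical month: $850 spent so far and $1,120 forecast against a $1,000 budget.
for message in budget_alerts(actual=850.0, forecast=1120.0, budget=1000.0):
    print(message)
```

Alerting on the forecast as well as actual spend gives you a warning while there is still time in the month to react.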
So, getting that supply and demand lined up is where things like auto scaling really help. And then the fourth one is optimizing over time. It’s a common theme again: you’re not going to have a static workload architecture, and you always want to stay up to date. As new instance types are rolled out, you may find that moving to a new instance type cuts your bill by 20%. So keep up to date on that stuff, again leveraging things like the AWS Blog and AWS What’s New. There are other tools and whatnot that will help you, too. I think one of the key takeaways here is that Trusted Advisor, a tool from AWS, is going to be really helpful: it’s going to help you find ways to save money. So definitely leverage Cost Explorer and Trusted Advisor. They’re going to be really helpful in doing this cost optimization.
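The trade-offs between purchase options, and the effect of a cheaper newer instance type, are easiest to see with some arithmetic. This sketch uses hypothetical hourly rates (not real AWS prices) and a simplified billing model: on-demand and spot are charged only for hours used, while a reservation is charged for every hour of its term.

```python
HOURS_PER_MONTH = 730  # a common approximation (8,760 hours / 12 months)

# Hypothetical effective hourly rates for one instance. These are NOT real AWS prices.
RATES = {
    "on_demand": 0.100,
    "on_demand_new_gen": 0.080,  # hypothetical newer instance type, 20% cheaper
    "reserved_3yr": 0.060,
    "spot": 0.030,  # can be interrupted, so only suitable for fault-tolerant work
}


def monthly_cost(option: str, instances: int, utilization: float) -> float:
    """Estimated monthly cost for a fleet busy for `utilization` (0.0 to 1.0) of the month."""
    # A reservation is paid for every hour of its term, whether or not the instance runs.
    billed_fraction = 1.0 if option.startswith("reserved") else utilization
    return round(RATES[option] * instances * HOURS_PER_MONTH * billed_fraction, 2)


for utilization in (1.0, 0.4):  # an always-on workload vs. one busy only part of the time
    print(f"utilization {utilization:.0%}:")
    for option in RATES:
        print(f"  {option:<18} ${monthly_cost(option, 4, utilization):>8.2f}")
```

In this sketch, the reservation wins for the always-on fleet but costs more than on-demand for the part-time one, which is why reservations are usually aimed at steady baseline load, with auto scaling handling the rest.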

Jon: Excellent. All right, so we’ve been talking for about 45 minutes, but we really need to finish the Well-Architected Framework this week. So, if listeners could hang on for another 10 minutes or so, I think we can burn through the last piece here on the Well-Architected review and the Well-Architected tool.

Chris: We’ve talked about general design principles. We’ve talked about these five pillars and all the various focus areas, and it’s just a wealth of information, but in and of itself, it’s kind of like handing you a bunch of textbooks and saying, “Here, go study, and good luck.” So, the other key part of this is the list of questions you should be asking yourself: the standard list of questions that makes sure you’re addressing all five of these pillars and their focus areas. That’s the Well-Architected review, right? It’s centered around the question, are you Well-Architected? It provides a consistent approach to reviewing your workload against current best practices, and it also gives you tips and advice on how you can improve and become more aligned with those best practices.
So, this Well-Architected review is a series of questions. Each pillar has on the order of 10 or 15 questions. Depending on your answers, it then tells you some of the things you should be doing to address a particular question if you weren’t satisfying it. So, the benefits of the review are that you’re going to lower or mitigate your risk, and you’re going to be better at making informed decisions. It’s also just a really great way of learning best practices, because by doing the Well-Architected review, you’re actually walking through the Well-Architected Framework, all five of the pillars, and the best practices in each. So, it’s really just a wealth of information; there’s so much good information there. I highly recommend that you go check it out and actually do a review on one of your workloads.
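The review’s basic mechanic (answer a question, see which recommended practices you are not following yet) can be modeled as a set difference. This is a hypothetical sketch, not the Well-Architected Tool’s data model or API: the question wording, pillar assignments, and practice names are illustrative paraphrases. The output is the kind of prioritized backlog the review is meant to produce.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    pillar: str
    title: str
    best_practices: set[str]
    followed: set[str] = field(default_factory=set)

    def gaps(self) -> set[str]:
        """Best practices for this question that the workload does not follow yet."""
        return self.best_practices - self.followed


def improvement_plan(questions: list[Question]) -> dict[str, list[str]]:
    """Group unaddressed best practices by pillar into a simple to-do list."""
    plan: dict[str, list[str]] = {}
    for q in questions:
        for practice in sorted(q.gaps()):
            plan.setdefault(q.pillar, []).append(f"{q.title}: {practice}")
    return plan


# Hypothetical, paraphrased questions and answers for one workload.
questions = [
    Question("Cost Optimization", "How do you monitor usage and cost?",
             {"tag resources", "use Cost Explorer", "set budgets"}, {"use Cost Explorer"}),
    Question("Reliability", "How do you back up data?",
             {"automate backups", "test restores"}, {"automate backups", "test restores"}),
]
for pillar, items in improvement_plan(questions).items():
    print(pillar)
    for item in items:
        print("  -", item)
```

A fully satisfied question drops out of the plan, so what remains is the list of gaps worth prioritizing.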

Jon: Yeah, and what I was thinking is that there has got to be somebody listening out there who’s got a SaaS product that’s in its infancy and doesn’t have that many users yet. It’s been fine. It’s up, not too buggy, seems like it’s ready to go. They’re just trying to get users on it. Take that, run it through the Well-Architected review, and look at your red flags. I think you’ll be able to prioritize some work to make sure you’re ready to scale, that you’re reliable, and that you’re not spending too much on infrastructure.

Chris: Yeah. I really think it’s very, very useful, and it’s very much time well spent, if for nothing else than opening your eyes to, oh, you know what? I didn’t think about that. Or, oh, that’s an interesting approach to handling alerts, and this whole concept of things like runbooks and playbooks and automating some things. So, really, really good information. In a way, it’s a much more practical way of digesting all the information in the Well-Architected Framework: just go do the review, then pick these things and address them one at a time.

Jon: Yeah, and engineering managers could do this and then have a backlog of stuff ready to go for when, say, you’re onboarding new people or you have a little bit of downtime between features. Instead of being like, “Oh, we have downtime. What are we going to work on?” you’ve got all this stuff ready to go in your backlog.

Chris: Yeah, indeed. So, that’s the Well-Architected review. Let’s wrap up with a new tool that AWS announced at re:Invent 2018. It’s now available in the AWS console, and it’s the AWS Well-Architected Tool. It’s a piece of software running in the AWS cloud that implements the Well-Architected review for you: it lets you answer these questions, keeps track of your answers, shows you your recommended action items, and lets you create read-only snapshots, basically milestones of your workload as it evolves through its lifecycle.

Jon: Oh, cool.

Chris: So, you can see the progression, right? You go and do your baseline review and save that as the baseline snapshot. Then, maybe a few months later, you make some major changes to your workload. You go and redo the review, and you can see how you have all … Now you make a new snapshot of that milestone and call it version 1.1. So, you can see that evolution, and everything is all there inside the tool itself, so other people can see it. It’s just a really handy tool for performing these reviews, making sure you have a consistent review process, and then capturing and measuring the results.
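Comparing a baseline milestone with a later one comes down to a per-pillar diff of open items. This hypothetical sketch does not use the Well-Architected Tool’s API; the `Milestone` class and the counts are made up for illustration. A summary like this is the sort of number that could go into a periodic progress report.

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    name: str
    open_items: dict[str, int]  # pillar -> number of unaddressed best practices


def progress(baseline: Milestone, current: Milestone) -> dict[str, int]:
    """Per-pillar change in open items between two milestones (negative means items closed)."""
    pillars = baseline.open_items.keys() | current.open_items.keys()
    return {p: current.open_items.get(p, 0) - baseline.open_items.get(p, 0)
            for p in sorted(pillars)}


# Hypothetical counts for a baseline review and a re-review a few months later.
baseline = Milestone("baseline", {"Security": 7, "Reliability": 5, "Cost Optimization": 4})
v1_1 = Milestone("version 1.1", {"Security": 3, "Reliability": 5, "Cost Optimization": 1})

changes = progress(baseline, v1_1)
print(changes)
print(f"{-sum(changes.values())} items closed since {baseline.name}")
```

Reporting the per-pillar change, not just the total, shows which areas improved and which stalled.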

Jon: Where it would be really useful is, say you’re going into a company as a new CTO, a company that has been having problems, and you’re in there to rescue it from falling over on the engineering side. This could be a great way to start communicating with the CEO: “Here’s how I’m going to save you. I’m going to do this Well-Architected review, and then I’m going to report to you once a month or once every two months on how our progress is going against it.” It seems like a really great metric for a CEO-CTO relationship.

Chris: Yeah, absolutely. Totally agree. So, again, in preparing for this series, I spent quite a bit of time looking at the Well-Architected Framework. In the past, I’ll fully admit, I hadn’t spent much time with it beyond knowing what it is at a high level. It’s a lot of material. There are six whitepapers alone: the general one, and then drill-downs for each of the pillars. So, it’s 300-plus pages of whitepapers alone, and that’s not even counting the questions and all the actions associated with them. You have the Well-Architected Tool, and there’s just a lot there, but walking away from this, I’m really impressed with it. This is really good, useful information, and it should be top of mind for anyone working with AWS. Really look at this and leverage it. Even if you just take a few bullet points from it, it’s going to be well worth it.

Jon: And here we’ve brought it to you in three hours of Mobycast. Hopefully the last three hours of conversation are enough to get everybody who listened started: to know some new things, have some questions they can go look into, and maybe start to change the way they think about their systems within AWS.

Chris: Yeah, and I’d say just as a really quick way to get started is just go log into the console. Go to the Well-Architected tool, and do a review. You can even just-

Jon: Exactly, and that doesn’t take long, right?

Chris: I mean, it’s a half hour, 45 minutes, but just do that. It could be against an existing workload, or it could be against what you typically do with a workload: what are the areas you usually address versus don’t? Just by doing that, it’ll light up these areas, and you’ll see the action items and the recommendations for each of these questions. It will lead to additional links and reading and just a lot of really good, useful information. It will lead to a bunch of aha moments, and you’ll be like, “You know what? We should do that. I can see how we can start doing that, it’s not too difficult to implement, and it’s going to really help us out a lot.”

Jon: Well, cool. I love it. All right, well, thanks a lot, Chris. This was super informative for me too, and there are a couple of projects at Kelsus that are going to be using the Well-Architected Tool in the next few weeks here. I’m sure of it.

Chris: Awesome.

Jon: Thank you.

Chris: Thank you. All right.

Jon: Talk to you next week.

Voiceover: Nobody listens to podcast outros. Why are you still here? Oh, that’s right, it’s the outro song. Come talk to us at Mobycast.fm or on Reddit at r/Mobycast.


Links

Whitepapers

End Song: