Rebranding DevOps as Cloud Engineering
In this episode of Kongcast, Matt Stratton, a staff developer advocate at Pulumi, explains the history of configuration automation, the world of cloud engineering and how it compares to DevOps.
Check out the transcript and video from our conversation below, and be sure to subscribe to get email alerts for the latest new episodes.
Viktor: So before we jump to this one, tell us a bit about yourself.
Matt: I spent about two decades working in traditional technology operations. I was a sysadmin. And I live here in Chicago, so I worked for a lot of financial institutions and insurance companies because that’s usually where you work in Chicago if you work in tech.
Along the way, I got interested in this DevOps stuff. It was probably about eight or nine years ago. And got interested in how we could automate things. I was always about automation, so I was writing a lot of like VB scripts and a lot of bash scripts and a lot of this stuff. And then I learned about this thing called Puppet and started using Puppet for some things, and later switched over to using Chef. That’s a long roundabout way to say I’ve been involved in DevOps for quite some time. And I went from being a member of the Chef community to working at Chef. And Chef is an infrastructure as code tool where you write your code - your infra code - in Ruby. I stepped away from that and joined Pulumi.
And so infrastructure as code is really interesting to me, and I think there’s a lot of power to it. I like to see how things have evolved. But at the end of the day, doing this work was not great. I’m excited that this changes how people work. I worked at Chef because I believed in Chef. It’s not the other way around. I think that’s still true for what I do now.
DevOps Tool Challenges
Viktor: You said you were working with Chef and Ruby to solve your DevOps problems. Many system administrators try to make their lives easier, but you can make life easier for others too. What are the challenges you saw with tools like Puppet, Chef and Pulumi?
Matt: I’m wearing my Pulumi shirt right now. But still, when I was at Chef and people would ask, “Should I use Chef or Puppet?” I would say “Yes.” Like, so do something. I’m a big believer that the biggest competitor in this space is not another vendor. It’s Do Nothing Incorporated. It’s still doing things by hand. I’m not trying to sell you Pulumi as the be-all and end-all. It might be the thing for you. I happen to think it’s a really good way to solve this problem.
Some of those problems might not exist as much today for everybody, but when you’re talking about scale, we can’t do this stuff by hand. If we’re writing our configuration and then applying that configuration, things change. We might have different configurations in a pre-production environment than in production, and we want to be repeatable.
People who talk about automation like to say we do this because we’re lazy, which is true. I don’t like to do the same thing repeatedly unless it’s fun. And most of this is not fun, but creating its implementation might be fun.
If I want to iterate over things where I’m saying, "OK, I need to go create a bunch of these things, and maybe there are dependencies on them." I need to create a VPC that my cluster will live in. I want to do that and then reference that I created it in the implementation that happens later, rather than going and creating them piecemeal.
That’s when we start to think about why we need to use a programming language, because our infrastructure is code; it’s software; it’s components; it’s pieces that have dependencies on each other. And one of the things that I think has been interesting, going back over the years to CFEngine, is that we’re always building on the shoulders of giants.
We keep talking about infrastructure as code, but really what we’ve been doing is building our infrastructure using some code.
We don’t think about our infrastructure as reusable components that are abstractions that we can treat like software, which I think is the exciting thing that we’re moving towards today. The thing that gets your job done is the right thing for you. And the decision you made was the right decision to make at the time. But things are continually evolving.
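Matt’s example of creating a VPC first and then referencing it from the cluster comes down to dependency ordering. The sketch below is one minimal way to picture that, using only Python’s standard library. The resource names ("vpc", "subnet", "cluster") and the creation_order helper are hypothetical, and this is not how Pulumi, Chef or any other tool mentioned here is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources: each name maps to the set of resources it depends on.
dependencies = {
    "vpc": set(),
    "subnet": {"vpc"},
    "cluster": {"vpc", "subnet"},
}


def creation_order(deps):
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = creation_order(dependencies)
print(order)
```

Declaring the relationships once and letting code work out the order is the difference between treating infrastructure as components with dependencies and creating each piece by hand.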
Sysadmins Are Coders Now Too?
Viktor: You mentioned you started your life as a system administrator, and historically, culturally, system administrators and developers are different breeds. As a sysadmin, you write software not because you like writing software the way a developer does, but because you have to automate your tasks because you’re lazy.
And now we came into the world where you kind of as a system administrator, you have to program stuff, and we have a lot of tools pushing all this. Site reliability engineers and, I don’t want to say the DevOps engineer because it will be a pun—everybody understands that. And like how did we end up here? Why make people who generally don’t like code to program?
Matt: You couldn’t have teed me up better. I’ve been saying this for a long time. To my friends in the sysadmin world, especially when we start bringing in Chef and similar tools, that’s the concern.
They’re like, “I’m not a coder or a software engineer. Why are you going to make me code?”
I’m like, “I got great news for you, sysadmin pals. You already do this. We just use different words for it. We don’t talk about programming, but we talk about scripting. And as much as sysadmins might like being in the console, we all write scripts all the time, and we do it in many ways. That’s coding. I got news for you, Bash. That’s code, man. And you’re doing it so cool. And we don’t call them code reviews. But you do that maybe through change control. We do these processes. We don’t call it debugging. We call it troubleshooting. That’s what I’m trying to tell you. You already know how to do this. And not only do you know how to do it, but it’s also part of what you love about your job. We just call it something different. Yes, there’s still more to learn, but conceptually you’re already doing it, right?”
Viktor: That’s a great point. I came into programming by writing shell scripts to automate some work. And for some reason, even though I was using a real programming language, I was using Perl to write some log parsing cron job that would collect the logs from the different computers. And after that, slice and dice to go and look what happened in a particular day or hour. I used my programmer skills because I wanted to use a module structure and apply some of the concepts I learned from object-oriented programming.
Managing Technology Changes
This brings us to the point I wanted to discuss. How have things been changing since the start? People started with bash scripts and Makefiles, and then concepts from programming languages, like modules, code reviews and unit tests, penetrated the world of infrastructure management. You’ve been in this world for a while. How do you see this transition, and how are people accepting this?
Matt: I think the biggest pushback for any change is rarely to do with it being a new skill. People do like to learn new skills. I’m not saying everybody wants to learn something new and has to learn something new every week when your job is continually changing. But it is. So much of change comes back to a sunk cost fallacy.
I see this a lot with Terraform, which does wonderful things. You can be an organization that has a large investment in Terraform, and this could be true of any tool that we’re talking about.
You might be all excited about a new thing, but you’re like, “Oh my God, do you know how much work it is to get to that?”
And it’s funny because whatever the thing is that’s “too much work to move off of” today was the thing that people didn’t want to move to six years ago, because it was too much work to move off the other thing.
Viktor: Right.
Matt: So I’m not saying you should throw things away. Never change for the sake of changing. We want to do that as engineers.
We like fun new things, and it’s always fun to learn new things and continually refactor and polish the thing because your legacy code was the new hotness five years ago.
Viktor: Exactly.
Matt: Keep that in mind because it’s hard to remember. What does a new thing do that you can’t do now?
With Chef, I would often work with organizations using, for example, System Center, like in a Microsoft shop. They’re trying to make SCCM, which was fundamentally a thing for configuring desktops, configure their data center. And maybe they already had an investment in it, and then they wanted to move to Chef. But what’s common to do is take all the things your old tool did and re-express that with the new tool. So you’re going to spend a whole lot of time doing that. And at the end of the day, nothing changed.
So I always sat there and said, “What’s the gap? What’s the thing you can’t do with ConfigMgr right now that you can do with Chef? Do that part first. So at least while you’re doing the work, you’re getting some new value.”
Tech is easy; people are hard. The hard stuff is how we change organizationally: How does my day-to-day change? How does my communication change with this?
So if you’re going to do all that work, you should be getting something out of it other than you’re paying somebody different now than you were yesterday. If you move to this new platform at the end of the day, and you’re doing all the same things you used to do, but now it’s someone else, what did you get out of it?
Viktor: That’s what I saw multiple times. People use a new tool because it’s a new, shiny thing, but they keep repeating the same pattern. And in the new tool, the approach you used five years ago with the old tool might not be feasible or even possible. So that’s why people also might get turned off and say, “no, it’s kind of not for me.” Because they are approaching this new tool from the experience they already have and want to translate the skills they already have.
Matt: This reminded me of something I used to see. I feel a little more comfortable telling a story with Chef and Puppet because I’m not there anymore, but it’s true across the board. I would go to a Meetup or something, and someone would be giving a talk about how “we threw out all of our Chef code and replaced it with Ansible, and it was so much better.” Or they would do the same thing, “we threw out all of our Puppet code and replaced it with Chef and are so much better.” And it was always because obviously, the tool is so much better.
And I was like, "No, you’re smarter because when you took your existing implementation, and you redid it, you did it with more wisdom than you had the first time. It has nothing to do with the tool. You’re better, and you’re smarter now."
It’s a lot easier to sell a project to migrate from this thing to that thing than to sell “let’s take the next year, and we’re just going to refactor all of our stuff.”
Good luck if you can sell that, and you should. But that’s a much harder sell.
What's Cloud Engineering?
Viktor: You keep mentioning vocabulary that is very familiar to many developers. And developers also like to call themselves software engineers now. And I’m trying to bring us to the topic of cloud engineering and what it means in general. When you write your YAML files and send them to your Kubernetes cluster, does that make you a cloud engineer?
Matt: When I think about cloud engineering, it’s a discipline encompassing how we build, deploy and manage platforms, applications and services. It encompasses people writing the application code, operational folks, SREs, infosec, compliance and security folks. And the idea is that we have one way to do it, and it’s not one way for the entire industry. It means it’s one way within your group to build, deploy and manage. Because the more that we get ourselves to a common vocabulary, that’s what breeds empathy, and empathy is the core of this. And if a lot of this sounds like DevOps, it’s because I think it is. At least what DevOps was supposed to be.
Cloud Engineering vs. DevOps
Somewhere along the way, DevOps became automation tools and writing automation, but that’s just part of it. It’s about taking these principles like what can all these different groups learn from each other and then come together in a cross-functional way? That’s the core of it to me, and the cloud makes this a little easier because everything’s API-driven and exposed. We’re not shoving it as hard as we had to before.
Viktor: At some point, we were explaining DevOps as a cultural trait and how you communicate between people. And very quickly, we started talking about DevOps as tooling. You made the point a little bit earlier that tech is easy; people are hard. That allows us to steer away from the actual problems that DevOps is trying to solve as a cultural aspect: how to communicate between teams, establish this communication, and properly plan and share the work. That’s brought us to the point where when you ask someone, they will tell you, “Uh, yeah, DevOps, we’re doing DevOps tools,” but what they’re doing is using automation tools or provisioning tools or some other things, right?
Matt: Tools influence culture, culture influences tools. Sometimes people feel like I’m anti-tool or anti-automation when I rant about things like this. It feels like we overcorrect on culture and communication and human factors because those are the things that we have to tell you to do.
Because playing with tools, we’ll do that automatically. We like to do that. You don’t have to sell a bunch of engineers on automation. We all want to do that. But it’s really easy for the gravity of that to become the default. We gravitate to that. So we have to push harder to get escape velocity from tool pull to culture pull, which is why we talk about it a lot. But it kind of happened.
You can’t buy DevOps, but I can sell it to you. Let’s call it something else that’ll help. We’ll rebrand. We’re good. It’s cloud engineering.
Viktor: I like it. I like how it sounds.
DevOps Evolution/Revolution
So let’s talk a bit about the value prop of the tools and how it changes and why we are still, even like 5-6 years into this DevOps evolution/revolution…I’d say it is a revolution because it significantly changes people’s minds. And how has this value prop changed over time? Starting from simple automation to the point where we have real programming languages or real SDKs.
Matt: I think cloud-first helps drive that. If we go back to pre-cloud or even if it’s cloud, but more IaaS, VMs or stuff like that where you’re talking at an operating system level from an operations standpoint. Having a common language, not programming, but the lingo or frame of reference between software engineering and operations, you’re forcing that because you literally are thinking about the world differently.
I found this interesting about why a lot of early automation tools were a lot better on, say, Linux than on Windows. And Jeffrey Snover, who’s a distinguished engineer at Microsoft, he’s doing a bunch of great stuff, like he invented PowerShell. He said part of the reason is that in Linux, everything is a document. Whereas in Windows, and again, I understand I’m putting myself back, everything’s changed a lot. But at the time, everything was an API. So if you had tools like Chef or Puppet that wanted to reason about files, it was tough because that’s not how the operating system worked.
That being said, if we think about the modern cloud and Kubernetes, they’re all APIs. That means we reason about this infrastructure in many ways that are very similar to how we build services that our applications run on. So it’s a lot easier.
Back in my Chef days, my customers would always ask me, “How come I can’t just point Chef at my Apache server and have it spit out a cookbook to build that?”
And I was like, “because on that Linux operating system and Apache, there are hundreds of thousands of possible things, and Chef has no idea which of those you care about, what’s relevant, what’s just sort of there, blah blah blah.”
Do you know what you can do with Pulumi, for example? You can absolutely point it at your AWS infrastructure and have it spit out Pulumi code to make that because it’s a defined API. It’s understood what the settings are. That’s just the world of cloud. That’s API-first versus document-first. It lets us reason about things that way.
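To illustrate why a defined API makes that kind of code generation tractable, here is a toy sketch. It is not Pulumi’s import mechanism; BUCKET_SCHEMA, render_declaration and the live_bucket data are all hypothetical names invented for this example. The point is only that when the API defines which settings a resource has, a generator knows exactly which fields to emit and which to ignore.

```python
# Hypothetical schema: the API defines exactly which settings a bucket has.
BUCKET_SCHEMA = ("name", "region", "versioning")


def render_declaration(kind, schema, live_state):
    """Emit one line of source declaring a resource, using only schema fields."""
    args = ", ".join(f"{field}={live_state[field]!r}" for field in schema)
    return f"{kind}({args})"


# Pretend this dictionary came back from a cloud API "describe" call.
# "etag" is incidental state that the schema does not treat as a setting.
live_bucket = {
    "name": "logs",
    "region": "us-east-1",
    "versioning": True,
    "etag": "abc123",
}

declaration = render_declaration("Bucket", BUCKET_SCHEMA, live_bucket)
print(declaration)
```

On a general-purpose server there is no equivalent of BUCKET_SCHEMA, which is the problem Matt describes with pointing a tool at an existing Apache machine.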
I think that’s where that interesting evolution has come, and we need to embrace that a little bit more, rather than, as with a lot of our YAML files, still thinking about everything as a document.
And that’s a piece along the way. But it’s not how you reason about the overall infrastructure.
Viktor: It’s a very good point. I did a workshop at the London DevOps UK, and one of my co-speakers presented Kubernetes from the perspective of some content management system. He talked about uploading a document to the system, and the document would be materialized in a piece of infrastructure.
That’s a very interesting point because when I explain this, I like to approach this from the perspective of an API and say, "this document essentially would be translated to an API call. We take this YAML file, it will be translated to JSON, and after that, it will be submitted to the Kubernetes API."
I agree with you that it’s about your mindset. Describing infrastructure from the perspective of state is how Kubernetes teaches us to do things: we describe the desired state, and after that, Kubernetes makes things happen. Compare that with playbooks, where we describe the steps that the tool needs to take to get us somewhere. In one case, it’s about the journey; in the other, it’s all about results.
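The document-to-API-call translation Viktor describes can be sketched roughly as follows. The manifest is a trimmed, hypothetical Deployment (a real one also needs a selector and a pod template), it is shown as an already-parsed dictionary because the standard library has no YAML parser, and to_api_request is an illustrative helper that handles only namespaced Deployments. Nothing is actually sent to a cluster.

```python
import json

# A Deployment manifest as it might look after parsing the YAML file.
# Trimmed for illustration; not a complete, valid Deployment spec.
manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "default"},
    "spec": {"replicas": 2},
}


def to_api_request(doc):
    """Build the (method, path, body) that creating this Deployment would use."""
    namespace = doc["metadata"]["namespace"]
    path = f"/apis/{doc['apiVersion']}/namespaces/{namespace}/deployments"
    return "POST", path, json.dumps(doc)


method, path, body = to_api_request(manifest)
print(method, path)
```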
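The desired-state idea can be pictured with a small reconcile function. This is a minimal sketch under simplifying assumptions, not Kubernetes’ actual controller logic; the resource names and the reconcile helper are hypothetical. The caller only states the result it wants, and the function derives the steps by comparing that with what currently exists.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
actual = {"web": {"replicas": 2}, "cache": {"replicas": 1}}

plan = reconcile(desired, actual)
print(plan)
```

Running reconcile again once the actual state matches the desired state yields no actions, which is what makes the declarative approach repeatable. A playbook-style tool would instead have the steps written out by hand.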
I think we have a perfect segue to start talking about Pulumi.
Demo: Configuring Infrastructure With Pulumi
Thanks for Joining Us!
I hope you'll join us again on March 7 for our next Kongcast episode with Tim Hinrichs from Styra.
Until then, be sure to subscribe to Kongcast to get episodes sent to your inbox (and a chance to win cool SWAG)!