DevOps Chat: DevSecOps and Linux Protection, With Capsule8
Capsule8 is focused on protecting Linux infrastructure, whether in the cloud, in containers or on bare metal. The team is made up of industry veterans who understand the problems security pros face, as well as the frustrations developers, DevOps teams and sysadmins deal with every day. The company’s approach is more of a shift-right take on DevSecOps, focusing on detecting attacks, threats and vulnerabilities on production infrastructure.
In this DevOps Chat, I spoke with co-founder and chief architect of Capsule8, Pete Markowsky. Have a listen and hopefully learn.
As usual, the streaming audio is immediately below, followed by the transcript of our conversation.
Transcript
Alan Shimel: Hey, everyone, it’s Alan Shimel, DevOps.com, Security Boulevard, Container Journal, and you’re listening to another DevOps Chat.
I have what I think is gonna be a great chat lined up for us today. Straight from Brooklyn, New York, we have Peter Markowsky, co-founder and chief architect at Capsule8. Pete, welcome.
Pete Markowsky: Thanks for having me.
Shimel: Thanks for being our guest. So, Pete, let’s get kinda the preliminaries right out of the way here, right?
Markowsky: Mm-hmm.
Shimel: Capsule8—what do you guys do?
Markowsky: Alright, so, since Marketing will kill me if I don’t say this first, we are a high-performance attack protection system for Linux servers.
Shimel: That’s great. Alright, let’s [Cross talk]—the Marketing people are happy now.
Markowsky: The Marketing people are happy, right.
Shimel: So, talk to my audience.
Markowsky: So, what do we really do—what do we really do, right? So, we do a lot of behavioral detection, you know, the idea is to do it as close to, shall we say, line speed as they used to say for the old firewall days, so that you can actually take an agent and put it onto your servers.
We know how to instrument the Linux kernel appropriately so that we can be container aware, we can be behavior aware, and for us to be able to rapidly provide context and alerting capabilities, so you can put policy down, you can rapidly figure out facts and then articles to get this data, either alerting data that we push out to somewhere, you know, so you can get it into your workflow, you know, however and as fast as we possibly can.
The other thing we do is, we store up a bunch of facts as we’re going along and analyzing the data, so you can say, “Hey, wait, you know, at some point, I found a piece of malware, somebody’s crypto-mining on my servers. How did we get here?” Right? And you say, “Well, what network connections did my servers make, outbound, in the last, say, hour?” Right? We want to be able to facilitate answering those kinds of questions that security operations folks have to answer, and we want to be able to get it into your workflows as fast as possible, but at the same time, we wanna keep Ops and everybody else happy by not chewing up all your resources to do this, right?
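To make the “what outbound connections did my servers make in the last hour?” question concrete, here is a minimal Python sketch of that kind of lookback over facts already stored on or near the host. The `ConnectionFact` record and the `outbound_connections` helper are hypothetical names for illustration, not Capsule8’s data model or API; the sketch assumes each connection was recorded with its host, process, direction and timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ConnectionFact:
    """Hypothetical record of one observed network connection."""
    host: str
    pid: int
    process: str
    remote_addr: str
    remote_port: int
    outbound: bool
    timestamp: datetime


def outbound_connections(facts, host, now, window=timedelta(hours=1)):
    """Return outbound connections made by `host` in [now - window, now], oldest first."""
    cutoff = now - window
    hits = [
        f for f in facts
        if f.host == host and f.outbound and cutoff <= f.timestamp <= now
    ]
    return sorted(hits, key=lambda f: f.timestamp)
```

Keeping the query as a plain filter over stored facts reflects the point made above: the host keeps a focused record it can answer questions from later, rather than shipping every event off the box.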
Like, our sort of stake in the ground compared to a lot of other people in the space is, you know, we kinda recognize that these are your bread and butter systems and, you know, we can’t spend a ton of time shipping all the data off the host at all times. We can’t—you know, the analysis has to be very focused, you know?
Because again, we’re sort of attacks, right? Security is that property you measure, you know, when you lose it, right? Otherwise, it just sort of goes by unnoticed. The other thing is that everybody has a sliding scale for their risk tolerance, you know? Like, maybe they’d be willing to accept some loss in fidelity, provided that you keep performance within the right range, right? We need to be able to make sure that we can accommodate people, because that’s what makes production different, right?
We’re not talking about desktops that have large amounts of CPU just sort of sitting idle. You know, we’re talking about a lot of attacks that start with things like command injection, right, and then you sort of run roughshod from that initial vantage point.
And so, you know, we walk through that kind of stuff and because we’re able to do things like instrument the kernel through things like perf, we’re able to pick up things like kernel exploits, right? The goal is to find that nice sweet spot where somebody’s gotten in through one of the various means, be it a command injection, be it stolen credentials, be it, you know, a web application bug that they’ve sort of gotten in there, and now we wanna be able to sort of follow them up to the point of privilege escalation and other sort of user policy violations at that point.
And we wanna be able to help you figure out, you know, how that happened, which also means providing a lot of context, right? Because one of the amazing parts about the brave new world that is, you know, cloud, containers and everything else is that we have all this rich, rich context, right? We’ve got metadata that is just coming from everywhere, right?
Shimel: Mm-hmm.
Markowsky: And one of the things that security operations folks often have to do is, we have to now take that context and put it together so we know who are the operations teams to contact in case of an incident. You know, how much do we care, right? And, you know, the fact that we have things in sort of these software defined groups, right?
When you’re in the cloud and everything’s driven off an API, you know, I may have a host where we had an incident, but the question is, do we care? Is that tied to one of my mainline services? Is that tied to an auto scale group, or did that node actually get blown away already just because—hey, we didn’t have as many requests per second as we wanted anymore, and the auto scale group just wiped it out, right?
Shimel: Yeah.
Markowsky: Right? And a lot of times, like, a lot of the modern security tools we’ve seen and a lot of the traditional ones really don’t really give you that context, nor do they really give you an easy way to get that context to you.
And so, you know, 10, 12, 13 years ago, right, I was working security operations at Google, and it was one of the first times I carried a pager. And, you know, you get that page in the middle of the night and you’re like, “Oh, my God, please tell me Google.com is not down,” right? You know, because of course it’s the website everyone will see.
And, as a result, you know, the thing was context, right? The first thing you get off the page is—okay, we saw a thing, what does this mean, right? How do I make the decision to sort of say, “How do we act as quickly as possible?” And, you know, some of it, you build up with experience, but some of it—right, if you could just put the right piece of information in front of the person, they can more effectively get through that, find the right people to bring in.
And, you know, I mean, a lot of incidents, unfortunately, are still measured in, what is it, you know, months to years, in some cases, or at least weeks to years?
Shimel: Yeah.
Markowsky: Sorry—weeks to months. But yeah, and so, you know, our goal is to try to be able to say, “Look, there’s this whole new, rich world of data. Let’s merge it in for you. Let’s enrich it up front, on the host, be able to use it to make decisions, right, whether it’s about, like, do you set policies? Do you—you know, not only do you set policies, but can you put that information into alerting data, right?” And, you know, being able to use that context to help operators make a better decision, you know, as well as we have some of the traditional security pedigree.
One of the things I guess I should talk about here is, when we formed Capsule8, a bunch of us had been working on sort of, you know, the more offensive side of security research, right? You know, which is, like—find bugs, how do you exploit bugs? And, you know, a lot of the stuff we noticed was, you know, there’s patterns to these attacks. And so, we said, “Okay, can we build faster analysis to sort of find the choke points in these patterns?”
Shimel: Mm-hmm.
Markowsky: And, you know, and we spent a lot of time working around that and, you know, the good news is, it pays off. I mean, there’s things where, you know, when you do something—like, for example, kernel based privilege escalation, right? Your goal is to escalate privileges. There’s a million ways to do it, but ultimately, you end up, you know, modifying a process somewhere and giving it more access than it had before, right?
Shimel: Yeah.
Markowsky: Right? And so, you know, rather than trying to sort of enumerate all the badness, like, let’s talk about behaviors. Let’s talk about the goals and what are the things you want to steal? And again, going back to our sort of sense of context, right, you know, that can do it.
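A minimal Python sketch of the behavior-over-badness idea for privilege escalation: instead of matching known exploits, it flags any process whose effective UID turns into root between two observations when neither an exec (for example, of a setuid binary such as sudo) nor a setuid-family call accounts for the change. The `Event` kinds and the rule itself are illustrative assumptions about a hypothetical event stream, not Capsule8’s detection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    pid: int
    kind: str   # hypothetical kinds: "exec", "setuid_call", or "syscall"
    euid: int   # effective UID observed for the process at this event


def find_unexplained_escalations(events):
    """Return pids whose euid went from non-root to root without an exec or setuid call."""
    last_euid = {}
    flagged = []
    for ev in events:
        prev = last_euid.get(ev.pid)
        explained = ev.kind in ("exec", "setuid_call")
        if prev is not None and prev != 0 and ev.euid == 0 and not explained:
            flagged.append(ev.pid)
        last_euid[ev.pid] = ev.euid
    return flagged
```

However a kernel exploit gets there, the end state is the same: a process holding more privilege than it had a moment ago with no legitimate transition explaining it, which is the goal-level behavior the passage describes.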
And so, one of the other things that we sort of, you know, do is, we spend a lot of time working on the idea of being able to provide context to our—you know, what we call strategies, which are our detection logic, so they can make decisions based on things like, “These are the processes running on the system, these are their parents and this is where they came from.” And so, that sort of sense of provenance lets us make better decisions, give you the ability to more aptly describe common situations, right?
So, like, take something like files, right? Like, you know, file policies. You wanna be able to say things like, “Apt should be able to create files, right, if you’re using a Debian-based system.” You know, an apt-get install from an administrator should not create a ton of alerts.
Shimel: Mm-hmm.
Markowsky: Right? And, you know, anything that has installation scripts and other things may kick off subprocesses and other stuff, right? Like, the thing that we talk about with our context is the ability to say, “Look, you know, our strategies can see that this job is rooted in Apt and is coming through here” so you can say, “Okay, we trust Apt and its children at this point to be able to make these right decisions or to do these behaviors, really.”
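The “trust Apt and its children” policy can be sketched as a walk up the process tree: a file-creation event is allowed if the process or any of its ancestors is a trusted package manager. This minimal Python sketch assumes a hypothetical snapshot of the process table (pid, parent pid, command name); the `Proc` type, the `lineage` and `file_create_allowed` helpers and the trusted-name set are illustrative, not Capsule8’s strategy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    comm: str  # command name, as in /proc/<pid>/comm


def lineage(pid, table):
    """Yield the process and then its ancestors by following ppid links.

    Stops at a pid missing from the table (e.g. ppid 0) or on a cycle."""
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)
        proc = table[pid]
        yield proc
        pid = proc.ppid


# Hypothetical trust list for a Debian-based system.
TRUSTED = frozenset({"apt", "apt-get", "dpkg"})


def file_create_allowed(pid, table, trusted=TRUSTED):
    """Allow file creation if the process or any ancestor is trusted."""
    return any(p.comm in trusted for p in lineage(pid, table))
```

For example, a `sh` maintainer script spawned by `dpkg` under an administrator’s `apt-get install` passes, while a `curl` launched directly from the same shell does not.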
You know, and so, we start with that, right? We start with context. We start with, you know, the ability to apply policy through this context. And then the other big thing that I would definitely talk about here philosophically is integration. And the reason we feel integration is incredibly important is, there’s this sort of sad pattern that shows up in a lot of security incidents.
Like, if you look at a bunch of the reports that the House of Representatives has people do, like, for the Target breach or Equifax, right? There’s this sort of, you know, what happened and they give you the timeline of what happened. And occasionally—and I’ll cite Equifax here just because it’s maybe the most interesting—they didn’t notice the attack for 78 days, because the device they had monitoring the network traffic had an expired security, like, SSL cert, right?
Shimel: Yeah.
Markowsky: And so, as soon as they replaced the cert, suddenly they noticed that, “Oh, my God, people are copying data out of our network.” And this sort of goes to the heart of what we think about with integration, because the whole point is, if we can get our—you know, our alerting data, our investigations data into your workflows and into your system, right, without you having to sort of think about it, right, you’re more likely to find the relevant piece of information, right? And if it has the right context, then it kinda nails it home, right?
Shimel: Yep.
Markowsky: And so, you know, we’ve been—lately, we’ve been on a bit of a tear, right? We’ve been going around and integrating. I know people have been talking to us about how, you know, “Can you integrate with AWS’s Security Hub or Google’s Cloud, you know, Security Command Center?” And we did that recently.
Shimel: Yep. So, Pete, I wanna talk about that, but before we do, I don’t know, I just feel compelled to interject something here.
Markowsky: Yes.
Shimel: You know, what I’d like to kinda throw out at you is, you know, I view cloud security—look, I’ve been in the security industry for a really long time, you know, I was looking at the leaders over at Capsule8 and, you know, Dino and John and so many of these guys I’ve run into and worked with or know from the industry.
You know, when cloud first—cloud and cloud security first kinda came to the forefront in 2005, right, a lot of the so-called cloud solutions were what I call cloudwashed.
Markowsky: Mm-hmm.
Shimel: They were your regular on prem stuff, service stuff that people just said, “Poof, now you’re cloud,” right?
Markowsky: Mm-hmm.
Shimel: Then we saw sort of this first generation of cloud security solutions, but I would say they weren’t cloud native, they were still server native, data center native, maybe optimized for cloud is a good way of thinking about it. Because we really hadn’t quite figured out, what’s the proper mix between what is the cloud provider supposed to do around security and what do I as, not the consumer, but I’m hosting my stuff on the cloud, I’m using the cloud for my infrastructure, right? What is my responsibility regarding security, understanding that whoever’s responsibility it is, ultimately, I’m still gonna take the blame when stuff hits the fan.
So, we saw that first generation where there was this dance around that. Then, I think we saw a third generation of cloud security, Pete, and this was really—I think we started seeing the first Cloud Native, Cloud Security kind of stuff, right, that was born, designed, hosted, raised, lived in the cloud. And, at the same time—and this is sort of the double helix in my mind, Pete—at the same time, it also had a little bit of a keener understanding between what is the cloud provider’s responsibility or what are they doing around security versus what are we doing around security and how do they interact?
Now, I think the kind of stuff you’re talking about, right, is yet a new generation where—yes, it’s Cloud Native when it has to be and is, and not only is it Cloud Native protecting the cloud, it’s Cloud Native protecting your on prem and network as well.
But, even on top of that, the—more than the understanding of what the cloud provider does and what I’m gonna do, there’s truly an integration of what the cloud provider’s providing in terms of security. So, when you talk about integrating with things like AWS and I forget the name now of their cloud.
Markowsky: Security Hub.
Shimel: Mm-hmm, or the new Google console—you know, to me, this, I mean, do you think I’m off base here, or does this kinda jibe with how you look at it?
Markowsky: I mean, I think that that kinda syncs up with how we look at it as well. You know, I mean, I think there’s been a lot of traditional stuff, right, that’s been out here—Sumo Logic, at an older level, like, QRadar, ArcSight—you know, all these sort of SIEM kind of tools.
Shimel: Yeah.
Markowsky: But they’ve also been sort of a step removed from the infrastructure, right as well.
Shimel: Yep.
Markowsky: And so, you know, we’re seeing all this sort of power open up, right? Like, the idea that you have Lambdas, you have cloud functions and everything else that’s also tied in with all of this that you can sort of pull in in different ways and sort of set up your workflows so that, not only do you get the beauty of sort of the platform that the providers have set up, but you also have the ability to really contextualize what’s happening at all these different layers.
And, as we sort of talked about earlier, the nature of the game has changed, because everything is moving, right? Because auto scale groups, between things like pods and Kubernetes, right, your workload and your services may bounce across the network at a software level, right? Your security—you know, your security operators now need to be able to quickly contextualize what it means, right? In the past, we had people tying stuff down to, “Oh, well, this IP and this server is always this thing.” And, you know, the thing that I guess that I see with the providers is the fact that they can seamlessly pull in all of this context and you can sort of be yet another feed into that, right? You can enrich the operator’s experience. They can go, “Oh, yeah, I get it. I see what’s going on here. Oh,” right? And more rapidly make that decision, but also, you can kind of, you know, pick what you wanna do about it in a bunch of different ways that previously you may have just been saying, “Oh, well, I can put a firewall rule, let’s go do this” or whatever else, right?
The idea that you can just snapshot any host, pull it down, keep stuff in service and, you know, go do some forensics on it later—I mean, it’s pretty exciting.
Shimel: No, look, it’s a great time to be alive, right?
Markowsky: Yeah.
Shimel: Compared to, you’re talking to someone who, you know, my last company I co-founded back in 2001, we took Snort and started using IPS functionality with it and you wanna talk about desensitizing and overload, you know, and line speed, when you said line speed, I had a terrible nightmare flashback.
Markowsky: [Laughter]
Shimel: And so, it is a great time, but as much as it’s a great time, Pete, it’s a dangerous time, because the threats today are faster, better, smarter than they were back then, too. So, it’s like—it’s like this giant petri dish of Darwinian evolution, right? And I don’t know what the exit out of that is, but it’s certainly, you know, it’s a challenge. As exciting as it is, it’s equally as challenging.
Markowsky: Yeah, I mean, I would agree. I would say the interesting thing is, we’re starting to see the fact that, as change management starts to become more and more mature, right, that one of the things that people aren’t necessarily taking advantage of, but they can start to, is home field advantage, right? While these threats get better and better, they still have to go through the process of breaking in, learning the environment, finding the data they wanna steal, right?
Another company that, I have to admit, I admire considerably is Thinkst, for the Canaries and the Canarytokens. Because they basically brought back honey pots and honey tokens, right?
Shimel: Yeah.
Markowsky: Bringing those back where you can, you know, put those in your environment where your people know what—they know this is the bad file, you shouldn’t touch the bad file, right? And if they do, you still wanna investigate it, right?
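A honey-file tripwire of the kind described here can be as small as a set-membership check over file-access events, because nobody has a legitimate reason to touch the decoy. This minimal Python sketch assumes a hypothetical stream of access events and two made-up decoy paths; it is not Thinkst’s product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAccess:
    pid: int
    user: str
    path: str


# Hypothetical decoy paths planted for this sketch; staff are told never to open them.
HONEY_FILES = frozenset({
    "/srv/backups/db-credentials.txt",
    "/home/admin/.aws/credentials.old",
})


def canary_alerts(accesses, honey_files=HONEY_FILES):
    """Every touch of a decoy is worth investigating, whoever made it, insiders included."""
    return [a for a in accesses if a.path in honey_files]
```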
That kinda stuff is just sort of the beginning of this, right? If everything’s tied to an API, right, you can mix things up periodically to keep people off balance and even the more sophisticated threats are still gonna have to figure out how to migrate around your network and how to sort of deal with change, right?
An amusing sort of thing that we used to talk about, especially in the early days at Capsule8, was, you know, continuous delivery is in some ways, you know, an accidental, great defense mechanism, right? And like, when you talk about things like reverse uptime, what you’re really doing is, you’re forcing an attacker to have to re-persist, right?
Shimel: Yep.
Markowsky: And, you know, when they go through that kind of thing, that’s when they’re sort of sticking their neck out the most. And so, again, I am—and this may be a little bit more visionary than I intend to be or whatever, but I do believe that as we sort of get better and better at change management, right, the idea that home field advantage could be a useful thing and could help us in terms of detection engineering, I think, is there, right? I mean, even some of the sort of Chaos Monkey stuff we’re seeing, right?
Shimel: Mm-hmm.
Markowsky: If you just randomly rip out a host underneath an attacker per session, right, they still have to come back in. And again, every time they do this is another chance for us to detect them.
Shimel: Absolutely. And the thing there, also, Pete, is the shift-left aspect of this, too, which is—we need to make security synonymous with quality.
Markowsky: Mm-hmm.
Shimel: So, because, when you look at sort of your classic DevOps best practices or—I don’t know if there’s such a thing as DevOps best practices—state of the art DevOps, let’s call it, right? Traditional kind of stuff. What we should be doing is taking those lessons learned every time we redeploy and we see how an attacker or someone with ill intentions has to kinda re-jig, right? We need to be setting up the feedback loops about what we learned, right, and pour that back into the iteration, to the next iteration, right, so that it’s not just that we redeployed, but we redeployed, and based upon what we saw from the last deployment, we shut off yet another avenue, another vector. We trimmed that surface up a little bit more, right?
Markowsky: Mm-hmm.
Shimel: Re-harden that. And I think, you know, it’s continuous security, and it’s continuous improvement in security that that results in. And I don’t know if it was unintentional or not, but Peter, I’ll tell you that six, seven years ago, when I first read the manuscript of The Phoenix Project and got into the whole DevOps thing, that was exactly the reason I got into it: I thought this was the best thing for security to come down the pike in my 20-plus years doing it.
So, you know, yeah, you’re talking to—preaching to the choir, here.
Markowsky: Yeah, it’s funny. I started in this industry doing a lot of what used to be traditional, I guess it’s more DevOps-y now, but it was sysadmin work when I was at Northeastern University. And I had the good fortune of working under some really great folks like David Blank-Edelman who was associated with sort of USENIX and they started doing a lot of this kind of stuff, you know, auto-generate configs, do large-scale configuration management and that kinda stuff.
And, you know, the thing that I think is truly great about this, right, is you plan for change, right?
Shimel: Mm-hmm.
Markowsky: Issuing patches is not the same, scary proposition when you can say, “Okay, we’re gonna do the patch, we’re gonna roll it out, maybe we do some canary deployment, see how it goes with like 2 percent of production traffic.” Okay, we can say we know what our risk is, right?
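The “roll it out to about 2 percent first” step can be sketched with deterministic hash bucketing, so the same host or request always lands in the same cohort for a given release. This is a minimal Python sketch; the `in_canary` name, the `release:key` hashing scheme and the 0.01% bucket resolution are assumptions for illustration, not any particular deployment tool’s behavior.

```python
import hashlib


def in_canary(key: str, release: str, percent: float = 2.0) -> bool:
    """Deterministically decide whether `key` (a host or request ID) gets `release`.

    Hashing release and key together reshuffles the cohort for each release,
    so the same hosts are not always the guinea pigs."""
    digest = hashlib.sha256(f"{release}:{key}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, 0.01% resolution
    return bucket < percent * 100
```

Because the decision is a pure function of its inputs, the canary cohort is reproducible, which makes it easier to say empirically what the risk was before widening the rollout.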
And being sort of more empirical about that, right, really lets you go forward and, you know, tying it back to what we do, right, we want to enable that as much as possible, right?
Shimel: Yep, absolutely.
Markowsky: And this is [Cross talk] security, you know?
Shimel: I got bad news.
Markowsky: Yep.
Shimel: Dude, we’re so far over our time limit here, I’m afraid.
Markowsky: It’s all good. I was super nervous in the beginning, which is why I was sort of word salad and paragraphs. [Laughter]
Shimel: That’s okay, man! It’s all good. I think people will enjoy it. You know, our audience tends to be technical, Peter, and so, when you can talk about nuts and bolts and not at 5,000,000 feet, they enjoy that.
Markowsky: Okay.
Shimel: So, first of all, thanks for being our guest on this episode of DevOps Chat. Success to you and the rest of the team at Capsule8, tell them I said hello.
Markowsky: Will do.
Shimel: I may have you back on and we can continue the convo.
Markowsky: Alright. I’d like that very much.
Shimel: Cool?
Markowsky: Thank you.
Shimel: Alright, man! Hey, this is Alan Shimel for DevOps.com. Until next time, have a great day, everyone. We’ll see you soon.