In this Security Boulevard Chat I speak with two of my heroes from the InfoSec/AppSec world: Jeremiah Grossman and Robert “RSnake” Hansen. Jeremiah and Robert just announced the launch of their new company, BitDiscovery, which is emerging from stealth, fresh off a venture capital raise and the acquisition of Robert’s company, OutsideIntel.
BitDiscovery offers a radically different way of doing asset inventory and website discovery. Rather than having to run a scan when you want to discover your assets, you just query the master database BitDiscovery has assembled and it tells you what it has already found there. This is possible because BitDiscovery keeps a constantly updated snapshot of just about the entire internet. That is one big data-based solution right there.
Robert and Jeremiah are just the kind of guys to make something like this work. Also, after speaking to them, I believe that with that kind of data, a lot more uses for this technology will present themselves to the BitDiscovery team. With the track record of these two pioneers, look for big things!
As usual, the streaming audio of our interview is immediately below, followed by the transcript of our conversation.
Alan Shimel: Hey, everyone! It’s Alan Shimel, DevOps.com, Security Boulevard, and you’re here listening in to a Security Boulevard Chat. I’ve got a really kind of exclusive for us today. I’m happy to be joined by two people I know from the security world, probably for more years than we want to admit to. [Laughter] I’m happy to be joined by Jeremiah Grossman and Robert “RSnake” Hansen, who are the Co-founders of a new company just coming out of stealth called Bit Discovery. That’s B-I-T, D-I-S-C-O-V-E-R-Y, BitDiscovery.com. Jeremiah, Robert, welcome to Security Boulevard Chat!
Robert Hansen: Thanks for having us. [Laughter]
Jeremiah Grossman: Great to be with you, Alan. Pleasure to be with you.
Shimel: Okay! So, first of all, guys, you know, most of my security listeners out there, I think, are gonna be familiar with both of you, but for those who aren’t, let’s start with you, Robert. Quick background on, up to co-founding Bit Discovery, here. What’s your background?
Hansen: Well, I’ve been in computer security for about 23 years now, so most of that was spent in web application and browser security. But a couple years ago, I decided to kinda move into a different space, and I built this thing called OutsideIntel, which we’ll be talking about a bit, and Jer came along and didn’t think I was as crazy as everyone else did, and [Laughter]—or maybe he did think I was crazy and, you know, sometimes it can be fun to hang out with crazy people.
Grossman: He’s my kind of crazy.
Hansen: [Laughter] And then we started working together on this project, and it’s been a wild ride already just in the last handful of days, here.
Shimel: Got it. Jeremiah, how about yourself?
Grossman: Sure. So, I’m Jeremiah Grossman, and I’m not quite as experienced as Robert. I’ve only been in the industry for about 18 years. I started at Yahoo as the web security guy, and I took what I learned from Yahoo in web security and managing—helping manage and take care of that environment, and I founded WhiteHat Security, which became a very large player in website vulnerability assessment. And then, taking what I learned there, and I got to work with Robert for a couple of years there, but together, we’ve been doing just a massive amount of research in the industry for over a decade now, so it’s been a lot of fun.
And, as it unfolded, we both encountered the same problem, which is, you know, wanting a lot of data to do asset inventory and many other things. The opportunity and the timing just came out right, and we said, “Let’s take on this asset inventory problem. You have a great tech platform.” I can do the apparatus to do the UI and productize what he has and, you know what? All things lined up, and we’re pretty excited.
Shimel: Very, very cool. And, guys, you should—I told you this off mic, but both of you have been kinda heroes of mine, and I’ve been in security just a few years myself, but I’m older than both of you. And, you know, I’ve always admired the work that you’ve both done, so I was really, really happy to see this announcement around Bit Discovery.
But let’s jump into it. Bit Discovery—what’s it about?
Grossman: Sure. I’ll kick that one off. So, you know, when I was doing vulnerability assessment for a number of years, the real core problem that we were solving was that websites had vulnerabilities, we just didn’t know where they were, and given the scale of the web, we’ve got to do it at scale.
So, when we got into that problem and started to solve it, there came two more problems that showed up. One is, when you do vulnerability assessment well, you find out quickly that across the web there are more vulnerabilities than could be fixed now, or any time soon, and really, no one has a great solution for that.
But there’s another, more basic problem, that everyone seems to have, and that’s asset inventory. No one seems to know what they own, what they do, or who’s responsible for it, and everybody tries it different ways, and everybody’s failed at it since as long as I’ve been in the industry. And that’s the next big problem. I mean, it’s the one problem, the one thing that every security expert agrees on—you gotta know what it is that you’re protecting, otherwise, you really can’t.
Grossman: And so, that’s where we jumped into it. We wanna solve that one problem. We think everybody needs an asset inventory. So, the way we approach it is fundamentally different. All the tools and solutions that are out there are generally complicated, but they’re also slow, because they do on-demand scanning. You put in the domain name or an IP range, and the process kicks off. Hours, days, weeks later, you might get a pile of not so very good results, and certainly, that’s not an enterprise-friendly product.
So, what Robert did through OutsideIntel, which became the technology platform that drives Bit Discovery is, he pulled together a massive amount of Internet port scan data, Whois data, passive DNS data, crawl data—anything and everything we can get our hands on. So, that way, when people do asset inventory with Bit Discovery, it’s a query. That’s why we’re able to get super-fast speed, because it just pulls it out of our database, because we’re constantly scanning the entirety of the Internet. It was a very hard way to build it, a very expensive way to build it, but the results speak for themselves.
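To make the query-versus-scan distinction concrete, here is a minimal sketch of the idea, not Bit Discovery's actual implementation. It assumes the scan, Whois, passive DNS, and crawl results have already been collected into simple records; the record layout, the `RECORDS` sample data, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records, as if harvested ahead of time from port scans,
# Whois, passive DNS, and crawls. Hosts and IPs are documentation examples.
RECORDS = [
    {"host": "www.example.com", "ip": "192.0.2.10", "source": "passive_dns"},
    {"host": "mail.example.com", "ip": "192.0.2.25", "source": "port_scan"},
    {"host": "shop.example.org", "ip": "198.51.100.7", "source": "crawl"},
]


def registered_domain(host):
    """Naive grouping key: the last two labels of the host name.

    Assumption for illustration only; real code would need the Public
    Suffix List to handle names such as example.co.uk.
    """
    return ".".join(host.lower().split(".")[-2:])


def build_index(records):
    """Group pre-collected records by domain, once, ahead of any query."""
    index = defaultdict(list)
    for rec in records:
        index[registered_domain(rec["host"])].append(rec)
    return index


INDEX = build_index(RECORDS)


def inventory(domain):
    """Answer an asset-inventory question with a lookup, not a new scan."""
    return sorted(rec["host"] for rec in INDEX.get(domain.lower(), []))
```

Calling `inventory("example.com")` returns the two known hosts immediately, because the expensive collection work happened before the question was asked.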
Shimel: This kinda reminds me of the scientists working on mapping the universe, and almost as complex.
Grossman: And the internet, like the universe, is still expanding.
Shimel: Exactly. I was gonna say that. Good point. So, guys, in all seriousness, a friend of mine, my friend Raj, Rajat Bhargava, after Interliant, started a company called Quova, which was one of the first sort of IP geolocation type services. And, to do it the way they did it, it was a very similar thing: they literally had to map out every IP address on the Internet, you know, by C class, B class, et cetera, and map that to a geographic location, which allowed you to do geographically based ad serving and stuff like that.
Hansen: You know, it’s a labor of love, you know what I mean? I know it sounds crazy, but I just really, really have always loved having all of this data just at my fingertips. Ever since I first got it even vaguely working—and it took many iterations to get to the point where it is now, but even when it was just barely in its infancy and I had multi-minute lookup times instead of subsecond lookup times, I was still just unbelievably stoked to have all the data and be able to query and have it even being somewhat reliable.
So, it might sound horrible to have to map out every single IP address by hand, and that is horrible, and I don’t envy that job! [Laughter] But, you know, if you love it—like, and this has always been a problem. Like, I had a very similar problem, I started a consulting company in 2005 or something, and the very first problem, day one, hour one, first client, I’m like, “Okay, well, what do you own?” And they’re like, “Well, I thought you were the smart guy. I thought you’re supposed to find that.” [Laughter] And I think it’s just one of those things that, if you love it, you’re just gonna do it, and this has been bugging me forever. And, so I’m glad we were able to do it.
Grossman: And it’s interesting about this particular problem, because the process is tedious, it’s difficult, there’s no book that tells you how to scan and map and index the Internet. And we ended up with an actual, real deal big data problem. And not everybody has it in security or InfoSec or whatever. And we had all compute problems—we have CPU issues, data size issues, memory issues, bandwidth issues.
So, that’s what it takes to do this right. You just have to have the data platform in order to do it, and it’s like Robert said, it’s a labor of love. It took us years to get here.
Shimel: Oh, absolutely, and I can see why. So, a couple of questions popped to mind, guys. Number one, I can see doing it one time, right? Mapping this all out one time. In terms of keeping it current, though, I mean, is it so you just create a big diff, a diff file, right?
Shimel: You know? [Laughter] How the heck do you do this?
Hansen: Yeah, I mean, that’s an oversimplification, but absolutely, yeah. And one of the things, I made a design decision when I very, very, very first started building it that I was never gonna throw any data away. So, it’s even worse than you’re thinking, because it’s not like you then throw away the last snapshot and just keep the diff or something.
I literally, I have years of historical data, which doesn’t sound, on the surface, like a very smart move. But one of the really interesting features that we haven’t exposed to the interface, but it is something we could theoretically expose is, I can rewind the Internet back to whenever I want. So, if I wanna go look at what Hillary Clinton’s mail servers look like, I can go back and look at it. If I wanna look when Adult Friend Finder or whatever—you know, Ashley Madison, rather, got compromised, I can go look and see their admin server sitting right there, totally exposed. I’m sure that’s probably how they got popped. Now it’s not there any more.
Hansen: So, it gives you some really cool insights into how things happen. And one of the reasons I really like that is, a lot of times, things happen and you weren’t paying attention, and you didn’t know to be analyzing something at the time. But then it happens and you’re like, “Ah, I wish I was looking at that back then,” and you can suddenly rewind the tape.
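The "never throw data away, then rewind" design can be sketched as an append-only history with an as-of lookup. This is only an illustration under assumed names: the `History` class, its methods, and the observation format are hypothetical and say nothing about how OutsideIntel stores its data.

```python
import bisect
from collections import defaultdict
from datetime import date


class History:
    """Append-only store of dated observations per key (e.g. a host name)."""

    def __init__(self):
        self._dates = defaultdict(list)   # key -> sorted observation dates
        self._values = defaultdict(list)  # key -> values, parallel to _dates

    def record(self, key, seen_on, value):
        """Add an observation in date order. Nothing is ever deleted."""
        pos = bisect.bisect_right(self._dates[key], seen_on)
        self._dates[key].insert(pos, seen_on)
        self._values[key].insert(pos, value)

    def as_of(self, key, when):
        """Return the latest observation on or before `when`, else None."""
        dates = self._dates.get(key, [])
        pos = bisect.bisect_right(dates, when)
        return self._values[key][pos - 1] if pos else None


history = History()
history.record("admin.example.com", date(2015, 1, 1), {"open_ports": [22, 8080]})
history.record("admin.example.com", date(2016, 1, 1), {"open_ports": []})

# "Rewinding the tape" to mid-2015 shows what was exposed back then.
rewound = history.as_of("admin.example.com", date(2015, 6, 1))
```

Because old observations are kept rather than overwritten, a question nobody thought to ask at the time can still be answered later.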
The other reason that’s really useful is, a lot of people use things like Cloudflare or Incapsula or DOSarrest or whatever, you know, front-end CDN servers for caching or security or for obfuscation. But, by virtue of having years of data, I can rewind the tape and just look and see where they were hosted before. And where they were hosted before is probably where they are still hosted, they just pointed their DNS at one of those services. So, I can just rewind the tape, go look at where they were before, and connect directly to them, completely bypassing that security and caching layer.
So, there’s some really interesting features of having that kind of data.
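For a defender auditing their own sites, the same idea can be sketched as a filter over historical DNS answers: drop the addresses that belong to the fronting service and see what is left. Everything here is an assumption for illustration. The `CDN_NETWORKS` range is a documentation placeholder rather than any real provider's address space, and the history format is hypothetical.

```python
import ipaddress

# Placeholder range standing in for a CDN's published address space.
# Assumption: real use would load the provider's actual published ranges.
CDN_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_cdn(ip):
    """True if the address falls inside one of the known CDN networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CDN_NETWORKS)


def candidate_origins(a_record_history):
    """List non-CDN addresses a host resolved to in the past, newest first.

    `a_record_history` is a list of (iso_date, ip) pairs in any order.
    The result is only a list of candidates: an old address may have been
    reassigned, so each one still has to be verified.
    """
    origins = []
    for _, ip in sorted(a_record_history, reverse=True):
        if not is_cdn(ip) and ip not in origins:
            origins.append(ip)
    return origins


dns_history = [
    ("2016-03-01", "198.51.100.7"),
    ("2017-06-01", "203.0.113.5"),   # DNS now points at the CDN
    ("2015-01-01", "192.0.2.10"),
]
```

Running `candidate_origins(dns_history)` on your own domains shows which older addresses are worth checking for an origin server that is still reachable from the Internet without going through the caching and security layer.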
Grossman: And just to complicate the problem just a little bit more, you know, while the Internet expands and there’s new web services and servers that go up all the time, just from a website context, you imagine that websites change technologies and add them and remove them over time, whether it’s Google Analytics, jQuery, you know, they patched their Drupal service or whatever, we have to keep track of all that stuff as well. We want all those little bits of metadata across every site on the Internet. That’s what we’re collecting, because we need to surface that data, too.
Shimel: You know what, just to give people something to ooh and aah about: how big is it? I mean, I haven’t kept up in a while on how many sites and IP addresses are actually in use on the Internet any more. But give me some scale here, guys. What are we talkin’?
Grossman: So, Robert, you gotta tell him about how much data we have to process monthly. [Laughter]
Hansen: Yeah, it’s a little hard to say in terms of actual size, just because I have so much compression and stuff turned on, but per month it would probably end up being about 400 gigs of just one data set alone. If you start adding in all the stuff I actually want to do and I just haven’t gotten around to... like, for a while, I was processing about 780 terabytes every quarter, and that got too expensive. My data center was starting to send over some extraordinarily large bills to me, so I kinda had to stop doing that. [Laughter]
Shimel: Yeah, that’ll do it.
Hansen: And so, I had to scale back, because I was 100 percent bootstrapped at that point. But that is where I want to get to. I mean, the petabyte scale is where this problem lives. It’s not—unfortunately, it’s not one of those things you can run in your basement, as much as I wanted to and tried. [Laughter] It’s a big data problem, it really, really is.
Shimel: You’d need a big basement with a lot of electricity for what’s coming in.
Grossman: [Laughter] [Inaudible]
Shimel: So, guys, let me ask a question, then—you know, as Bit Discovery, your target customer, are they using it to discover their own assets, or is it more of a forensics tool to see what other assets are out there?
Grossman: You know, the primary purpose is for them to develop an asset inventory of their own things. But the use cases that people are coming to us with are far beyond that. So, we want any company, if they’ve got more than a handful of websites, to be able to quickly develop a comprehensive, detailed view of their asset inventory.
However, if they’re doing an M&A transaction, if they wanna know what’s being bought or sold or they’re working with a partner and they want to do an asset inventory for them, they certainly can. So, it’s investigations in that regard. It became useful in cyber insurance. You know, the carrier wanted to know exactly what the assets are of their client that they’re about to insure. Marketing teams wanted to use it because they wanted to know what marketing programs they’re running, that they have or had out there, and those sorts of things. The legal teams wanted to know it, the finance teams wanted to know it. Like, there’s tons of use cases for it. But we’re gonna focus on just one: it’s for the companies themselves.
Shimel: Got it. So, guys, let me ask, you know, from a competing products point of view, you know, I remember back to BigFix, which got bought by IBM and was, I think it was part of the Tivoli suite, and then on top of that, the people who started BigFix, the founding team, was a father and son who went out and started a company called Tanium, which you probably know there.
Shimel: Is this competing with that, do you think? Or, I mean, I realize they come at it from a different angle altogether, the old traditional scan and, “Let me tell you what’s out there” rather than querying an existing base. But fundamentally, the problem it solves, what do you think?
Grossman: We don’t—yeah, yeah, we don’t think it competes with Tanium or BigFix, and I’ll tell you why.
So, let’s parse the problem down. There’s all the things that a company owns that’s publicly facing—you know, websites, mail servers, DNS servers, and things like that. And then they have all their internal IT and their IT assets that are shielded from the Internet—all their desktops, routers, servers, and things like that.
Bit Discovery is primarily focused on everything that’s public. That can have some overlap with BigFix and Tanium, but not really. So, what Tanium and BigFix do, it’s more at the host layer, you know, OSI Layer 6 and down and not necessarily the web, like, where we’re focused. And that’s the other difference is the layer. So, we’re looking right now at websites—we have data for far more than just websites, and we’re not doing much in the way of patch management and configuration. It’s usually just asset inventory. We’re not keeping track of Windows devices and attachments, you know, things like that.
So, it’s just a different layer and a different focus entirely. You really need both, but it’s gonna be separate.
Hansen: Yeah. How I like to say that that’d be different is, from the Internet, obviously, an internal attacker would be a terrible thing—like, really, really terrible. But you kinda already had to make a bunch of mistakes to have an internal attacker. From the external Internet, it’s just much easier for an attacker to attack your web application, which is why we see so many successful breaches from the Internet through websites.
The other problem is, when you’re talking about the difference between those two, a lot of times the company doesn’t actually own whatever they’re pointing their DNS to. They’re pointing it at a mail server, but that mail server is Google, or they’re pointing their website, but that website is on WP Engine or something, you know what I mean? It’s a third party that’s doing it, so you’re not gonna run any sort of agent or you’re not gonna probe your internal network and find that stuff, you know what I mean?
Hansen: It’s just not—it’s not there. But it’s still very important, because sometimes you’ll find that XYZ has a vulnerability and you’re like, “Oh, my gosh, which one of my sites was running XYZ? Where are we using that technology? Oh, it’s over there. I didn’t even know we were using that software with that product” or whatever. So, it helps kind of solve that part of the problem of just not knowing where anything actually is so that you can go try to solve that problem.
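The "which of my sites was running XYZ?" question is essentially an inverted index from technology to sites. The sketch below assumes per-site technology fingerprints have already been collected; the `SITE_TECH` sample data and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical fingerprint data: site -> technologies detected on it.
SITE_TECH = {
    "www.example.com": {"jquery", "drupal"},
    "blog.example.com": {"wordpress", "jquery"},
    "shop.example.org": {"drupal"},
}


def invert(site_tech):
    """Build a technology -> sites index from site -> technologies data."""
    by_tech = defaultdict(set)
    for site, techs in site_tech.items():
        for tech in techs:
            by_tech[tech].add(site)
    return by_tech


def sites_running(by_tech, tech):
    """List every tracked site where the given technology was detected."""
    return sorted(by_tech.get(tech.lower(), set()))


BY_TECH = invert(SITE_TECH)
```

When an advisory for a product comes out, `sites_running(BY_TECH, "Drupal")` gives the list of sites to check, including ones hosted by a third party where no internal agent would ever run.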
Shimel: Excellent, excellent. So, let me—while we’re on the subject of competitors and business, so, give me a little bit of the model here, guys. How do customers engage Bit Discovery?
Grossman: So, right now, we’re collecting as much—we’re in a beta phase, I guess you’d call it, where we’re giving free invite only accounts to people so they can test the product. We believe we’re about six to eight weeks from a good production release. We want our customers and prospective customers to really guide our roadmap to tell us exactly what we need to build so we can solve those particular use cases.
And, you know, we wanted to run Bit Discovery a little bit differently in the go-to-market strategy, and that’s worth talking about. You know, one of the things that we heard from people out there over more than a decade of experience, and a lot of people will hate hearing it, is that the last person on Earth an InfoSec person wants to talk to is yet another enterprise security sales rep.
Grossman: So, we want to be as low touch on the model as possible. We want people to come to the website, learn about the product. If they’re interested, give them an opportunity to try it without talking to anybody, and if they like it? Great. Give us a call, or buy online. That’s what we’re really trying to get to.
For right now, you know, we’re being high touch, because we want people to have a great experience and learn all that we can, but that’s where we’re gravitating towards. We want people to be able to self-serve, and we want to get out of their way so they can get their job done.
Shimel: I hear ya. I hear ya. In terms of pricing of the product, how will that work, do you know?
Grossman: It remains to be seen. We know it’s gonna be an ongoing service, so there will probably be some amount of subscription pricing, and we’re going to, I guess, unveil that later, I guess, is the right word. We’re trying to produce as much value as we possibly can for the customers over the last six months and in the next six months, and we’ll figure out what a fair price and a good pricing model is, but it will be subscription pricing.
It seems to me that, against competing solutions, we’re competing in the range of tens of thousands if not hundreds of thousands for other ways to get this job done, which no one really does.
Shimel: Yeah, no—no one’s approaching it this way, either, so it’ll be interesting. So, guys, let me ask another question, and that is—Robert, you alluded to it. You’ve been working on OutsideIntel, which was kinda the heart of the technology here, I guess, for a few years now.
Shimel: And then, as part of the Bit Discovery launch, Bit Discovery acquired OutsideIntel, I imagine both the people—you—and the IP that you had built out over the last couple years. And that’s the heart of the Bit Discovery technology, and the company now will be looking for ways to commercialize that.
Hansen: That sounds about right to me. [Laughter]
Hansen: I think it’s pretty fair, yeah. I think OutsideIntel is the proving ground for the technology. You know, we got it up and working stable, looks awful. [Laughter] My UI skills are not there. But it worked, and the major thing that it has proven is that the data sets, the way they’re organized, the speed, how they’re correlated together—that fundamental premise all works, and now we can take that out and liberate it outside of OutsideIntel, no pun intended, and then turn it into something that actually makes sense to an average user.
Because OutsideIntel was really meant for an analyst, somebody who is pretty technical—it was meant for me, and you know, that comes with certain UI sacrifices. Things aren’t particularly explained, there’s nothing, really, to help the user figure out what to do. And so much so that one of the core features that Bit Discovery—the most important, really, feature of Bit Discovery of all is the one where it can remember the fact that you’re interested in this domain. That was never something that OutsideIntel wanted to build or had any intention of building, so that was a really nice additional feature that made it extremely powerful, because now you don’t have to remember your assets any more. They’re sitting there and slowly getting more and more information on the domains of interest—the sites of interest or whatever.
Grossman: And that bit is pretty informative, what Robert mentioned there. There’s OutsideIntel and some of the other tools that are available out there are great for surfacing Internet data—you know, IPs and host names that you might be interested in. But no one brought that to the concept of asset inventory for all the time tracking. No one really had that. So, you really need both. You need to be able to discover websites, track them, manage them, and get an on point data feed for all of them, and we brought it all together.
Hansen: Yeah, I think the other really fundamental thing that Bit Discovery is bringing to the table that OutsideIntel did not is really good external API support. And so, this is something, I think, we’re gonna be building out and really fleshing out, but you know, a lot of people will come—in just the last couple days, I don’t know, I probably had 100 phone calls [Laughter] and various different e-mails and all kinds of stuff.
But there’s a lot of partnerships that are already bubbling up and people want to consume the data. But a really good, bidirectional API that allows them to insert things into it and then pull things out of it—I mean, that is absolutely amazingly useful, because suddenly, it’s like, okay, any time I find something new I want to trigger an audit, or I want to trigger some tool to go analyze it for me or whatever.
And so, we don’t necessarily have to be in the security business ever and still give the users a huge security benefit by virtue of the partnerships that we enable. I think that there’s some really interesting, powerful things that can be done in that space without us having to be doing the heavy lifting. Because there’s somebody out there who’s absolutely brilliant at—whatever, SEO or something, who can build an SEO analysis tool or whatever. Do we want to be in the SEO business? Absolutely not, you know?
One of the other things—some of, I would call them quote-unquote competitors, although not really—have some of this data, or maybe even all of this data, maybe they’ve found ways to get it themselves. But the major difference between us and them is, we don’t make any—we don’t say, “This is your data.” What we say is, “This is probably something you’re interested in. Would you like to add it to the things that we’re tracking and monitoring and put it in a portfolio? Or not—if you don’t want to, no big deal,” and then it’s gone.
And, by virtue of allowing the individual to decide what’s theirs and what’s not theirs, that’s hugely powerful, because instead of it just being us guessing and hoping we’re getting it right, it’s now absolutely 100 percent confirmed by the user. So, it’s a much more accurate list. If we missed something, they’re like, “Hey, you missed this domain that was hidden way over here.” It’s like—okay, great, you told us about something that you cared about. No other tool is gonna do that, because they don’t know about it either, you know? [Laughter]
So, having that user input is absolutely critical to getting sort of the book of truth in terms of what you actually own.
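The suggest-then-confirm workflow Hansen describes can be sketched as a small portfolio object that remembers the user's decisions. The `Portfolio` class and its method names are hypothetical and only illustrate the idea that the user, not the tool, decides what counts as theirs.

```python
class Portfolio:
    """Tracks which suggested assets the user has confirmed or dismissed."""

    def __init__(self):
        self.confirmed = set()
        self.dismissed = set()

    def suggestions(self, candidates):
        """Return candidates the user has not yet made a decision about."""
        return sorted(set(candidates) - self.confirmed - self.dismissed)

    def confirm(self, asset):
        """User says: this is mine, keep tracking it."""
        self.dismissed.discard(asset)
        self.confirmed.add(asset)

    def dismiss(self, asset):
        """User says: not mine, stop suggesting it."""
        self.confirmed.discard(asset)
        self.dismissed.add(asset)


portfolio = Portfolio()
found = ["www.example.com", "old.example.com", "lookalike.example.net"]

portfolio.confirm("www.example.com")
portfolio.dismiss("lookalike.example.net")
# The user can also add a domain the discovery data missed entirely.
portfolio.confirm("hidden.example.org")

remaining = portfolio.suggestions(found)
```

After these calls, `remaining` holds only the one candidate still awaiting a decision, while `portfolio.confirmed` is the user-verified list, including the asset no scan had found.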
Shimel: Got it. So, guys, we’re coming up on the end here of our allotted time. We’re actually, probably, over. But I gotta ask one other question, and that is this: you know, I think we understand Bit Discovery and its potential applications, though frankly, with that kind of data, my entrepreneur’s suspicion is you’re gonna find other uses for this that are equally or more valuable as you continue to explore, here.
But, you know, you’re both giants of the AppSec world, right? In many ways, you were there when AppSec became a thing. What made you want to get out of AppSec, or do you see this as an extension of it?
Grossman: Yeah, we were there, I guess, before AppSec was cool.
Shimel: Right. [Cross talk]
Hansen: You know, technically, we were there before AppSec was even a term. [Laughter]
Shimel: Was AppSec—exactly.
Grossman: It’s really to solve a big problem. I mean, you know, Robert and I came out of AppSec, it’s just where our career paths and our interests took us. But for me, it’s just solving an important problem. I think everybody should have an asset inventory. It would solve a lot of problems if everybody did, and it just—it happened to relate to application security, you know, the finding of websites is one slice, but it would solve a lot of other problems.
So, for me, it is very much AppSec, but also, I get to solve another really big problem, and that’s what piques my interest. It’s something that no one went after, and the world really needs it. And that’s good for me.
Hansen: You know, I guess my version of that answer is almost exactly the same, but I really do think it’s an extension of AppSec. Because, like, I’m sure a lot of people are trying to be compliant with various different standards, and a lot of those standards say you have to have an up-to-date asset inventory, and I firmly agree that that is critical to having good AppSec hygiene, because you absolutely have to know what you have before you can go test for those known vulnerabilities that are already out there, right?
And so, to me, this is really a prototypical—the very root of AppSec, if you will, like the very, most fundamental thing. Know what you have, and then you can go test for it. So, for me, it really is still AppSec. It’s just the beginning and the root of the whole discussion.
Shimel: Excellent. Well, Jeremiah, Robert, you know, I promised you we were gonna try to hold it under 20 minutes and we blew past that a while ago. So, we’re gonna have to call it a wrap on this Security Boulevard Chat. First of all—hey, guys, congratulations. Really, really happy for you. You know, best, best wishes and luck and success with Bit Discovery. We’ll be watching. Please keep us posted, though. Come back, let us know what’s happening.
Grossman: And if you invite us, we’ll be back, and we—as always, we sincerely appreciate all the support.
Shimel: Not a problem.
Hansen: Yeah—thanks, Alan.
Shimel: You have my support and you have our community’s support. You guys deserve it. You’ve put your time in. Jeremiah Grossman, Robert Hansen—Co-founders of Bit Discovery, here on Security Boulevard Chats. This is Alan Shimel for Security Boulevard Chats. Have a great day, everyone.