Cloudera and Hortonworks Merge

Posted under: News

I’ve been planning to write a blog post on the recently announced merger between Hortonworks and Cloudera, as there are a number of trends I’ve been witnessing with the adoption of Hadoop clusters, and I feel this merger reflects them in a nutshell. But while catching up on my reading I ran across Mathew Lodge’s recent article on Venture Beat titled “Cloudera and Hortonworks merger means Hadoop’s influence is declining”. It’s a really good post. I can confirm we see the same lack of interest in deploying Hadoop to the cloud, the same use of S3 as a storage medium when Hadoop is used atop Infrastructure-as-a-Service (IaaS), and the same developer-driven selection of whatever platform is easiest to use and deploy on. All in all it’s an article I wish I’d written, as he did a great job of capturing most of the areas I wanted to cover. And there are some humorous bits, like “Ironically, there has been no Cloud Era for Cloudera”. Check it out; it’s worth your time to read.

But there are a couple of other areas I want to cover.

It is rare to see someone install Hadoop into a public IaaS account. Customers (now) choose a cloud-native variant and let the vendor handle all of the patching and hide much of the infrastructure from them. They also have the option of spinning down the cluster when it is not in use, making it much more efficient. Couple that with the work required to set up Hadoop and it’s an easy decision. And I am somewhat surprised that something like AWS’s Elastic MapReduce (EMR) is not always the chosen repository, but rather Dynamo, given its powerful query capabilities, indexing, and ability to offer the best of relational and big data capabilities. Most of the public IaaS vendors offer so many database variants that it is easy to mix in multiple variants to support applications, further cutting into Hadoop usage.

One area where we continue to see Hadoop adoption is on-premises data collection and data lakes for logs. The principal driver commonly cited is the need to keep Splunk costs under control. It takes effort to divert some content to Hadoop, as opposed to sending everything to the Splunk collectors, but data is collected and held at drastically lower cost. And you’re not sacrificing analytics. For organizations collecting every log created, this is a win. We also see Hadoop adopted by Security Operations Centers, running side by side with other platforms. Part of the need is to fill gaps in what their SIEM keeps, part is to keep costs down, and part is to make it easy for people who are not professional software developers to build custom applications for security intelligence.

Another area not covered in any of the articles I read is that both Cloudera and Hortonworks have deep catalogs of security capabilities: identity management, encryption, monitoring, and a whole bunch of other great stuff. Together they are dominant. As firms do use large-scale ‘data lakes’ to hold all sorts of sensitive data inside Hadoop, this will be a win for firms running Hadoop in-house. Big data is not the security issue it was 5 years ago. Hortonworks and Cloudera have a lot to do with that, and coupling those capabilities with their experience in enterprise deployments makes them a powerful combination for helping firms manage and maintain existing infrastructure. That is my way of saying some of the negative press from the financial blogs is not fully warranted IMO, given there are profitable avenues ahead.

The idea that growth in the Hadoop segment has been slowing is nothing new. AWS has been the largest seller of Hadoop-based data platforms, by revenue and by customer count, for several years. Cloud is genuinely an existential threat to all of the commercial vendors of Hadoop (and similar big data) databases if they continue to sell in the same way. The recent acceleration of cloud adoption simply makes it more apparent that Cloudera and Hortonworks are competing for a shrinking share of IT budgets. But it makes sense for them to band together and make the most of their expertise in enterprise Hadoop deployments, and possibly to help with tooling and management software for cloud migrations. If Kubernetes is any indication, there are huge areas for improvement in tooling and services beyond what the cloud vendors provide.

– Adrian Lane

*** This is a Security Bloggers Network syndicated blog from Securosis Blog authored by Securosis. Read the original post at: