More on Security Data Lakes – And FAIL!
Naturally, all of you have read my famous “Why Your Security Data Lake Project Will FAIL!” [note: Anton’s ego wrote this line :-)]
Today I read a great Gartner note on data lake failures in general (“How to Avoid Data Lake Failures” [Gartner access required]). Thus, I wanted to share a few bits that, in my experience, are VERY relevant to security data lake efforts I’ve seen in recent years. So:
- “Proponents of data lakes often exaggerate their benefits by promoting them as enterprisewide solutions to all data and analytics problems.” – indeed, we’ve seen the exact same thing with security data lakes! Of course, then the reality hits: you build a huge pile of dirty data poo – and nothing else …
- “Data lakes are rarely started with a definite goal in mind, but rather with nebulous aspirations […]” – the same is often seen with security data lakes.
- “Avoid confusing a data lake implementation with a data and analytics strategy. A data lake is just infrastructure […]” – this is pretty much what I said in the post.
- “The popular view is that a data lake will be the one destination for all the data in their enterprise and the optimal platform for all their analytics.” – the paper later explains that, generally speaking, this is plainly false, because it rests on 3 false assumptions. It remains false even if scoped down to all security-relevant data.
- The paper later describes several exciting FAIL scenarios, all of which I’ve seen with security data lakes. For example, “single version of the truth” as a failure scenario often means a single version of raw unusable data that nobody wants and nobody knows how to use.
- Another “failway” is “Data Lake Is My Data and Analytics Strategy” with its juicy “ego-driven perspective on data lakes: they see them as means by which to be viewed as thought leaders […]” that results in an all-the-useless-data, none-of-the-insight situation.
- Yet another FAIL comes from “Infinite Data Lake” confusion. Imagine lots of useless data … now imagine a lot of useless data a year later. Two years. Five years. What is worse than unusable data? OLD unusable data that has even less context. NOW: useless. TWO YEARS LATER: that much more useless at huge hardware cost!
- Finally, they close with: “The goal of gathering all data in one location was never truly achieved in the data warehousing world. It’s unlikely to be achieved in the data lake world, either […]”
Note that this post intentionally does not quote any of the recommendations from the paper. Sorry, but you have to read the paper for that (because policy).
Enjoy!
Related posts:
- Why Your Security Data Lake Project Will FAIL!
- Sad Hilarity of Predictive Analytics in Security?
- On Unknown Operational Effectiveness of Security Analytics Tooling
- Now That We Have All That Data What Do We Do, Revisited
- Killed by AI Much? A Rise of Non-deterministic Security!
- Security Analytics Lessons Learned — and Ignored!
- Why No Security Analytics Market? <- important read for VCs and investors! Works in 2018 too, mostly.
- More On Big Data Security Analytics Readiness
- 9 Reasons Why Building A Big Data Security Analytics Tool Is Like Building a Flying Car
- “Big Analytics” for Security: A Harbinger or An Outlier?
*** This is a Security Bloggers Network syndicated blog from Anton Chuvakin authored by Anton Chuvakin. Read the original post at: https://blogs.gartner.com/anton-chuvakin/2018/08/29/more-on-security-data-lakes-and-fail/