The Data Pipeline and Digital Transformation
Companies are generating, ingesting and consuming massive data streams, which are critical for business success. Because of this, Ameesh Divatia, co-founder and CEO of Baffle, believes that digital transformation will accelerate companies’ reliance on data pipelines, allowing multiple sources to feed a data warehouse using streaming mechanisms. Divatia discussed why he predicts that in 2023, we’ll see companies more aggressively transition from traditional data stores to data pipelines.
SB: Could you explain what a data pipeline is?
Divatia: A data pipeline is the infrastructure used to collect, store and process data in an IT environment, whether on-premises, in the cloud or a hybrid of the two. Pipelines are best suited to situations where large amounts of data are continuously generated and must be turned into insightful information that helps organizations identify inefficiencies, opportunities for growth and emerging trends, all of which can help create market differentiation.
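To make those collect, store and process stages concrete, here is a minimal Python sketch of a pipeline. The function names, record shapes and in-memory "warehouse" are purely illustrative assumptions, not any specific product's API; a real pipeline would read from a stream and write to a data lake or warehouse.

```python
# Minimal sketch of a data pipeline's three stages: collect, process, store.
# All names here (collect, process, store, warehouse) are illustrative.

import json
from typing import Iterable, Iterator

def collect(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Ingest raw event records (e.g., from a stream or log) as dicts."""
    for line in raw_lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # drop malformed records at the edge

def process(events: Iterable[dict]) -> Iterator[dict]:
    """Normalize events so downstream analytics see a consistent shape."""
    for event in events:
        yield {
            "user": str(event.get("user", "unknown")),
            "action": str(event.get("action", "")).lower(),
            "ts": event.get("ts"),
        }

def store(events: Iterable[dict], sink: list) -> None:
    """Persist processed events; a real pipeline would write to a warehouse."""
    sink.extend(events)

warehouse: list = []
raw = ['{"user": "a1", "action": "LOGIN", "ts": 1}', "not json"]
store(process(collect(raw)), warehouse)
print(warehouse)  # [{'user': 'a1', 'action': 'login', 'ts': 1}]
```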
SB: How will transitioning to a data pipeline offer better security than traditional storage?
Divatia: Existing protection mechanisms were not designed for the volume of data companies now store, nor are they sufficient for data on the move. Modern organizations are built on data collection, so they need to be able to store and analyze that data for insights and share it as needed.
However, this new data management paradigm can leave sensitive data vulnerable to compromise, intentional or unintentional, leading to noncompliance, revenue loss and reputational damage. Security for traditional storage revolved around at-rest protection, with compromise of physical media as the primary threat. With data pipelines, it is critical to build protection into the pipeline itself, using data de-identification methods and privacy-enhanced computation techniques that allow data in use to be processed as it streams, without exposing it. That is what better secures the data pipeline.
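As one illustration of de-identification, the sketch below replaces sensitive fields with deterministic keyed tokens before records enter the pipeline, so analytics can still join and group on those fields without ever seeing raw values. HMAC-based tokenization is one common approach among several and is an assumption here, not a description of Baffle's product; the field names and key handling are likewise hypothetical.

```python
# Sketch of field-level de-identification via deterministic tokenization.
# This is a generic illustration, not a specific vendor's implementation.

import hmac
import hashlib

SECRET_KEY = b"replace-with-a-managed-key"  # in practice, fetched from a key management service
SENSITIVE_FIELDS = {"email", "ssn"}         # hypothetical field names

def tokenize(value: str) -> str:
    """Keyed, deterministic token: same input always yields the same token,
    and the token cannot be reversed back into the original value."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def deidentify(record: dict) -> dict:
    """Replace sensitive fields with tokens; pass other fields through."""
    return {
        field: tokenize(value) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }

event = {"email": "jane@example.com", "ssn": "123-45-6789", "action": "purchase"}
print(deidentify(event))
# {'email': '<16 hex chars>', 'ssn': '<16 hex chars>', 'action': 'purchase'}
```

Because the tokens are deterministic, downstream analytics can still count distinct users or join records on the tokenized field while the raw identifiers never enter the pipeline.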
SB: Why do you see this transition from traditional data stores to data pipelines happening in 2023? What’s leading up to this movement?
Divatia: The move to the cloud has accelerated since the onset of the pandemic, as companies quickly pivoted to support remote work and a steep learning curve ensued. Now that best practices for securing data throughout the pipeline are being established, many companies that previously took a “wait and see” approach will make the jump this year.
The cloud offers virtually limitless storage and compute capability at a scale not seen before, prompting enterprises to rearchitect their data analytics pipelines: migrating data from on-premises environments, storing it in data lakes and extracting the most valuable data into warehouses, where it can be analyzed.
Executives understand the critical role data plays in business decisions. The proof point is the ever-increasing investment in data processing infrastructure and data engineering personnel. The conduit for data insight is the data pipeline.
However, data pipelines can be compromised—intentionally or unintentionally—leading to noncompliance, revenue loss and reputational damage. Companies will make more significant investments in 2023 to employ data de-identification methods and privacy-enhanced computation techniques that allow data processing without exposing the data.
SB: How does the data pipeline assist in an organization’s digital transformation?
Divatia: Data fuels business decisions. But for data to provide value and enable digital transformation, it must be readily available for analytics. Companies are realizing the immense promise of the cloud, with infrastructure that scales easily. However, allowing unprotected data to move into analytics pipelines as they are built is a recipe for disaster when a data breach occurs. Hence, good security practices, including field-level data protection controls, must be in place to mitigate the risks of unauthorized access and theft posed by the expanded attack surface.
In the analytics pipeline, event information is ingested “upstream” as unstructured streaming data. As the data moves “downstream” toward the end of the pipeline, it is cleansed, organized and analyzed.
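The sketch below traces that upstream-to-downstream flow on a toy event stream: raw, unstructured lines arrive upstream, then successive stages cleanse, organize and analyze them. The line format and field names are invented for illustration; they stand in for whatever event schema an actual pipeline would carry.

```python
# Sketch of the upstream/downstream flow: raw unstructured lines arrive
# upstream; downstream stages cleanse, organize and analyze them.
# The "date action user=<id>" format is a made-up example schema.

from collections import Counter

raw_stream = [
    "2023-01-05 login user=alice",
    "garbled ###",                    # malformed record, dropped upstream
    "2023-01-05 purchase user=bob",
    "2023-01-06 login user=alice",
]

def cleanse(lines):
    """Drop records that don't match the expected 'date action user=<id>' shape."""
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[2].startswith("user="):
            yield parts

def organize(rows):
    """Turn cleansed rows into structured records."""
    for date, action, user in rows:
        yield {"date": date, "action": action, "user": user.removeprefix("user=")}

def analyze(records):
    """A simple downstream aggregate: event counts per action type."""
    return Counter(r["action"] for r in records)

print(analyze(organize(cleanse(raw_stream))))
# Counter({'login': 2, 'purchase': 1})
```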
As the volume of valuable data continues to grow, businesses can gain previously unattainable insights. But the reputational and financial fallout of a single breach can overshadow the benefits the analytics pipeline offers. By implementing analytics pipeline protection strategies, organizations can use the data they have worked to generate, collect and analyze while reducing the risk of unintentional or nefarious data exposure.