GUEST ESSAY: How stricter data privacy laws have redefined the ‘filing’ of our personal data

Filing systems, historically speaking, have been all about helping their users find information quickly.


Europe’s General Data Protection Regulation (GDPR) changed the game. Traditionally, filing systems sort records by date, department, topic, and so on. Legacy filing systems were not built to track the personal data of specific individuals, which is precisely what compliance with the many data protection regulations popping up around the world now requires.

Since it took effect in 2018, GDPR’s core principles have been echoed by the LGPD in Brazil, POPIA in South Africa, and the proposed PDPB in India. Under the GDPR, a filing system is defined as “any structured set of personal data which are accessible according to specific criteria, whether centralised, decentralised or dispersed on a functional or geographical basis” (GDPR Article 4(6)).

By this definition, the organizing principle of a filing system shifts significantly: its central purpose becomes the ability to identify individuals and classify the personal data an organization collects on them. This capability is essential for organizations that must satisfy this new type of data handling regulation.

GDPR requires them to answer data subject access requests and to honor requests to be forgotten (the right to erasure), among other stringent rules on the use and storage of personal data. None of that can be done without knowing where someone’s personal information is filed.
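To make that dependency concrete, here is a minimal Python sketch, assuming a hypothetical in-memory index (`PersonalDataIndex` and `Location` are illustrative names, not any product's API) that records every place a data subject’s personal data is stored. With such an index, an access request becomes a lookup and an erasure request returns the list of locations to purge.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Location:
    """One place where personal data is stored (hypothetical schema)."""
    system: str      # e.g. a database or storage bucket name
    record_id: str   # row key, file name, etc.
    field: str       # which field or attribute holds the data


class PersonalDataIndex:
    """Hypothetical index from each data subject to every location holding their data."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def register(self, subject_id, location):
        # Call this whenever personal data about subject_id is filed somewhere.
        self._by_subject[subject_id].add(location)

    def access_request(self, subject_id):
        # All locations needed to answer an access request, in a stable order.
        return sorted(self._by_subject.get(subject_id, set()))

    def erasure_request(self, subject_id):
        # Locations that must be purged; the subject is also dropped from the index.
        return sorted(self._by_subject.pop(subject_id, set()))


if __name__ == "__main__":
    index = PersonalDataIndex()
    index.register("alice", Location("crm_db", "42", "email"))
    index.register("alice", Location("s3_scans", "invoice-7.pdf", "name"))
    index.register("bob", Location("crm_db", "43", "email"))
    print(index.access_request("alice"))
    print(index.erasure_request("alice"))
    print(index.access_request("alice"))  # [] once erased
```

The hard part in practice is not the lookup but populating `register` reliably, which is where the classification problem discussed below comes in.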

EU – California comparison

The California Consumer Privacy Act (CCPA), soon to be amended by the California Privacy Rights Act (CPRA) on January 1, 2023, applies to both electronic and paper records. This sets it apart from GDPR, which applies to personal data processed by automated means and reaches manual records only when they form part of a filing system (Article 2(1)). The difference matters, because classifying paper records by hand is far more difficult.

CPRA specifically states, in Section 10, that “[a] consumer shall have the right, at any time, to direct a business that collects sensitive personal information about the consumer to limit its use of the consumer’s sensitive personal information to that use which is necessary to perform the services or provide the goods reasonably expected by an average consumer who requests such goods or services […]”.

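To honor such a request, a business must first know which sensitive personal information it holds about that consumer, in paper files as well as electronic ones. Accurate classification of the personal information associated with each individual is therefore a key requirement for compliance, and so is digitizing paper documents.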

Dealing with legacy systems

For companies with legacy systems, complying with data protection regulations has meant either migrating old data to a more modern filing system or maintaining separate systems for new and old data. The second approach rests on the premise that the newer regulations do not apply to old data, although GDPR, for one, covers personal data still being processed, whenever it was collected.

Data migration can be unmanageable for some organizations, to the point where no single user has read access to all of the files in the system. Collecting credentials from each user is a huge hurdle in itself, especially when no one really knows which users have access to which data.

Companies also often juggle many kinds of storage, from relational databases to schema-less databases to giant dumps of unstructured data in, say, Amazon S3 buckets. It’s no wonder that the global GDPR services market is expected to reach $4.4 billion by 2027.

Many companies have emerged over the past few years to help organizations catalog data in line with regulators’ expectations. Their tools reconcile duplicate records, generate metadata flagging whether files contain personal data, and even associate files containing personal data with the individuals to whom they belong.

Data governance platforms often have integrations with Data Loss Prevention (DLP) solutions, which protect organizations from data exfiltration and other data breaches.

Less can be more

While the data governance platforms on the market are a great step in the right direction, the work of properly protecting data is far from done. For one, an estimated 80 percent of the data organizations gather is unstructured: text, video, audio and image files.

However, the methods DLP and data governance companies use to detect personal data within unstructured data tend to be limited to pattern matching with regular expressions, which can be unreliable, and sometimes basic AI. The reasons are usually speed and compute cost, given that over “2.5 quintillion bytes of data are produced each day,” along with the difficulty of building AI models that are both accurate and computationally efficient.
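A minimal Python sketch can show why regex-based detection is unreliable. The two patterns and the `find_pii` helper below are hypothetical and far simpler than any real DLP rule set, but they illustrate both failure modes: a false positive (an order number that looks like a phone number) and a false negative (a person’s name, which no simple pattern captures).

```python
import re

# Hypothetical, deliberately simple patterns; real DLP rule sets are much larger.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text):
    """Return (label, matched_text) pairs for every pattern match in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


if __name__ == "__main__":
    sample = ("Contact Jane Doe at jane.doe@example.com or 555-123-4567. "
              "Order ref 123-456-7890 shipped.")
    for label, value in find_pii(sample):
        print(label, value)
    # The order reference is flagged as a PHONE (false positive),
    # and the name "Jane Doe" is not detected at all (false negative).
```

On the sample sentence, `find_pii` returns the email address and both digit strings, including the order reference, while “Jane Doe” goes undetected. Catching names and rejecting look-alike numbers requires context, which is the gap the essay argues machine learning must fill.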

Another important missing piece of the puzzle is giving organizations the ability to minimize data at the source. The easiest way to manage personal data is not to collect it in the first place. And, once again, AI is the best way to accurately minimize the personal data collected within unstructured data.
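Minimizing at the source can be sketched in a few lines of Python, assuming a hypothetical ingestion hook (`ingest` and `minimize` are illustrative names). A single email regex stands in for the ML-based detection the essay advocates, purely to keep the sketch short: detected personal data is replaced before anything is written to storage.

```python
import re

# Stand-in detector: one email pattern. The essay argues ML models are needed
# for accurate detection; a regex is used here only to keep the sketch short.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def minimize(text, placeholder="[EMAIL]"):
    """Replace detected personal data with a placeholder."""
    return EMAIL.sub(placeholder, text)


def ingest(store, record_id, text):
    # Hypothetical ingestion hook: only the minimized text is written,
    # so the raw personal data is never persisted.
    store[record_id] = minimize(text)


if __name__ == "__main__":
    store = {}
    ingest(store, "ticket-1", "Reach me at jane.doe@example.com about my refund.")
    print(store["ticket-1"])  # Reach me at [EMAIL] about my refund.
```

The design point is where the redaction happens: because `minimize` runs before the write, there is no stored copy of the email address to find, classify, or erase later.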

The next breakthroughs in data governance are being built with machine learning. It is the only truly reliable hope for compliance with our modern data protection regulations. These are fascinating times, in which we get to observe just how quickly the law can propel technical innovation forward, making the industry catch up to the needs and rights of individuals.

About the essayist: Patricia Thaine is the co-founder and CEO of Private AI, which supplies AI solutions that make it safe to share and analyze datasets without compromising user privacy.

*** This is a Security Bloggers Network syndicated blog from The Last Watchdog authored by bacohido. Read the original post at: