Document Sanitization: avoid making your business look foolish

Document Sanitization and Metadata

Human error is one of the biggest problems businesses face when it comes to data leaks and unwanted data acquisition. A business may have any number of systems in place to ensure that confidential files cannot be shared with unauthorized individuals. What is often overlooked, however, is the sensitive metadata hidden in everyday documents and files, which also has the potential to cause major data breaches.

People often forget that most digital office documentation contains automatically created sensitive information – such as the author name, revision history, application software and version number. This is known as metadata, which can be compromising when shared outside of an organization. It is important for businesses to understand how metadata can affect them and how it could result in a potential data breach.
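To make this concrete, here is a minimal sketch of how little effort it takes to read such metadata. It assumes an Office Open XML file (.docx, .xlsx or .pptx), which is a ZIP package that stores its properties in docProps/core.xml and docProps/app.xml. The function name `read_ooxml_metadata` is ours, chosen for illustration, and only the Python standard library is used.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_ooxml_metadata(source):
    """Return {property_name: text} from the package's property parts.

    `source` is a path or a binary file-like object for a .docx, .xlsx
    or .pptx file. Only simple text properties are collected.
    """
    found = {}
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue  # some producers omit one or both parts
            root = ET.fromstring(package.read(part))
            for child in root:
                if child.text and child.text.strip():
                    found[_local(child.tag)] = child.text.strip()
    return found
```

Calling `read_ooxml_metadata("proposal.docx")` on a typical document would be expected to return entries such as `creator`, `lastModifiedBy`, `revision`, `Application` and `AppVersion`, which are exactly the details described above. The file name here is a placeholder.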

Take, for example, a company that creates a proposal in-house by copying a previous document and revising it to suit the new opportunity before submitting it to the prospect. If the document is not sanitized and the revision history removed before sending, the prospect has a wealth of information at their fingertips. Anyone with access to that document can see the names of every person who has worked on the proposal, or view changes made to elements such as the scope of works and budgets.

While this is potentially embarrassing, there is a far more sinister threat when metadata falls into the wrong hands. The information contained in document metadata is invaluable to cybercriminals. For a hacker, knowing what software version is in use means they can craft an attack around known vulnerabilities in that software. Knowing the author and their email address means they can craft a phishing email with a weaponized attachment. But how might documents fall into the wrong hands? The simplest route is through the Internet, by ‘harvesting’ metadata attached to files on the company website.

Why Now?

Ensuring that sensitive data does not leave an organization, or rather never reaches unauthorized recipients, is even more important with the impending enforcement of the GDPR. Good information governance offers a competitive advantage, but the GDPR also introduces requests that organizations need to be prepared for. The most onerous of these is often called ‘the right to be forgotten’, or RTBF.

After 25th May 2018, individuals, including customers, can exercise their right to be forgotten, which involves the discovery and potential removal of any and all of their personal information from your network and systems. Information spreads rapidly, so it is important to be able to find it efficiently and then decide whether it can be removed. This does not necessarily end at the organization boundary: it can also apply to third parties with which you have shared the information, and to information that third parties have shared with you.

Receiving unwanted (or unauthorized) data can create as many challenges as a data leak. This is especially important when implementing RTBF requests: if you never received the data, you do not have to find and delete it. For example, a spreadsheet might have been sent that, unknown to the sender, contained hidden columns of sensitive information which should have been removed. This unwanted or inadvertent data acquisition makes it even more challenging for organizations to track down and remove the sensitive data to comply with RTBF requests and the GDPR in general.
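The hidden-column case can be checked for automatically. The sketch below assumes an .xlsx workbook, in which each worksheet is an XML part under xl/worksheets/ and hidden columns are marked with a `hidden` attribute on `<col>` elements. The function names are hypothetical and the code only reports what it finds. It does not remove anything, and it does not cover hidden rows or hidden sheets.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
SHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def column_letter(index):
    """Convert a 1-based column index to its letter (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def find_hidden_columns(source):
    """Return {worksheet_part_name: [column letters]} for hidden columns.

    `source` is a path or a binary file-like object for an .xlsx file.
    """
    hidden = {}
    with zipfile.ZipFile(source) as package:
        for name in sorted(package.namelist()):
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            letters = []
            for col in root.iter(SHEET_NS + "col"):
                if col.get("hidden") in ("1", "true"):
                    # Each <col> element covers the inclusive range min..max.
                    first, last = int(col.get("min")), int(col.get("max"))
                    letters.extend(column_letter(i) for i in range(first, last + 1))
            if letters:
                hidden[name] = letters
    return hidden
```

A check like this could run on inbound attachments as well as outbound ones, so that a hidden column is flagged before the file is stored anywhere it would later have to be found and deleted.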

Our recent report ‘The GDPR Divide: Board Views vs Middle Management’ reveals that almost half of board members believed they had duplicated customer data (for example by copying reports to multiple systems), suggesting it is increasingly difficult for organizations to rely on operator-initiated processes such as manual inspection and deletion.

With employees already expressing uncertainty about their ability to handle RTBF requests, and just a third of management respondents believing the business can handle multiple requests concurrently, businesses need to implement a solution now. There are two places where technology can help. The first is preventing unwanted data acquisition, by removing unauthorized information through sanitization and redaction at the boundary, before it enters the network. The second is discovery: implementing a solution which will automatically search a network for unstructured data, such as reports, and then move or remove it on request.

The same technology which prevents unauthorized data acquisition can also be used to prevent data loss, without compromising collaboration – protecting both the organization and its partners.

Our whitepaper also reveals that only 17% of employees would actually delete an email that was sent to them from another company in error. Even fewer would make the sender aware, even if the email contained sensitive information. This means that customer data is more likely to have been duplicated via unwanted data acquisition without anyone being aware of it. When it comes to GDPR compliance, it is critical to tackle this issue and ensure that customer data is not shared either by mistake or on purpose.

Safety Assured

To remove the reliance on manual user processes, there is a need for company-wide systems that can automatically detect when sensitive information is about to be sent or received across an organization’s boundary. Such a system should offer assurance and automatic protection across the entire network without hindering the way business is conducted.

Clearswift’s Document Sanitization feature automatically purges common file formats of sensitive data to prevent inadvertent data leaks:

  • Removes outstanding revision changes
  • Clears history and fast-save data that potentially holds embarrassing or critical information
  • Completely removes document properties, such as “Author”, “Organization” and “Status”
  • Removes data attached to photos, such as coordinates and other metadata
  • Granularity ensures that specific properties can be preserved, such as classification information

Automation ensures that the policies are consistently applied. Document sanitization provides assurance that users are sharing documentation and files – either inside or outside of the organization – without posing a data breach threat or a compliance failure.

Document Sanitization is a component of Clearswift’s unique Adaptive Redaction technology, available with Clearswift’s SECURE Email and Web Gateway products. It can also be deployed to augment existing (non-Clearswift) email and web products.

Don’t hesitate to contact our team for more information.


This is a Security Bloggers Network syndicated blog post authored by Bianca.du.Plessis. Read the original post at: Clearswift Blog