The amount of data used and stored within your organization is growing at an exponential rate; it’s the nature of data. Along with that growth comes the added responsibility of identifying and securing the sensitive information stored within your organization, whether it is in a structured or unstructured format.
Sensitive data in today’s world includes cardholder data, personally identifiable information, financial information, and health information. The definition is also evolving to include location, genetic, and biometric data.
This is all well and good, but what happens when there is information within your data that appears to be sensitive information, but actually isn’t? How do you differentiate between what is legitimate sensitive information and what is data that may be masquerading as sensitive data?
This is where I will introduce the term ‘False Positive’, which can be defined (by Google dictionary) as follows:
false positive (noun; plural: false positives)
- a test result that wrongly indicates that a particular condition or attribute is present.
When discovering sensitive data within your organization’s structured and unstructured information repositories, there is always a possibility that some data will incorrectly match the data types you are searching for.
After all, strings of numbers and characters appear throughout computer systems, and any combination of them has the potential to match a sensitive data format. You will also often encounter seemingly legitimate matches that are actually False Positives in the context of your search.
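To make this concrete, here is a minimal sketch of how a format-only search over-matches. The pattern name `NAIVE_CARD_PATTERN` and the sample text are illustrative assumptions, not taken from any particular discovery tool.

```python
import re

# Hypothetical, deliberately naive pattern: any standalone run of 16 digits.
NAIVE_CARD_PATTERN = re.compile(r"(?<!\d)\d{16}(?!\d)")

# Made-up sample text containing three 16-digit strings of different kinds.
sample_text = (
    "Order reference 2023091500001234 shipped.\n"
    "Build id 1234567890123456 deployed.\n"
    "Card on file: 4111111111111111\n"
)

# Every 16-digit string matches, whether or not it is a card number.
matches = NAIVE_CARD_PATTERN.findall(sample_text)
for value in matches:
    print(value)
```

All three strings match because the pattern checks format alone; nothing in it can tell an order reference or a build identifier from a card number.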
If we look a little deeper into this issue, we can also see that the number and concentration of False Positives vary with the target data repository you are scanning, and with the location within that repository.
For example, you may find differences in False Positives across the workstations, servers, and user mailboxes within your environment. On a workstation, False Positives may cluster in a particular file location, such as the applications directory, whereas in user mailboxes they may appear in email signatures.
The above examples are just a small sample of where False Positives can be identified when performing a forensic search for sensitive data within your organization. Multiplied across the many systems and locations housing your data, they can seriously impair your ability to distinguish the sensitive data you are trying to identify from the False Positives within your target data repositories.
So, how do you address this and ensure you don’t spend your valuable time combing through large numbers of False Positives to get to the real sensitive data?
You use a tool that:
- Employs False Positive mitigation techniques from the start of your sensitive data discovery.
- Provides pre-built, complex matching patterns that account for algorithms, checksums and ranges within a wide variety of Data Types.
- Checks against already known False Positives and Test Data patterns across a wide variety of target data repository types, from workstations, servers, network storage, and email to databases and cloud-based targets.
- Identifies and checks the context of potential matches to determine the certainty of a match being a True Positive instead of a False Positive.
- Continuously updates and improves the existing patterns and data types within the tool to make them more efficient and in sync with today’s data type formats.
- Continuously adds new data types to allow you to search for increasingly complex and unique data types within your data repositories.
- Allows the customization of data types and patterns to adapt to data specific to your organization.
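Three of the techniques listed above (checksum validation, known test data, and context checks) can be sketched for card-like numbers as follows. This is a simplified illustration under stated assumptions, not how any specific tool works: `KNOWN_TEST_NUMBERS`, `CONTEXT_KEYWORDS`, `classify`, and the sample text are all hypothetical, and the only standard algorithm used is the Luhn checksum.

```python
import re

# Candidate pattern: any standalone run of 16 digits (format only).
CANDIDATE = re.compile(r"(?<!\d)\d{16}(?!\d)")

# Assumed, non-exhaustive set of widely published test card numbers.
KNOWN_TEST_NUMBERS = {"4111111111111111", "4242424242424242"}

# Hypothetical keywords that suggest a payment-card context.
CONTEXT_KEYWORDS = ("card", "visa", "mastercard", "payment")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify(text: str) -> list:
    """Label each 16-digit candidate, using its own line as the context."""
    results = []
    for line in text.splitlines():
        for match in CANDIDATE.finditer(line):
            number = match.group()
            if not luhn_valid(number):
                verdict = "false positive (checksum)"
            elif number in KNOWN_TEST_NUMBERS:
                verdict = "false positive (test data)"
            elif any(word in line.lower() for word in CONTEXT_KEYWORDS):
                verdict = "likely true positive"
            else:
                verdict = "needs review"
            results.append((number, verdict))
    return results


# Made-up sample; 4000123412341234 is a synthetic Luhn-valid value.
sample = (
    "Order reference 2023091500001234 shipped.\n"
    "Test checkout used 4111111111111111.\n"
    "Customer card number: 4000123412341234\n"
    "Tracking 4000123412341234 dispatched.\n"
)

results = classify(sample)
for number, verdict in results:
    print(number, "->", verdict)
```

In this sample the order reference fails the checksum, the second value is recognised as test data, and the same Luhn-valid number is treated as a likely true positive only on the line where a payment keyword appears. Each check is cheap on its own; layering them is what reduces the volume of matches left for manual review.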
Author: Simon Davey