
Objective

Highlight that improvements to international predefined entities are part of Netskope’s long-term roadmap.

Netskope’s existing detection relies mostly on a third-party solution.

 

Prerequisite

This issue generally applies to all Netskope Standard and Professional DLP modules, including, but not limited to, NGSWG DLP, Email DLP, Endpoint DLP, and more.

 

Context

Customers, particularly those in non-English regions, can test predefined identifiers in their own languages to verify whether they function correctly. The identifiers may trigger only on the sample data shown in the UI, so testing with a variety of real-life samples is highly recommended for more accurate verification.
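One way to keep track of such a verification run is to record, for each real-life sample, whether it was actually sensitive and whether the DLP policy flagged it, then tally the miss rate. The sketch below is only a bookkeeping aid under stated assumptions: the `Sample` fields and the `summarize` helper are hypothetical names, and the `detected` values are assumed to be entered by hand from whatever you observed in your own tests. It does not call any Netskope API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str        # the test content that was sent through the DLP policy
    sensitive: bool  # ground truth: does it really contain the identifier?
    detected: bool   # observed outcome, recorded manually after the test


def summarize(samples):
    """Return false-negative and false-positive rates for a test run."""
    positives = [s for s in samples if s.sensitive]
    negatives = [s for s in samples if not s.sensitive]
    missed = sum(1 for s in positives if not s.detected)
    false_alarms = sum(1 for s in negatives if s.detected)
    return {
        "false_negative_rate": missed / len(positives) if positives else 0.0,
        "false_positive_rate": false_alarms / len(negatives) if negatives else 0.0,
    }


# Illustrative entries only; replace with your own recorded results.
run = [
    Sample("local-language address sample 1", sensitive=True, detected=False),
    Sample("local-language address sample 2", sensitive=True, detected=False),
    Sample("local-language name sample", sensitive=True, detected=False),
    Sample("UI demo sample", sensitive=True, detected=True),
    Sample("harmless text 1", sensitive=False, detected=False),
    Sample("harmless text 2", sensitive=False, detected=True),
]
print(summarize(run))
```

A high false-negative rate on real-life samples, even when the UI demo sample triggers, is the signal this article is warning about.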

 

Do You Know?

Non-numerical identifiers such as addresses, names, or terms are likely to bypass Netskope’s DLP detection mechanism without triggering alerts or appearing in DLP incidents. This means the administrator will be unable to track or investigate the data leakage incident.

 

Notes

We’ve discovered that others have already raised similar concerns. We have reported this to Netskope Support and the Account team, but the actions they can take are limited. Please be aware that, for an unknown period of time, neither predefined data identifiers nor regex can be fully trusted.

I would argue that you can never rely on pattern matching 100% with any vendor.

 

Pattern matching is the “quick and dirty” way to do DLP in my opinion. It’s prone to false positives and false negatives more than exact data matching.


Exact data matching is the best way to do DLP when possible. Though it is an extra license with Netskope.


I completely agree with your point.
Exact data matching is ideal; however, for patterns such as names, addresses, email addresses, national ID numbers, and passport numbers, how can we perform exact matching for these types of sensitive data?

In this case in particular, we observed nearly 100% false negatives with non-English predefined identifiers.

That is why we raised this concern with this article and encourage admins not to fully rely on predefined identifiers, but instead use real-world samples to verify before moving to production.


It depends on the business really. Most organizations don’t need to match any name, address, or email. They need to protect their customer or employee data. That data is usually stored in a database somewhere. 

Once you switch from the generic patterns to matching on only your customer or employee data, the false positives are greatly reduced. Obviously, this is easier said than done, as many organizations don’t have the scope of data they are trying to protect defined at the time of purchasing a DLP solution, nor do they always know where the data resides or how to make it available to ingest for Exact Data Matching.
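To make the idea concrete, here is a minimal sketch of the general exact-matching technique: normalize the known values, store only salted hashes of them, and check each token of outgoing text against that set. This is an illustration under my own assumptions, not how Netskope implements Exact Data Matching; the function names, the salt, and the sample records are all hypothetical.

```python
import hashlib
import re


def normalize(value: str) -> str:
    """Casefold and drop everything that is not a letter or digit."""
    return re.sub(r"[\W_]+", "", value.casefold())


def fingerprint(value: str, salt: bytes) -> str:
    """Salted SHA-256 of the normalized value, so raw data is not stored."""
    return hashlib.sha256(salt + normalize(value).encode("utf-8")).hexdigest()


def build_index(records, salt: bytes) -> set:
    """Fingerprint every known sensitive value (e.g. exported from a database)."""
    return {fingerprint(r, salt) for r in records}


def find_exact_matches(text: str, index: set, salt: bytes) -> list:
    """Return the tokens in text whose fingerprint is in the index."""
    tokens = re.findall(r"[\w.@+-]+", text)
    return [t for t in tokens if fingerprint(t, salt) in index]


salt = b"demo-salt"  # hypothetical; use a secret value in practice
index = build_index(["alice@example.com", "A-1234-567"], salt)
text = "Please forward to Alice@Example.com, member id a1234567, cc bob@example.com."
print(find_exact_matches(text, index, salt))
```

Only values that really exist in the protected data set match, which is why false positives drop. The sketch checks single tokens only, so multi-word values such as full names or street addresses would need a sliding window over adjacent tokens.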

 

If for some reason you need to rely on pattern matching, your best bet is creating your own patterns. Netskope provides the predefined classifiers which have worked well for me in the past with heavy customization, though I believe they are more intended for “quick-start” and POC use-cases.
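As a rough illustration of what a customized pattern tends to involve, the sketch below combines three things: a candidate regex, a checksum, and nearby context keywords in more than one language. Everything specific here is an assumption for the example: the ID format (two uppercase letters plus eight digits), the use of the Luhn checksum, and the keyword list are hypothetical and do not describe any real national ID or any Netskope classifier.

```python
import re

# Hypothetical format, for illustration only: 2 uppercase letters + 8 digits.
# Lookarounds are used instead of \b because CJK characters count as word
# characters in Python, so \b would not match right after them.
ID_PATTERN = re.compile(r"(?<![A-Z0-9])[A-Z]{2}\d{8}(?!\d)")

# Illustrative context keywords; replace with terms from your own documents.
KEYWORDS = ("passport", "reisepass", "護照")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_ids(text: str, window: int = 40) -> list:
    """Return candidates that pass the checksum and sit near a keyword."""
    hits = []
    for m in ID_PATTERN.finditer(text):
        start = max(0, m.start() - window)
        context = text[start:m.end() + window].casefold()
        if luhn_valid(m.group()[2:]) and any(k in context for k in KEYWORDS):
            hits.append(m.group())
    return hits


print(find_ids("Reisepass Nr. AB12345674 ausgestellt"))
print(find_ids("護照號碼:AB12345674"))
```

The checksum and keyword requirements cut false positives, while the local-language keywords are exactly the part worth testing against real-life samples, since that is where non-English content tends to slip through.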


IDEA-5354 has been created for this issue, and we were informed that it requires a long-term fix.
At the moment, Netskope relies on a third-party solution to develop their predefined identifiers.

It’s important to note that not fully trusting pattern detection in general and Netskope’s predefined identifiers being unreliable are two different issues, and the former should not prevent efforts to improve the latter.

We hope this can be resolved soon, as the abundance of predefined identifiers is one of the highlighted features of Netskope.

