
Netskope Global Technical Success (GTS)

Use Case - Netskope DLP - Using File Classifiers with Machine Learning (Blocking Similar Files)


 

Netskope Cloud Version - 129

Objective

This document outlines the steps to create a File Classifier trained with Machine Learning that can be used in a DLP Policy to block the transfer of similar files.

 

Context

Using File Classifiers trained with Machine Learning in Netskope is ideal when an organization wants to automatically identify and control the movement of sensitive or risky content—especially when traditional pattern-matching (e.g. regex or DLP rules) is insufficient or too rigid.

Example Scenario:

A company which manages a vast network of libraries around the world is concerned about potential data leakage involving sensitive documents that detail the books they import into various countries.

To prevent this type of information from being exposed—intentionally or unintentionally—by employees, the company wants to block the transfer of importation-related documents to destinations outside the organization.

However, the challenge lies in accurately identifying Importation Manifests, which may not follow a fixed format or contain consistent keywords. These documents often include key shipment and inventory details critical to the company's operations, and traditional DLP techniques may not reliably detect them.

By leveraging Machine Learning-based File Classifiers in Netskope, the company can train a model to automatically recognize the structure and context of Importation Manifests. This enables precise detection and enforcement—ensuring that such documents cannot be uploaded to unmanaged cloud apps, sent via personal email, or shared through unauthorized channels.

 

Configuration:

To achieve this, follow these steps:

  • Identify the files you would like to use for training the classifier.
  • Create the Custom Classifier
  • Upload the Files Used to Train the Classifier
  • Create a DLP Profile
  • Create the DLP RTP Policy

 

Identify the files you would like to use for training the classifier.

In this scenario, the company aims to identify and classify Importation Manifests as part of its data protection strategy. For demonstration purposes, several sample documents have been created to train and test the classifier. Below is an example of the sample data:

[Image: sample Importation Manifest used as training data]
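Because the sample documents serve both to train and to test the classifier, it can help to set a few of them aside before uploading. The following is a minimal sketch of one way to do that, not part of the Netskope workflow itself; the function name `split_samples`, the file names, and the hold-out count of 3 are all assumptions made for illustration.

```python
import random


def split_samples(files, holdout=3, seed=0):
    """Return (training, testing); `holdout` files are reserved for testing."""
    files = sorted(files)            # stable order before shuffling
    if holdout >= len(files):
        raise ValueError("holdout must be smaller than the number of files")
    rng = random.Random(seed)        # fixed seed keeps the split reproducible
    rng.shuffle(files)
    return files[holdout:], files[:holdout]


# Hypothetical file names standing in for the sample manifests.
samples = [f"manifest_{i:02d}.png" for i in range(1, 26)]
training, testing = split_samples(samples)
print(len(training), "training files,", len(testing), "held out for testing")
# prints: 22 training files, 3 held out for testing
```

Testing the policy with files the classifier has never seen gives a more honest picture of how well it recognizes similar documents.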

Create the Custom Classifier

To create a File Classifier, go to Policies > Profiles > DLP > File Classifiers > Custom > New File Classifier.

Name the File Classifier and leave the Match Threshold at 60%, which is the recommended value.
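One way to reason about the Match Threshold is as a cut-off on a similarity score. The sketch below assumes the classifier produces a score between 0 and 100 for each file and that a file matches when its score reaches the threshold. This article does not describe the exact scoring, so the rule, the function `is_match`, and the scores are illustrative assumptions only.

```python
MATCH_THRESHOLD = 60  # percent, the value recommended in this article


def is_match(similarity_percent, threshold=MATCH_THRESHOLD):
    """Assumed rule: a file matches when its score reaches the threshold."""
    return similarity_percent >= threshold


# Made-up scores for illustration only.
example_scores = {"manifest_like.png": 87, "invoice.png": 58, "photo.png": 12}
matches = [name for name, score in example_scores.items() if is_match(score)]
print(matches)
# prints: ['manifest_like.png']
```

Under this reading, a higher threshold makes the classifier stricter (fewer matches and fewer false positives, but more missed files), and a lower threshold does the opposite.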

Upload the Files Used to Train the Classifier

Click Upload Training Files to upload the training files. Only images can be uploaded at this time, and at least 20 files should be uploaded to train the classifier.

There are two types of training files used when building a Machine Learning File Classifier: Positive and Negative.

  • Positive Training Data consists of files that represent the type of content you want the classifier to detect—these are examples of what should be identified as a match.
     
  • Negative Training Data includes files that help the model understand what should not be classified as a match, reducing false positives.
     

In this scenario, only Positive Training Data was uploaded, focusing the model specifically on recognizing files that resemble Importation Manifests.

After uploading the training files, click Save to store both the files and the classifier configuration. Then, wait for the classifier's status to change to “Active.”

If the status does not become Active, it likely means that insufficient training data was provided or some of the uploaded files were invalid.
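A quick local check before uploading can catch the two problems mentioned here: too few files and files that are not images. The following sketch assumes the training data sits in a single folder. The extension list is an assumption (consult the product documentation for the formats actually accepted), and the check looks only at file extensions, not at whether each file is a valid image.

```python
import tempfile
from pathlib import Path

MIN_TRAINING_FILES = 20  # minimum stated in this article
# Assumed image extensions; the real list of accepted formats may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def check_training_folder(folder):
    """Return (image_files, other_files, enough) for a folder of training data."""
    entries = [p for p in Path(folder).iterdir() if p.is_file()]
    images = sorted(p.name for p in entries if p.suffix.lower() in IMAGE_EXTENSIONS)
    others = sorted(p.name for p in entries if p.suffix.lower() not in IMAGE_EXTENSIONS)
    return images, others, len(images) >= MIN_TRAINING_FILES


# Demo on a temporary folder with 19 images and one non-image file.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(19):
        (Path(tmp) / f"manifest_{i:02d}.png").touch()
    (Path(tmp) / "notes.docx").touch()
    images, others, enough = check_training_folder(tmp)

print(len(images), others, enough)
# prints: 19 ['notes.docx'] False
```

In the demo, the folder falls one image short of the minimum and contains a file that would need to be removed or converted before upload.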


Create a DLP Profile

To create a DLP Profile, go to Policies > Profiles > DLP > New Profile > Next > File Classifier, select the File Classifier created previously, click Next, set a Profile Name, and click Save.


 

Create the DLP RTP Policy

In this example, a policy is created to block uploads and downloads of Importation Manifests in Google Gmail.

 

To create a Real-Time Protection (RTP) policy, follow these steps:

  1. Navigate to Policies > Real-Time Protection, then click New Policy and select Cloud App Access.
  2. Choose the user or user group the policy will apply to.
  3. Set the Application to Google Gmail.
  4. Under Activities, select both Upload and Download.
  5. Click on Add Profile to attach the previously created DLP Profile, and set the associated activity action to Block.
  6. Provide a name for the policy and assign it to the appropriate Policy Group.
  7. Click Save to apply the changes.
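The conditions configured in the steps above can be summarized as a small decision rule. The sketch below is a simplified mental model only: it is not Netskope's policy engine or API, and the `Event` and `Policy` classes, the `evaluate` function, and the user address are hypothetical. A real tenant evaluates many policies, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    app: str
    activity: str
    classifier_match: bool  # whether the DLP Profile's File Classifier matched


@dataclass(frozen=True)
class Policy:
    users: frozenset
    app: str
    activities: frozenset
    action: str


def evaluate(policy, event):
    """Return the policy action if every condition matches, otherwise 'allow'."""
    applies = (
        event.user in policy.users
        and event.app == policy.app
        and event.activity in policy.activities
        and event.classifier_match
    )
    return policy.action if applies else "allow"


policy = Policy(
    users=frozenset({"alice@example.com"}),        # hypothetical user
    app="Google Gmail",
    activities=frozenset({"Upload", "Download"}),
    action="block",
)

manifest = Event("alice@example.com", "Google Gmail", "Upload", classifier_match=True)
other = Event("alice@example.com", "Google Gmail", "Upload", classifier_match=False)
print(evaluate(policy, manifest), evaluate(policy, other))
# prints: block allow
```

The point of the model is that the Block action fires only when the user, application, activity, and DLP Profile all match; a Gmail upload that the classifier does not recognize is unaffected by this policy.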

 


Demonstration:

File Used for Testing:

[Image: file used for testing]

File Sample used for Training:

[Image: sample file used for training]

User Experience:

[Image: user experience]

Netskope DLP Incidents:

[Image: Netskope DLP incident]

Netskope App Event:

[Image: Netskope application event]

Conclusion

ML-trained file classifiers are most useful when:

  • The content is complex, unstructured, or lacks clear identifiers.
  • You need higher accuracy and context-aware detection.
  • You want to go beyond what traditional DLP can detect.
  • Reducing false positives is a priority.
  • Visibility or control is required over intellectual property or internal documents.

Terms and Conditions

  • All documented information undergoes testing and verification to ensure accuracy.
  • The vendor may change the application's functionality in the future. If any such changes are brought to our attention, we will promptly update the documentation to reflect them.

Notes

  • This feature requires an Advanced DLP License.
  • All sample data used for testing is fictitious and was randomly generated for documentation purposes.
  • This article is authored by Netskope Global Technical Success (GTS).
  • For any further inquiries related to this article, please contact Netskope GTS by submitting a support case with 'Case Type – How To Questions'.