What are HAR files and why detect them?
HAR stands for ‘HTTP Archive’, a file format used to record the interactions between a website and the browser. HAR files are used to troubleshoot issues a user may be facing with a website, such as problems with performance, page elements, cookies, HTTP headers, or page rendering. Support and technical teams at large organizations inspect the HAR file for a transaction to understand what issues the user may have faced.
HAR files are useful for debugging because they present a clear picture of the transaction between the browser and the web server. This also creates a significant exposure, because HAR files record sensitive transactional information such as cookies and authentication tokens. It has therefore become crucial to make sure HAR files are not mishandled. Sanitization tools are available that strip sensitive information from a HAR file, and teams should sanitize their HAR files before sharing them for support or debugging purposes.
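To make the exposure concrete, here is a minimal Python sketch that lists where cookies and authentication headers sit inside a HAR file. It assumes the standard HAR 1.2 layout (a `log` object with an `entries` array, each entry holding `request` and `response` objects with `headers` and `cookies` arrays). The function name `find_sensitive_fields`, the header list, and the sample data are illustrative assumptions, not part of any Netskope or sanitizer tooling.

```python
import json

# Header names treated as sensitive here; an illustrative, non-exhaustive assumption.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie"}


def find_sensitive_fields(har):
    """Return readable locations of cookies and auth headers in a HAR dict."""
    findings = []
    for i, entry in enumerate(har.get("log", {}).get("entries", [])):
        for side in ("request", "response"):
            section = entry.get(side, {})
            for header in section.get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_HEADERS:
                    findings.append(f"entries[{i}].{side}.headers: {header['name']}")
            for cookie in section.get("cookies", []):
                findings.append(f"entries[{i}].{side}.cookies: {cookie.get('name', '')}")
    return findings


# Made-up sample data shaped like a (heavily trimmed) HAR 1.2 file.
sample_har = json.loads("""
{"log": {"version": "1.2", "entries": [{
  "request": {"method": "GET", "url": "https://example.com/",
              "headers": [{"name": "Authorization", "value": "Bearer abc123"}],
              "cookies": [{"name": "session", "value": "s3cr3t"}]},
  "response": {"status": 200,
               "headers": [{"name": "Content-Type", "value": "text/html"}],
               "cookies": []}
}]}}
""")

for finding in find_sensitive_fields(sample_har):
    print(finding)
```

Even this tiny sample exposes a bearer token and a session cookie, which is why an unsanitized HAR file should be treated as sensitive data.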
Using Netskope to detect sanitized HAR Files
Netskope Real-time Protection can detect HAR files that are being shared. It goes beyond merely detecting them: it also allows sanitized HAR files to be shared while blocking those that haven't been sanitized. This is made possible by Netskope DLP and weighted dictionary entities. A dictionary is a DLP detection capability that inspects content for the words the dictionary contains. A weighted dictionary is a similar DLP entity in which each word is assigned a weight, and DLP actions are taken based on the combined weights. Words with positive weights push the score toward a violation, while words with negative weights reduce the overall score, which may result in a lower severity or no violation.
Detecting HAR Files
This procedure shows how to set up a simple Netskope DLP policy that detects HAR files being shared, using Netskope Real-time Protection or API-Data Protection.
1. Create a custom CSV file containing a list of words that should be present in any HAR file. The accuracy of the DLP detection depends on how this CSV file is constructed. See the screenshot below for reference.
2. Navigate to the Netskope Tenant Admin portal and go to Policies → Profiles → Edit Rules → Data Loss Prevention → Entities.
3. Click ‘New Entity’ and name it accordingly.
4. Select the ‘Dictionary’ radio button and leave the dictionary type as ‘Keywords dictionary’.
5. Click ‘Select File’ below and upload the CSV file created in step 1.
6. After the upload is complete, save the entity.
7. Navigate to Policies → Profiles → Edit Rules → Data Loss Prevention → New Rule.
8. Click Custom Entity → Dictionary and select the entity created in step 6.
9. Without making any other changes, navigate to the ‘Severity Threshold’ tab and set the severity levels accordingly.
10. Check the box ‘Count only unique record’ and choose the severity at which the policy action will be taken.
11. Save the DLP rule. See the screenshot above for reference.
12. Navigate to Policies → Profiles → New Profile.
13. Navigate to the ‘Rule | Classification’ tab and select the custom rule created in step 11.
14. Name the profile and save it for use within a DLP policy.
15. Next, create a Real-time policy for Google Drive. Similar steps can be followed for any cloud application and for API-Data Protection, but detections in API-Data Protection are reactive rather than proactive as in Real-time Protection.
16. Navigate to Policies → Real-time Protection → New Policy → DLP.
17. Select the user set for this policy, set the application to ‘Google Drive’, and select the profile created in step 14.
18. Select the violation actions to be taken and save the policy. See the screenshot below for reference.
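The keyword file from the first step and the effect of ‘Count only unique record’ can be approximated offline before anything is uploaded. The Python sketch below writes a keyword file and counts how many distinct keywords appear in a piece of text. The keyword list (field names from the HAR 1.2 format), the one-keyword-per-row CSV layout, and the helper names are all assumptions for illustration. Check the Netskope documentation for the exact dictionary file format; this sketch does not reproduce the Netskope DLP engine's matching.

```python
import csv
import io

# Field names that commonly appear in HAR 1.2 files; an illustrative choice.
HAR_KEYWORDS = ["startedDateTime", "pageref", "headersSize", "bodySize",
                "httpVersion", "queryString", "timings"]


def write_keyword_csv(keywords, stream):
    """Write one keyword per row (assumed layout, not a confirmed Netskope format)."""
    writer = csv.writer(stream)
    for word in keywords:
        writer.writerow([word])


def count_unique_matches(text, keywords):
    """Count distinct keywords present in text, so repeated hits count once."""
    return sum(1 for word in keywords if word in text)


# An in-memory buffer stands in for the CSV file that would be uploaded.
buffer = io.StringIO()
write_keyword_csv(HAR_KEYWORDS, buffer)

har_like = '{"startedDateTime": "2024-01-01T00:00:00Z", "timings": {}, "timings": {}}'
plain_text = "Quarterly report: timings for the release are unchanged."

print(count_unique_matches(har_like, HAR_KEYWORDS))    # two distinct keywords
print(count_unique_matches(plain_text, HAR_KEYWORDS))  # one distinct keyword
```

Ordinary text can contain a single HAR keyword by coincidence, so a threshold that requires several unique keywords is likely to produce fewer false positives than one that fires on any single match.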
Differentiating between sanitized and unsanitized HAR Files
In the steps above, we detected HAR files using Netskope DLP. This section explains how to detect and block only the HAR files that have not been sanitized. A sanitized HAR file is one from which all sensitive information has been stripped. Different sanitizers remove sensitive information in different ways. Netskope DLP can detect that a HAR file is sanitized based on the values that replace the original content. For example, if a sanitizer replaces all cookie data with ‘CookieRedacted’, the Netskope DLP engine can recognize that the file has been sanitized, because ‘CookieRedacted’ is a replacement word standing in for the actual cookie. If the sanitizer instead replaces all cookie data with a blank, the Netskope engine cannot detect the sanitization, since the approach works by keeping track of the replacement words used.
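As an illustration of replacement-word sanitization, the following Python sketch overwrites cookie values with ‘CookieRedacted’. It assumes the HAR 1.2 layout (`log.entries[]` with `request` and `response` objects). The function `redact_cookies` is hypothetical and does not represent any specific sanitizer; real sanitizers typically cover more fields, such as authorization headers, query strings, and post data.

```python
import copy

REPLACEMENT = "CookieRedacted"  # the replacement word used in the example above


def redact_cookies(har):
    """Return a copy of a HAR dict with cookie values replaced by REPLACEMENT."""
    clean = copy.deepcopy(har)  # leave the caller's data untouched
    for entry in clean.get("log", {}).get("entries", []):
        for side in ("request", "response"):
            section = entry.get(side, {})
            for cookie in section.get("cookies", []):
                cookie["value"] = REPLACEMENT
            # Cookies also travel in raw headers, so redact those as well.
            for header in section.get("headers", []):
                if header.get("name", "").lower() in ("cookie", "set-cookie"):
                    header["value"] = REPLACEMENT
    return clean


# Made-up sample data shaped like a trimmed HAR file.
har = {"log": {"entries": [{
    "request": {"cookies": [{"name": "session", "value": "s3cr3t"}],
                "headers": [{"name": "Cookie", "value": "session=s3cr3t"}]},
    "response": {"cookies": [], "headers": []},
}]}}

sanitized = redact_cookies(har)
print(sanitized["log"]["entries"][0]["request"]["cookies"][0]["value"])
```

Because the sanitized output contains the literal string ‘CookieRedacted’, a dictionary that includes that word has something to match on. A sanitizer that wrote an empty string instead would leave no such marker.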
This logic is implemented with a weighted dictionary entity in Netskope DLP. It is a dictionary in which each word has an associated weight, and the weights of the words found in a file are combined into a score that determines whether a DLP violation is raised. Weights can be positive or negative: words with positive weights increase the DLP severity, while words with negative weights decrease it. Consider the example weighted dictionary below:
| Word | Weight |
| --- | --- |
| Test | 10 |
| Example | 8 |
| Good | -6 |
| Bad | -5 |
If the policy is set to trigger on a score greater than 9, the outcomes for different combinations would be:
Test Example → Triggers violation as overall score is 18
Test Good → Does not trigger violation as overall score is 4
Good Bad → Does not trigger violation as overall score is -11
Test → Triggers violation as overall score is 10
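The arithmetic above can be checked with a short Python sketch. It is a toy model of weighted scoring, not the Netskope DLP engine: it assumes each dictionary word contributes its weight once if present, and that matching is on whole whitespace-separated words. The helper names `score` and `violates` are hypothetical.

```python
WEIGHTS = {"Test": 10, "Example": 8, "Good": -6, "Bad": -5}
THRESHOLD = 9  # a violation requires a score strictly greater than this


def score(text, weights=WEIGHTS):
    """Sum the weights of dictionary words present in text (each counted once)."""
    words = set(text.split())
    return sum(weight for word, weight in weights.items() if word in words)


def violates(text, threshold=THRESHOLD):
    """Return True when the text's score exceeds the threshold."""
    return score(text) > threshold


for sample in ("Test Example", "Test Good", "Good Bad", "Test"):
    print(f"{sample!r}: score={score(sample)}, violation={violates(sample)}")
```

The scores come out as 18, 4, -11, and 10, so only ‘Test Example’ and ‘Test’ cross the threshold, matching the outcomes listed above.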
Applying the same logic to terms within a HAR file, one can assign negative weights to replacement words in the dictionary (in this context, a replacement word is the word that replaces an actual sensitive string in a HAR file). The overall score then increases as the DLP engine detects HAR file elements in a file, but decreases when the engine detects words denoting that the file has been sanitized.
The screenshot below shows a sample setup of the dictionary with weights for detecting an unsanitized HAR file. Note that the replacement words may differ based on your use case, and this approach does not apply if your sanitized HAR files contain no replacement words.
Please make sure that the DLP rule created from the DLP entity has the ‘Aggregate Score’ radio button selected in the ‘Severity Threshold’ tab.
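The aggregate-score idea can be sketched in Python for HAR content. The weights, threshold, and sample strings below are hypothetical and are not the values from the screenshot; the sketch also assumes each term contributes its weight once per file, which may differ from how the Netskope engine counts matches.

```python
# Hypothetical weights: HAR field names score positive, the replacement word negative.
HAR_WEIGHTS = {
    "startedDateTime": 3,
    "headersSize": 3,
    "httpVersion": 3,
    "cookies": 3,
    "CookieRedacted": -10,
}
THRESHOLD = 9  # assumed: block when the aggregate score exceeds this value


def aggregate_score(text, weights=HAR_WEIGHTS):
    """Add each term's weight once if the term occurs anywhere in text."""
    return sum(weight for term, weight in weights.items() if term in text)


# Made-up HAR-like fragment containing a raw cookie value.
unsanitized = ('{"startedDateTime": "2024-01-01T00:00:00Z", "httpVersion": "HTTP/1.1", '
               '"headersSize": 120, "cookies": [{"name": "session", "value": "s3cr3t"}]}')
# The same fragment after a sanitizer swapped in the replacement word.
sanitized = unsanitized.replace("s3cr3t", "CookieRedacted")

for label, text in (("unsanitized", unsanitized), ("sanitized", sanitized)):
    total = aggregate_score(text)
    print(label, total, "block" if total > THRESHOLD else "allow")
```

With these assumed weights the unsanitized fragment scores 12 and is blocked, while the sanitized one scores 2 and is allowed. The negative weight must be large enough to pull a typical HAR file's score below the threshold, so it needs tuning against real samples.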
In conclusion, we’ve discussed how to use Netskope Real-time Protection to detect HAR file movement while still allowing HAR files to move if they are properly sanitized. Netskope weighted dictionaries can be used in this way to detect and protect sensitive data within the organization. As organizations continue to mature their Data Loss Prevention programs, such detection methods will become an indispensable tool in their security arsenal.