Netskope's Methodology and Approach for Security Testing AI/LLM Applications

Forum|Forum|2 months ago
April 13, 2026
0 replies
252 views

ganand
Netskope Employee

In this guide, the Netskope Red Team outlines its strategic framework for identifying and mitigating vulnerabilities within AI/LLM ecosystems. Given the unique challenges and threat landscape associated with AI and LLM systems, a specialized methodology is essential to ensure the robustness, integrity, and confidentiality of our AI-driven features and services.

Our methodology is designed to be comprehensive, covering the entire lifecycle of the AI/LLM application — from data ingestion and model training to deployment and continuous monitoring.

1. Phased Approach to AI Security Testing

Security testing for AI applications is structured into four distinct phases, ensuring thorough coverage of all potential vulnerabilities.

Phase	Description	Key Activities
Phase 1: Design and Data Security Review	Focuses on the foundational security elements before model development.	Data Sanitization review, Data governance policy review, Threat Modeling for Product and its component review, Privacy-by-design assessment.
Phase 2: Model and Training Environment Assessment	Concentrates on the security of the AI/LLM model itself and its training infrastructure.	Testing for Model Poisoning, Adversarial Example generation, Model Inversion attacks, Secure configuration review of the training environment.
Phase 3: Deployment and Inference Security Testing	Assesses the security of the deployed model and the inference endpoint.	API Security Testing, Denial of Service (DoS) testing against the inference endpoint, Model Stealing and Extraction testing.
Phase 4: Continuous Monitoring and Auditing	Ensures ongoing security posture and compliance post-deployment.	Automated logging and anomaly detection setup, Security audit schedule for AI features, Regular review of access controls.

2. Security Testing Methodology

The core of our approach involves a combination of traditional application security testing (AST) techniques adapted for AI, and specialized AI/LLM security testing.

2.1 Adapted Traditional Security Testing

This focuses on the supporting infrastructure and the Web/API interfaces of the AI application.

Web/API Security Testing: Standard tests are performed on the UI/APIs that interact with the AI model, including authentication, authorization, injection flaws, business logic flaws, and data exposure.
Infrastructure and Configuration Review: Review of the cloud/container environment hosting the AI application (e.g., Kubernetes, AWS) for misconfigurations that could lead to unauthorized access or data breaches.
Input Validation: End to end testing of all data inputs to prevent standard injection attacks (e.g., SQLi, XSS, XML, XPATH, SSRF, etc.) and to ensure data fed to the model adheres to expected schema.

2.2 Specialized AI/LLM Security Testing

This is a critical component that addresses vulnerabilities unique to machine learning models.

2.2.1 Adversarial AI/LLM Learning Attacks

These attacks aim to manipulate the model's behavior or exploit its training data.

Model Evasion (Adversarial Examples): Generating specially crafted inputs to cause the model to make incorrect predictions. Adversarial techniques performed during the inference (testing) phase, where attackers slightly manipulate input data creating "adversarial examples" to cause a trained AI model to misclassify or miss detecting them.
Data/Model Poisoning (Data Integrity): By injecting poisoned data, attackers cause the model to behave in unintended ways, like creating backdoors, misclassified inputs, or reducing overall accuracy. These attacks target the training phase (pre-training or fine-tuning) to manipulate the model's legitimate behavior.
Model Stealing/Extraction: Attempting to query the model to reconstruct an approximation of the original model or to extract sensitive parameters.A model stealing/extraction attack is a security exploit where an adversary queries a target AI model's API to collect input-output data, ultimately training a surrogate model that mimics the original's functionality.
Prompt Injection: Specifically targeting Large Language Models (LLMs) to manipulate the model's output or perform unauthorized actions via malicious input prompts.Attempting to bypass system instructions using direct, indirect, or hallucination-based attacks.
Jail Breaking: Crafting scenarios that trick the AI into bypassing safety filters, often by prompting it to "act as" an unrestricted agent.
Adversarial Suffixes/Token Manipulation: Amending nonsensical crafted inputs or text to prompts that causes the model to ignore safety guidelines.
Gradient based Attacks: Targets models built specially with neural networks.These methods use the mathematical methods of gradient descent that represents the rate of change in a model's prediction relative to changes in input data.

2.2.2 Data and Privacy Attacks

These focus on the sensitive information processed by the AI system.

Membership Inference Attacks: Attempting to determine if a specific data point was part of the model's training dataset and observing the output like a person’s medical record data is leveraged for training.
Model Inversion Attacks: Reverse engineer an AI model to reconstruct the original data and exploit the training data set.
Data Leakage Analysis: Reviewing the system to ensure no sensitive or proprietary data is unintentionally exposed during inference or through logging mechanism.
Property Inference Attacks: Attackers analyse the model output to infer information about the overall distribution of training data, such as determining the percentage of the dataset that belongs to a specific demographic.
Training Data Extraction: For LLMs, prompt manipulation can be used to force the model to reveal memorized training data that is consumed already, such as emails, passwords, or personal details.

3. Demonstration

To demonstrate the strategic approach, we have deployed Netskope in-house tool AI Red Teaming (AIRT) in one of the tenants as a feature flag and enabled it.

3.1 Dashboard Overview

AI Red Teaming (AIRT) is a critical component of the pre-deployment phase for LLM models and applications. It establishes an interface for testing adversarial attacks and evaluating security risks associated with user prompts and responses.

This risk assessment is conducted by measuring the overall attack success rate (comparing successful attacks to blocked attacks) against a predefined set of categories, aligning with the OWASP Top 10 for Large Language Model Applications, to identify and mitigate potential LLM abuses.

3.2 Configuration and Testing

To facilitate the testing of the REST APIs, LLM Model, and OpenAI Compatible API, initial configuration of the Target Registry is mandatory. It is important to note that testing is confined to three specific LLM providers: Open AI, Azure OpenAI, and Amazon Bedrock.

For the purpose of the demonstration, our focus will be on testing the LLM Model and the provider is OpenAI. This necessitates the completion of the form fields with the requisite details, which include:

Target Name
Description
Connection-Type
LLM Provider
API key
Model

RPM (Requests Per Minute)

The Target Registry can also be configured to test both the REST API and the OpenAI Compatible API. The required details should be entered into the form fields, as illustrated in the screenshots below.

After configuring the Target Registry, the dashboard displays its verification status as either "Verified" or "Unverified," depending on the accuracy of the provided configuration details. If the status is "Unverified," as it is in this case, the necessary details must be correctly entered into the form fields to achieve verification.

Likewise, we can configure any number of Target Registry. For the purpose of illustrating a successful attack scenario, we will use those that have been Verified. In this case, we chose Perf Test OpenAI LLM 60rpm for executing the Test Rounds.

Upon the conclusion of the test rounds, the Test Details and overall execution results become visible. This provides a clear illustration of the Overall Attack Success Rate broken down by categories and techniques.

The test results are also presented, detailing which executions succeeded or failed based on the prompts and responses processed by the underlying backend system.

Expanding the Case Details reveals a more granular view of the information, including the specific prompts tested and the recorded responses. This expanded view also clearly illustrates the multi-threaded nature of the prompt/response conversations used whilst scanning.

This demonstration highlights just one of the numerous strategies employed in Netskope's AI Red Teaming. Our security assessments utilize a variety of tools, including open-source, internally developed, and commercial solutions. The assessment methodologies are defined by the scope of the evaluation and the specific abuse cases developed prior to execution, rather than being restricted to particular tools or categories of attacks.

4. Roadmap Ahead

The accelerating pace of Artificial Intelligence (AI) adoption across modern organizations, while promising immense benefits, simultaneously introduces a new and complex vector of inherent security risks. To effectively mitigate these emerging threats, it is no longer sufficient to rely on traditional, generalized cybersecurity measures. Instead, organizations must establish a comprehensive, defense-in-depth, human-in-the-loop and data governance and privacy strategic security program specifically tailored for the AI landscape.

This essential strategic program necessitates a fundamental shift in approach, moving beyond reactive and periodic security reviews. A robust AI security posture requires the implementation of specialized, continuous, and proactive testing strategies. These advanced strategies must encompass the entire AI lifecycle - from data ingestion and model training to deployment and ongoing monitoring - to ensure that vulnerabilities are identified and remediated before they can be exploited. Such a continuous security paradigm is the bedrock upon which an effective and resilient security posture for AI systems must be built.

The fundamental takeaway is that securing AI is an ongoing, multidisciplinary, and dynamic challenge. To foster confidence and prevent harm to their reputation, organizations must shift from a reactive, final-stage view of security to proactively embedding "Security by Design" across the complete AI lifecycle, encompassing everything from initial data curation through to production deployment. At Netskope, we refer to the critical and growing need for proactive security measures against the rapidly evolving landscape of artificial intelligence as the "Rise of AI Red Teaming."

Netskope's Methodology and Approach for Security Testing AI/LLM Applications

1. Phased Approach to AI Security Testing