Skip to main content
Question

Inquiry regarding NLP-based attacks in AI Red Teaming and Guardrail recommendations.

  • January 29, 2026
  • 0 replies
  • 26 views

Hi there,

I am currently exploring the AI Guardrails and Red Teaming features.

I've noticed that AI Guardrails allows for natural language-based configurations through the "Matched Prompts / Responses" feature. However, it seems that AI Red Teaming does not yet support performing attacks based on LLM natural language.

Regarding this, I have two questions:

  1. Are there any plans to provide natural language-based or custom prompt attack capabilities within AI Red Teaming?

  2. Separately, is there a roadmap for a feature that suggests specific Guardrail configurations based on the "Failed" items in AI Red Teaming test results?

I believe these features would greatly enhance the synergy between testing and mitigation. I look forward to your feedback.

Best regards,