Hi there,
I am currently exploring the AI Guardrails and Red Teaming features.
I've noticed that AI Guardrails supports natural language-based configuration through the "Matched Prompts / Responses" feature. However, AI Red Teaming does not yet appear to support attacks defined in natural language.
Regarding this, I have two questions:
- Are there any plans to provide natural language-based or custom prompt attack capabilities within AI Red Teaming?
- Separately, is there a roadmap for a feature that suggests specific Guardrail configurations based on the "Failed" items in AI Red Teaming test results?
I believe these features would greatly strengthen the link between testing and mitigation. I look forward to your feedback.
Best regards,
