Microsoft has revealed lessons its red team has learned after spending seven years ‘attacking’ GenAI products and identifying potential harms.
In 2018, Microsoft formed the “red team” to address the increasing threat of AI security risks and harms.
Since then, the team has tackled more than 100 GenAI products, including copilots, plugins, AI models, and apps.
They modelled components of a cyberattack including adversarial or benign actors, TTPs (Tactics, Techniques, and Procedures), system weaknesses, and downstream impacts.
It came back with three takeaways: GenAI systems amplify existing security risks and introduce new ones; humans are at the centre of improving and securing AI, and defence against attacks needs to continuously update to account for novel harms.
Microsoft discovered dangers often arose from poor practices like outdated dependencies and weak error handling, which can lead to vulnerabilities.
However, the report also added that new threats were growing, for instance, where an attacker may craft a prompt to an LLM that includes hidden instructions to trick the model into executing unintended actions or revealing sensitive information.
To tackle these, Microsoft says that while automation tools are useful, the red team itself could not yet be replaced by AI as it relies “heavily on human expertise,” particularly for harmful chatbot responses.
For instance, while LLMs can evaluate whether an AI model response contains hate speech, the firm said they were not as reliable at assessing content in specialised areas such as medicine, science, or cybersecurity.
Microsoft believes that the findings also highlight that cultural competence and emotional intelligence are vital cybersecurity skills.
Ultimately, Microsoft stressed that cyber defence must evolve with emerging threats and AI red teams must continuously update defences to address new vulnerabilities and invest in robust measurement and mitigation techniques.
