Microsoft AI researchers accidentally leak sensitive data via GitHub

Microsoft’s AI research division has accidentally leaked dozens of terabytes of sensitive data while publishing open-source AI models to a public GitHub repository.

The data was first exposed three years ago but was only discovered by cloud security firm Wiz in June this year.

Wiz security researchers found that, while publishing open-source training data on GitHub, a Microsoft AI employee inadvertently shared a URL to a misconfigured Azure storage account containing the data.

The URL had been configured to grant permissions on the entire storage account, so it also exposed private internal data that was never meant to be shared, including private keys and passwords.
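Links that grant access to Azure storage are typically shared access signature (SAS) URLs, whose query string encodes the token's scope, permissions and expiry. The sketch below is an illustrative audit under that assumption, not a Microsoft or Wiz tool: the helper name `audit_sas_url`, the 30-day threshold and the example URLs are hypothetical. It flags account-level tokens (which carry `ss`/`srt` parameters), write-type permissions in `sp`, and distant or missing expiry dates in `se`.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

# Permission letters this sketch treats as risky: write, delete, add, create.
RISKY_PERMISSIONS = set("wdac")


def audit_sas_url(url: str, max_days: int = 30,
                  now: Optional[datetime] = None) -> List[str]:
    """Return warnings for an over-permissive SAS-style URL (hypothetical helper)."""
    # parse_qs percent-decodes values; keep the first value of each parameter.
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    now = now or datetime.now(timezone.utc)  # pass a timezone-aware datetime
    warnings = []

    # Account SAS tokens carry 'ss' (services) and 'srt' (resource types);
    # service SAS tokens are scoped to a single container or blob instead.
    if "ss" in params or "srt" in params:
        warnings.append("account-level scope")

    risky = set(params.get("sp", "")) & RISKY_PERMISSIONS
    if risky:
        warnings.append("grants write-type permissions: " + "".join(sorted(risky)))

    expiry = params.get("se")
    if expiry is None:
        warnings.append("no expiry")
    else:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:  # date-only or naive values: assume UTC
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at - now > timedelta(days=max_days):
            warnings.append(f"expires more than {max_days} days out")

    return warnings
```

Run against a hypothetical account-level link such as `https://example.blob.core.windows.net/?sv=2020-08-04&ss=bfqt&srt=sco&sp=rwdlacup&se=2051-10-06T07:00:00Z&sig=REDACTED`, the check reports all three scope, permission and expiry warnings, whereas a read-only link scoped to one blob and expiring within the threshold returns an empty list.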

Roger Grimes, data-driven defence evangelist at KnowBe4, said that the leak highlighted one of the top risks of AI for enterprises: an employee might accidentally share confidential company data.

“It is happening far more than is being reported. To mitigate the risk of it occurring, organisations need to create and publish policies preventing the sharing of organisation confidential data with AI and other external sources and educate users about the risks and how to avoid it,” he said.

“Companies can also use traditional data leak prevention tools and strategies to look for and prevent accidental AI leaks,” he added.
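Pattern matching on outgoing content is one of the standard techniques behind the kind of data leak prevention tools Grimes mentions. The sketch below is a minimal, hypothetical illustration of that idea, not any vendor's product: the rule names and regular expressions are assumptions, chosen to catch the kinds of secrets exposed in this incident (private keys, passwords and storage access signatures) before files are pushed to a public repository.

```python
import re
from typing import List, Tuple

# Illustrative rules only; commercial DLP products ship far larger,
# tuned rule sets and combine them with context and data classification.
RULES = {
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)passw(?:or)?d\s*[:=]\s*\S+"),
    "sas_signature": re.compile(r"[?&]sig=[A-Za-z0-9%+/=]+"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) pairs for every rule a line matches."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A scan like this would typically run as a pre-commit hook or CI step, blocking the push when `scan_text` returns any findings.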

Erfan Shadabi, cybersecurity expert at Comforte AG, agreed that misconfigurations that lead to data exposure were “surprisingly common”.

“One way to stop such incidents is to stop depending solely on traditional protection methods such as passwords, border security, and simple data access management,” he advised.

“Data-centric security, which focuses on protecting the data itself, can go a long way toward eliminating the risk inherent in incidents such as this one. By tokenising sensitive data elements, data is made incomprehensible and cannot be leveraged by the wrong person,” he added.
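Shadabi's point can be illustrated with one common form of tokenisation, vault-based tokenisation, in which sensitive values are swapped for random tokens and the mapping is held in a separately protected store. The sketch below is a simplified illustration under that assumption, not Comforte's product or method; the `TokenVault` class and `tokenise_record` helper are hypothetical names, and a real vault would be persistent, access-controlled and audited.

```python
import secrets
from typing import Dict, Set


class TokenVault:
    """Minimal in-memory token vault (hypothetical; not a production design)."""

    def __init__(self) -> None:
        self._token_to_value: Dict[str, str] = {}
        self._value_to_token: Dict[str, str] = {}

    def tokenise(self, value: str) -> str:
        # Reuse the existing token for a repeated value so joins still work.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # Tokens are random, so they carry no information about the value.
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenise(self, token: str) -> str:
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]


def tokenise_record(record: Dict[str, str], sensitive: Set[str],
                    vault: TokenVault) -> Dict[str, str]:
    """Replace the values of sensitive fields with tokens; keep other fields."""
    return {k: vault.tokenise(v) if k in sensitive else v
            for k, v in record.items()}
```

If a tokenised dataset like this were exposed, an outsider would see only random `tok_` strings in the sensitive fields; recovering the original values would also require access to the vault.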
