Microsoft AI accidentally leaked data while publishing AI learning models on GitHub
Microsoft AI, Microsoft's research division, has accidentally leaked dozens of terabytes of sensitive data while contributing open-source AI learning models to a public GitHub repository.
The leak dates back three years but was only discovered by cloud security firm Wiz in June this year.
Wiz security researchers found that a Microsoft AI employee inadvertently shared the URL of a misconfigured Azure storage bucket containing the data while publishing open-source training data on GitHub.
The URL had been configured to grant permissions on the entire storage account, mistakenly exposing additional private internal data, including private keys and passwords.
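Access to Azure storage is commonly shared through Shared Access Signature (SAS) URLs, and the scope of such a URL is set when it is generated. The sketch below, a minimal illustration using the azure-storage-blob Python SDK, contrasts an account-wide, long-lived token of the kind described above with a least-privilege alternative; the account name, key, container and blob names are all placeholders, not details from the incident.

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import (
    AccountSasPermissions,
    BlobSasPermissions,
    ResourceTypes,
    generate_account_sas,
    generate_blob_sas,
)

ACCOUNT_NAME = "examplestorage"  # placeholder account
ACCOUNT_KEY = "<account-key>"    # placeholder; never commit a real key

# Overly broad: a SAS token scoped to the WHOLE storage account with
# read/write/list rights -- the class of misconfiguration described
# above. Anyone holding the URL can enumerate everything in the account.
broad_sas = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=365 * 3),  # long-lived
)

# Least privilege: a SAS token scoped to ONE blob, read-only, short-lived.
narrow_sas = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="open-datasets",   # hypothetical container
    blob_name="training-data.zip",    # hypothetical blob
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),
)
```

The only differences are the permission object, the resource scope, and the expiry, which is why this kind of misconfiguration is easy to miss in review.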
Roger Grimes, data-driven defence evangelist at KnowBe4, said that the leak highlighted one of the top risks of AI for enterprises: one of your employees might accidentally share confidential company data.
“It is happening far more than is being reported. To mitigate the risk of it occurring, organisations need to create and publish policies preventing the sharing of organisation confidential data with AI and other external sources and educate users about the risks and how to avoid it,” he said.
“Companies can also use traditional data leak prevention tools and strategies to look for and prevent accidental AI leaks,” he added.
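As a rough illustration of the "look for" half of that advice, the sketch below scans a working copy of a repository for URLs that resemble Azure SAS links, whose query strings carry telltale `sig=` signature parameters. It is an assumed, simplified pre-publication check, not a production data leak prevention tool.

```python
import re
from pathlib import Path

# Azure SAS URLs point at *.blob.core.windows.net and carry a sig=
# query parameter. Flagging them before a commit is pushed is a crude
# but useful safety net.
SAS_PATTERN = re.compile(
    r"https://[\w.-]+\.blob\.core\.windows\.net/\S*[?&]sig=[\w%]+",
    re.IGNORECASE,
)

def scan_repo(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, match) for every SAS-like URL found."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in SAS_PATTERN.finditer(line):
                hits.append((str(path), lineno, match.group()))
    return hits

if __name__ == "__main__":
    for file, lineno, url in scan_repo("."):  # scan the current checkout
        print(f"{file}:{lineno}: possible SAS URL: {url[:60]}...")
```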
Erfan Shadabi, cybersecurity expert at Comforte AG, agreed that misconfigurations that trigger data exposure were "surprisingly common".
“One way to stop such incidents is to stop depending solely on traditional protection methods such as passwords, border security, and simple data access management,” he advised.
“Data-centric security, which focuses on protecting the data itself, can go a long way toward eliminating the risk inherent in incidents such as this one. By tokenising sensitive data elements, data is made incomprehensible and cannot be leveraged by the wrong person,” he added.
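To make the tokenisation idea concrete, here is a toy sketch of the concept Shadabi describes: sensitive values are swapped for random surrogate tokens, and the originals live only in a separate vault, so a leaked dataset of tokens is useless on its own. This is an illustration of the general technique, not Comforte's product or API.

```python
import secrets

class TokenVault:
    """Toy tokeniser: replaces sensitive values with random surrogates.

    Real tokenisation platforms add format preservation, access control
    and hardened storage; this only shows the core idea that a leaked
    token reveals nothing without access to the vault.
    """

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        token = f"tok_{secrets.token_hex(8)}"
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("AccountKey=<secret>")  # hypothetical secret value
print(token)                    # e.g. tok_9f2c4e1a7b3d5c08 -- safe to store
print(vault.detokenize(token))  # recoverable only with vault access
```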