
Google Cloud IAM Failure Cripples Global Internet Services

A critical failure in Google Cloud's Identity and Access Management (IAM) system on June 12, 2025, triggered widespread internet service disruptions worldwide. The outage, caused by a policy update that activated recently deployed code with inadequate error handling, affected over 50 Google Cloud services across more than 40 regions. Major platforms including Spotify, Discord, OpenAI, and Cloudflare experienced significant downtime, highlighting how heavily critical infrastructure now depends on cloud services.

On June 12, 2025, a seemingly minor policy change in Google Cloud's infrastructure triggered a cascading failure that brought down large portions of the internet for several hours, affecting millions of users and businesses worldwide.

The incident began at 10:51 AM PDT when a policy update containing unintended blank fields was inserted into Google Cloud's regional Spanner databases. This activated dormant code that had been deployed on May 29 but never properly tested. The code, which lacked appropriate error handling and feature flag protection, encountered null values it couldn't process, causing Google's Service Control binaries to crash across multiple regions simultaneously.
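The failure mode described above can be illustrated with a minimal sketch. This is not Google's code: the `Policy` record, its `quota_limit` field, and both functions are hypothetical names invented for illustration. The sketch assumes only what the article states, namely that a new code path met a blank field it could not process, and that it had neither error handling nor a feature flag in front of it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Hypothetical policy record; the field names are illustrative only."""
    name: str
    quota_limit: Optional[int]  # None models a blank field in a bad update


def evaluate_unguarded(policy: Policy, requested: int) -> str:
    # No validation: comparing an int with None raises TypeError,
    # the Python analogue of a crash on an unexpected null value.
    return "allow" if requested <= policy.quota_limit else "deny"


def evaluate_guarded(policy: Policy, requested: int, flag_enabled: bool) -> str:
    # Feature flag: the new code path runs only when explicitly enabled,
    # so it can be rolled out gradually and switched off quickly.
    if not flag_enabled:
        return "skipped"
    # Error handling: a blank field is reported as invalid, not dereferenced.
    if policy.quota_limit is None:
        return "invalid-policy"
    return "allow" if requested <= policy.quota_limit else "deny"


if __name__ == "__main__":
    bad = Policy(name="example", quota_limit=None)
    print(evaluate_guarded(bad, 5, flag_enabled=True))
    try:
        evaluate_unguarded(bad, 5)
    except TypeError as exc:
        print("unguarded path crashed:", exc)
```

In the guarded version a malformed policy yields a rejected result that the caller can log and skip, while the flag keeps the new path inert until it has been exercised on a small scale.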

The failure specifically impacted Google's Identity and Access Management (IAM) functionality, which is responsible for authorizing requests and determining what actions authenticated users and services can perform. As IAM services failed, the disruption quickly spread to critical cloud components including App Engine, Firestore, Cloud SQL, BigQuery, and Memorystore.

The outage's impact was extensive, affecting both Google's own services and third-party platforms. Google Workspace applications including Gmail, Drive, Docs, and Meet became inaccessible. Major consumer platforms like Spotify (with approximately 46,000 affected users), Discord, Snapchat, and Twitch experienced significant downtime. AI services were particularly hard hit, with OpenAI reporting authentication issues, while AI coding platforms like Cursor and Replit went completely offline.

Google's Site Reliability Engineering team identified the root cause within 10 minutes and began implementing mitigations within 40 minutes. However, full recovery took significantly longer, with some regions (particularly us-central1) experiencing extended outages of up to three hours. The incident officially ended at 20:49 UTC (1:49 PM PDT).

This outage is a stark reminder of the internet's growing dependency on cloud infrastructure. As Thomas Kurian, CEO of Google Cloud, acknowledged: "We regret the disruption this caused our customers." The incident has prompted discussions about the need for more robust error handling, better testing procedures, and diversified cloud dependencies to prevent similar failures in the future.
