Proton worldwide outage caused by Kubernetes migration, software change
Swiss tech company Proton, which provides privacy-focused online services, says that Thursday's worldwide outage was caused by an ongoing infrastructure migration to Kubernetes and a software change that triggered an initial load spike.
As the company revealed yesterday in an incident report published on its status page, the outage started around 10:00 AM ET.
Proton users reported that they couldn't connect to their Proton VPN, Proton Mail, Proton Calendar, Proton Drive, Proton Pass, and Proton Wallet accounts.
For instance, when attempting to connect to Proton Mail, those affected saw error messages stating, "Something went wrong. We couldn't load this page. Please refresh the page or check your internet connection."
The issues were fully resolved within about two hours, with Proton Mail and Proton Calendar being the last services brought back online.
"As of 16:15 CET, all services other than Mail and Calendar are operating normally. We are still working on fixing the issue and restoring the rest of the affected services," the company said.
Today, in an update to the original incident report, Proton revealed that yesterday's global outage was triggered by a software change identified by the site reliability engineering team.
The change severely limited the number of new connections to Proton's database servers, causing an initial load spike when the number of users connecting increased sharply around 4 PM Zurich time (10:00 AM ET).
"This overloaded Proton's infrastructure, and made it impossible for us to serve all customer connections. While Proton VPN, Proton Pass, Proton Drive/Docs, and Proton Wallet were recovered quickly, issues persisted for longer on Proton Mail and Proton Calendar," the company said.
"For those services, during the incident, approximately 50% of requests failed, leading to intermittent service unavailability for some users (the service would look to be alternating between up and down from minute to minute)."
While Proton would have had enough extra capacity to handle all the new connections, an ongoing migration to Kubernetes, which required running "two parallel infrastructures at the same time," made it impossible to balance the load.
"In total, it took us approximately 2 hours to get back to the state where we could service 100% of requests, with users experiencing degraded performance until then. The service was available, but only intermittently, with performance being substantially improved during the second hour of the incident, but requiring an additional hour to fully resolve," Proton added.
Proton says it has since resolved all connection issues affecting its online services and is currently monitoring for additional issues even though "the situation has been stable for some time."
source: BleepingComputer