System Outage
Incident Report for Multiplier
Postmortem

Between 8pm PDT on 24 September 2024 and 6am PDT on 25 September 2024, there was an outage that made Multiplier's app inaccessible. The incident was triggered by the RDS database running out of storage space, which caused database connections to fail.

As a result, authentication requests failed. The application server could not retrieve the information associated with the clientKey contained in each request's authentication token, so it could not validate requests.
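The following minimal Python sketch makes the failure chain concrete. It assumes HMAC-signed tokens whose signing secret is stored per clientKey in the database. The token layout and the names `ClientStore`, `ClientRecord`, `DatabaseUnavailable`, `sign` and `authenticate` are hypothetical and do not describe Multiplier's actual code. The sketch shows why a database outage rejects every request: the secret needed to verify a token cannot be read, so validation fails closed.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Optional


class DatabaseUnavailable(Exception):
    """Raised when the database refuses new connections (e.g. disk full)."""


@dataclass
class ClientRecord:
    client_key: str
    shared_secret: str


class ClientStore:
    """Hypothetical stand-in for the database table of registered clients."""

    def __init__(self, records: Dict[str, ClientRecord]):
        self._records = records
        self.available = True  # set to False to simulate the outage

    def get(self, client_key: str) -> Optional[ClientRecord]:
        if not self.available:
            raise DatabaseUnavailable("could not connect to database")
        return self._records.get(client_key)


def sign(payload: str, secret: str) -> str:
    """HMAC-SHA256 signature over the token payload, hex-encoded."""
    return hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()


def authenticate(token: Dict[str, str], store: ClientStore) -> bool:
    """Accept a request only if its token verifies against the stored secret."""
    try:
        record = store.get(token.get("clientKey", ""))
    except DatabaseUnavailable:
        # The secret needed to verify the signature cannot be read,
        # so every request is rejected (the server fails closed).
        return False
    if record is None:
        return False  # unknown clientKey
    expected = sign(token.get("payload", ""), record.shared_secret)
    return hmac.compare_digest(expected, token.get("signature", ""))
```

For example, a correctly signed token for a known clientKey is accepted while the store is available. The same token is rejected as soon as `store.available` is set to `False`, even though the token itself is still valid.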

Incident Detection

We have automated end-to-end tests in place. These tests detected the issue and sent alerts to OpsGenie. However, our phone was misconfigured with "Do Not Disturb" enabled, so OpsGenie's notifications could not reach us during the night. This led to an unacceptable response time.

Actions Taken

We are implementing the following actions to ensure that this situation does not recur.

1. Review and Adapt Alerts for Internal Components

Alerts for internal components will be reviewed and adapted. Foreseeable issues, such as the database running out of storage space, could and should be caught proactively, before they cause an outage.
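Database free storage is one example of an internal component worth alerting on. The minimal Python sketch below shows a trend-based check, assuming the caller supplies timestamped free-space samples. On Amazon RDS, these samples could come from the `FreeStorageSpace` CloudWatch metric, which is reported in bytes. The function names and both thresholds are hypothetical and do not describe the alerting Multiplier has put in place. The check alerts when free space is already low or when a least-squares projection says the disk will fill soon.

```python
from typing import List, Optional, Tuple

# Hypothetical thresholds; tune them to the instance's size and growth rate.
MIN_FREE_FRACTION = 0.15   # alert when less than 15% of storage is free
MIN_HOURS_TO_FULL = 72.0   # alert when the disk is projected to fill within 3 days


def hours_until_full(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Project when free space reaches zero using a least-squares slope.

    samples: (hours since start, free bytes) pairs, oldest first.
    Returns None if there is too little data or free space is not shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Slope of free bytes over time, in bytes per hour.
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None
    latest_free = samples[-1][1]
    return latest_free / -slope


def storage_alert(samples: List[Tuple[float, float]], allocated_bytes: float) -> Optional[str]:
    """Return an alert message, or None if storage looks healthy."""
    latest_free = samples[-1][1]
    free_fraction = latest_free / allocated_bytes
    if free_fraction < MIN_FREE_FRACTION:
        return f"only {free_fraction:.0%} of storage free"
    eta = hours_until_full(samples)
    if eta is not None and eta < MIN_HOURS_TO_FULL:
        return f"storage projected to fill in {eta:.0f} hours"
    return None
```

A static threshold alone can fire late on a small disk that fills quickly. The projected time-to-full gives earlier warning when usage grows fast, while the fraction check still catches a disk that is simply close to full.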

2. Weekly Testing of Alert Mechanisms

Weekly testing of the alerting mechanisms will be put in place. A misconfiguration anywhere in the alerting chain, such as the phone setting involved in this incident, is foreseeable and should have been detected before the incident. Earlier notification could have reduced the incident's duration to a few minutes.
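One way to run the weekly test is a scheduled job that raises a real, high-priority alert through OpsGenie's Alert API and requires the on-call person to acknowledge it. The API call is `POST /v2/alerts`, authenticated with a `GenieKey` header. An unacknowledged test alert then signals that the path to the phone is broken. The minimal Python sketch below only builds and sends the request. The function names, message text, alias format and tags are hypothetical, and the endpoint and fields should be checked against the current OpsGenie API documentation.

```python
import json
import urllib.request
from datetime import datetime

OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_test_alert(api_key: str, now: datetime) -> urllib.request.Request:
    """Build (but do not send) a request that creates the weekly test alert."""
    iso_year, iso_week, _ = now.isocalendar()
    body = {
        "message": f"Weekly alerting test {now:%Y-%m-%d}",
        # One alias per ISO week: OpsGenie de-duplicates open alerts by alias,
        # so an accidental rerun in the same week does not create a second alert.
        "alias": f"weekly-alert-test-{iso_year}-W{iso_week:02d}",
        "description": "Acknowledge this alert to confirm that on-call paging works.",
        "priority": "P1",  # same priority as a real outage, so it must ring through
        "tags": ["alert-test"],
    }
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"GenieKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

The acknowledgement step is what catches settings such as "Do Not Disturb". The API accepting the request only shows that OpsGenie received the alert, not that anyone was notified.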

3. Team Training for Emergency Situations

The entire team will be trained to handle emergency situations, and monthly emergency training meetings will be held to update procedures as needed.

This action plan will ensure that the team is better prepared to detect and respond efficiently to any similar incidents in the future.

Posted Sep 25, 2024 - 15:21 PDT

Resolved
Between 8pm PDT on 24 September 2024 and 6am PDT on 25 September 2024, there was an outage that made Multiplier's app inaccessible.

The incident was triggered by the RDS database running out of space, which led to a failure in database connections. Consequently, authentication requests failed because the application server was unable to retrieve the information associated with the clientKey contained in the authentication token, thereby preventing the validation of requests.
Posted Sep 24, 2024 - 20:00 PDT