Storage issue with RDS
Incident Report for Assignar
Postmortem

We apologise for the intermittent issues

At Assignar, we do our very best to ensure that our customers don’t experience any service interruptions. Unfortunately, we had some issues with one of our RDS instances that prevented some users from logging into Assignar and viewing certain pages in the dashboard and the mobile app.

For that, we are sincerely sorry.

What Happened?

After taking a deep look at the intermittent database issues, we identified that one of our RDS instances was running low on FreeLocalStorage.

Instances in our database clusters have two types of storage:

Storage for persistent data (called the cluster volume). This storage type increases automatically when more space is required.

Local storage for each instance in the cluster, based on the instance class. The type and size of this storage are bound to the instance class, and can be changed only by moving to a larger DB instance class. Our database clusters use local storage for storing error logs, general logs, slow query logs, audit logs, and non-InnoDB temporary tables.

The following error was identified: "The free storage capacity for DB Instance: instance-name is low at x% of the provisioned storage [Provisioned Storage: xx GB, Free Storage: xx GB]. You may want to increase the provisioned storage to address this issue."

We increased the provisioned storage and the problem was rectified.

Remediation plan

We have a number of alarms in place to prevent incidents like this from occurring. Unfortunately, we didn’t have any alarms for FreeLocalStorage readings. We have now put in place appropriate alerts and alarms so that we get plenty of notice when such problems are likely to occur. We have also fine-tuned our audit logging, which should consume a lot less local storage on the database instance and hence slow the consumption of FreeLocalStorage on the RDS instance.
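
For readers who want to set up a similar alert, here is a minimal sketch of a CloudWatch alarm on the FreeLocalStorage metric. It is an illustration only, not our actual configuration: the instance identifier, threshold, evaluation settings and SNS topic ARN are all hypothetical, and the helper names are made up for this example. It assumes the metric is published in the AWS/RDS namespace in bytes, keyed by DBInstanceIdentifier, and that boto3 is installed if you want to create the alarm.

```python
from typing import Any, Dict

GIB = 1024 ** 3  # FreeLocalStorage is reported in bytes


def free_local_storage_alarm(
    instance_id: str, threshold_gib: float, sns_topic_arn: str
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    All values are illustrative assumptions; tune them for your workload.
    """
    return {
        "AlarmName": f"{instance_id}-free-local-storage-low",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeLocalStorage",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 300,            # seconds per datapoint
        "EvaluationPeriods": 3,   # alarm after 3 consecutive low datapoints
        "Threshold": threshold_gib * GIB,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported here so building the parameters needs no AWS SDK

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(free_local_storage_alarm("example-db-1", 10, topic_arn))` would then notify the given SNS topic when the instance drops below 10 GiB of free local storage. Using the minimum statistic rather than the average is a design choice that makes short dips count towards the alarm.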

Posted Apr 11, 2019 - 16:21 AEST

Resolved
We had an issue with one of our database instances that prevented some clients from logging into the system. We identified that the database was running low on FreeLocalStorage due to some complex SQL queries that were being executed. Our extensive audit logs also contributed to this issue. We have provisioned more storage for this database instance and have optimised our logging to consume less local storage on the database instance.
Posted Apr 09, 2019 - 15:30 AEST