Operational (closely monitoring)
Incident Report for Procurify
Postmortem

Customer Impact
We aim for 99.9% platform uptime. From 15 to 29 September, uptime fell to 99.7%, with some customers experiencing slow response times and/or limited access to the platform.

Status
Platform Operational.

Resolution
The root cause of the outages was Procurify’s database configuration and tuning, which traces back to recent changes made by Amazon Web Services (AWS). We have worked closely with AWS and have identified ways to retain the control we need to tune and maintain our databases independently of changes made by AWS.

The database instances Procurify operates fail over to standby replicas when availability issues occur. Failover preserves high availability and happens occasionally. Our database provider, AWS, manages this failover process. During one such failover, the performance configuration of a production database did not fully carry over. The resulting configuration change created performance bottlenecks in the database, which led to an outage.

We identified the root cause and corrected the configuration to restore normal operations. We have also found a way to control and apply the necessary configuration changes across database failovers without depending on AWS. This improves site reliability even when background infrastructure changes, such as the failovers required for high availability, take place.
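To illustrate the general idea of keeping configuration consistent across failovers, the sketch below compares a desired set of database parameters against the values read from an instance after a failover and re-applies anything that drifted. It is a minimal sketch under stated assumptions, not Procurify's actual solution: the parameter names and values, and the functions find_config_drift and reconcile, are hypothetical, and the apply_param callback stands in for whatever mechanism would actually set a parameter on a live database.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical desired tuning profile. The parameter names and values are
# illustrative only and are not taken from Procurify's configuration.
DESIRED_PARAMS: Dict[str, str] = {
    "max_connections": "500",
    "statement_timeout": "30000",
    "work_mem": "65536",
}


def find_config_drift(
    desired: Dict[str, str], observed: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (name, expected, actual) for every desired parameter whose
    observed value differs from the desired one or is missing entirely."""
    drift = []
    for name, expected in sorted(desired.items()):
        actual = observed.get(name, "<missing>")
        if actual != expected:
            drift.append((name, expected, actual))
    return drift


def reconcile(
    desired: Dict[str, str],
    observed: Dict[str, str],
    apply_param: Callable[[str, str], None],
) -> List[Tuple[str, str, str]]:
    """Re-apply each drifted parameter through a caller-supplied function,
    so correctness does not depend on how the failover copied settings."""
    drift = find_config_drift(desired, observed)
    for name, expected, _actual in drift:
        apply_param(name, expected)
    return drift


if __name__ == "__main__":
    # Simulated settings read from a newly promoted standby after failover.
    after_failover = {"max_connections": "500", "work_mem": "4096"}
    applied: Dict[str, str] = {}
    for name, expected, actual in reconcile(
        DESIRED_PARAMS, after_failover, applied.__setitem__
    ):
        print(f"{name}: expected {expected}, found {actual}")
```

Running the check after every failover, rather than trusting the failover to carry settings over, is what makes the configuration independent of the provider's process in this sketch.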

Going Forward
We are investing in broader auditing of, and enhancements to, our platform so that it is more resilient to our customers' dynamic workloads. We have a strong team today, and we are adding senior talent specializing in database management to deepen our team's strengths.

Contact Information
If you have any questions or concerns, please reach out to our friendly and knowledgeable team of Procurify experts at support@procurify.com.

Posted Sep 30, 2021 - 08:47 PDT

Resolved
This incident has been resolved.
Posted Sep 30, 2021 - 08:45 PDT
Update
Scheduled maintenance complete, team will continue to monitor site stability.
Posted Sep 28, 2021 - 19:04 PDT
Update
We are continuing to monitor for any further issues.
Posted Sep 28, 2021 - 12:58 PDT
Update
We are continuing to monitor for any further issues.
Posted Sep 28, 2021 - 10:34 PDT
Update
We are continuing to monitor for any further issues.
Posted Sep 28, 2021 - 10:33 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 28, 2021 - 09:31 PDT
Investigating
We are currently investigating this issue.
Posted Sep 28, 2021 - 08:12 PDT
Update
We are continuing to monitor for any further issues.
Posted Sep 28, 2021 - 08:11 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 28, 2021 - 04:01 PDT
Investigating
We are currently investigating degraded site performance.
Posted Sep 28, 2021 - 02:11 PDT
This incident affected: Web Application.