We aim for 99.9% platform uptime. From 15 to 29 September, uptime was reduced to 99.7% with some customers experiencing slow response times and/or limited access to the platform.
The root cause of the outages was related to Procurify’s database configuration and tuning, which links back to recent changes made by Amazon Web Services (AWS). We have worked closely with AWS and have identified ways to retain the necessary control to tune and maintain our databases independent of changes made by AWS.
The database instances Procurify operates failover to standby replicas in case of availability issues. This process of failing over ensures high availability and happens occasionally. Our database provider, AWS, manages this failover process. During such a failover, the performance configurations of a production database did not transfer fully. This configuration change caused the database to experience performance bottlenecks that resulted in an outage.
We were able to identify the root cause and fix the configuration issue to resume normal operations. Furthermore, we have found a solution to control and apply necessary configuration changes across database failovers without depending on AWS. This change helps with site reliability despite the infrastructure changes in the background required to ensure high availability.
We are investing in broader auditing and enhancements to our platform in order to be more resilient to the dynamic workloads from our customers. We have a strong team today, and we are adding senior talent specializing in database management to deepen our team's strengths.
If you have any questions or concerns, please reach out to our friendly and knowledgeable team of Procurify experts at firstname.lastname@example.org.