Write-up published
Resolved
This incident is related to a previous incident regarding permissions. On the morning of Wednesday 9/8/23, we saw increased use of the mass update advanced tool. While monitoring the situation, it became apparent that the increase was causing slowness that would not resolve itself quickly enough for our customers.
The decision was made to take FV down for 15 minutes to allow the system to recover and to restore performance quickly.
Pop-ups were set in the FV platform
The status page was updated
A mass email was sent to the Org Admins of our affected customers
Our internal teams and partner network were alerted of this decision so that they could relay it directly to customers.
While the slowness began to resolve overall, the doc gen queue began to grow because our customers submitted 3x the number of doc gen requests compared to a typical day. Additional resources were deployed to clear this queue, and it was cleared around 5:00 PM EST.
An internal after-action review will be conducted shortly to discuss how to rectify this issue, prevent its recurrence, and identify steps for quicker resolution and better communication to our customers in the future.
Resolved
With our monitoring complete, this incident is resolved. However, additional updates will be made tonight to ensure no further issues arise.
Monitoring
Our team has successfully completed the updates needed in our allotted window. All users should now be able to access the platform again. We expect to see improved performance for impacted users.
Identified
As we focus on deploying solutions for this incident, we will need to pause access to this environment for 15 minutes. This will occur at 1:00 pm EDT / 11:00 am MDT. We are currently notifying active users in preparation for this step.
Investigating
The team is still investigating the root cause. We have found several paths to improving speed for impacted users, and performance should be improved. We will continue to work on resolving the underlying cause and will provide an update on this incident at 10:30 am MT.
Investigating
We are currently investigating slowness and long page load times within a single US environment. We will update again shortly.