
Azure Database for PostgreSQL Flexible Server (Australia East) has been stuck in "Updating" state for over 4 hours since approximately 08:35 UTC on April 6, 2026

Mohamed Abdelaziz 0 Reputation points
2026-04-06T17:37:53.92+00:00

My production Azure Database for PostgreSQL Flexible Server has been stuck in "Updating" state for over 4 hours and I cannot perform any operations on it.

Resource details:

  • Region: Australia East
  • SKU: Burstable B2s
  • PostgreSQL version: 17

What happened:

  1. Server had CPU credits fully exhausted (only ~2 credits remaining from midnight)
  2. Azure Insights reported 100% CPU at 08:35 UTC on April 6, 2026
  3. Server became unreachable — all connections refused on port 5432
  4. Azure entered the server into Updating state — likely an internal recovery/maintenance operation
  5. The server has been updating for 4+ hours with no progress

What I tried:

  • restart → ServerIsNotReady
  • stop → ServerIsNotReady
  • parameter set → ServerIsBusy
  • restore to new server → InternalServerError
  • delete restored server → succeeded, but the original is still stuck
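For anyone reproducing these steps, the attempted operations roughly correspond to the following Azure CLI calls. This is a sketch: the resource group, server name, and parameter are placeholders, and the error comments reflect the responses reported above.

```shell
# Placeholder names; substitute your own resource group and server.
RG=my-resource-group
SERVER=my-flexible-server

# Each call returned a conflict while the server was in "Updating":
az postgres flexible-server restart --resource-group "$RG" --name "$SERVER"  # ServerIsNotReady
az postgres flexible-server stop --resource-group "$RG" --name "$SERVER"     # ServerIsNotReady
az postgres flexible-server parameter set --resource-group "$RG" \
    --server-name "$SERVER" \
    --name log_min_duration_statement --value 500                            # ServerIsBusy
```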

Azure Service Health previously flagged an active Critical incident:

"Your Azure Database for PostgreSQL - Flexible server is currently unavailable. We are working to resolve the problem."

Questions:

  1. Is there any way to force-cancel or roll back the stuck internal Updating operation?
  2. Can a restore be initiated while the source server is in Updating state?
  3. How long should a WAL recovery operation take on a Burstable B2s with depleted CPU credits?
  4. Is there a way to escalate this without a paid support plan?
Azure Database for PostgreSQL

2 answers

  1. Fabio 0 Reputation points
    2026-04-06T22:14:45.7633333+00:00

    I have the same problem.
    My db is down for several hour with the message
    "Your Azure Database for PostgreSQL - Flexible server is currently unavailable. We are working to resolve the problem."

    Any idea? Thank you


  2. Q&A Assist
    2026-04-06T17:38:17.08+00:00
    AI-generated content. This answer was created with AI from Q&A Assist.
    1. There is no supported way to force‑cancel or roll back an internal Updating operation on an Azure Database for PostgreSQL flexible server. While the platform is performing recovery, maintenance, or failover actions, user operations such as restart, stop, parameter changes, and delete are blocked until the workflow completes or fails. For high‑availability scenarios, the platform's failover workflow is fully managed and includes steps such as promoting the standby and updating DNS, with application downtime limited to a subset of those steps; it cannot be interrupted or overridden by the customer.
    2. Restore operations are platform‑controlled and rely on backups and WAL stored by the service. The context only documents that:
      • Point‑in‑time restore creates a new server in the same region, with the same major PostgreSQL version as the source.
      • For dropped servers, restore is possible for up to five days after drop, using createMode = ReviveDropped and the original server’s resource ID.
      • Restore requests can fail with errors (for example, wrong API version, DNS issues, or internal errors) and must then be retried.
      The documentation does not state that restore is supported or guaranteed while the source server is in an Updating state, and the InternalServerError on restore indicates that the platform was not able to complete the operation in this condition. The only supported actions are to wait for the platform workflow to complete or fail, and then retry restore or other operations.
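Once the Updating workflow has completed or failed, a point‑in‑time restore can be retried. A minimal sketch, assuming placeholder names and a placeholder restore timestamp (choose a point before the 08:35 UTC incident):

```shell
# Sketch: retry point-in-time restore to a NEW server after the
# platform workflow has finished. All names/times are placeholders.
az postgres flexible-server restore \
    --resource-group my-resource-group \
    --name my-server-restored-2 \
    --source-server my-flexible-server \
    --restore-time "2026-04-06T08:00:00+00:00"
```

If the restore fails again with InternalServerError, the documented guidance is simply to retry after the outage clears, using a server name that has not been used before.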
    3. The documentation does not provide any SLA or expected duration for WAL recovery or internal recovery operations on a specific SKU (such as Burstable B2s) or under depleted CPU credits. Recovery time depends on internal factors such as the amount of WAL to replay, size of the data, and the nature of the failure. Only general guidance is provided for unplanned downtime: the service automatically performs recovery and, if HA is configured, promotes the standby; if not, it provisions a new server and performs recovery, with no user control over timing.
    4. For escalation without a paid support plan, the documented mechanisms are:
      • Rely on Azure Service Health and Resource Health to track active service issues and outages affecting the PostgreSQL flexible server instance and region. When the service declares an outage, details and impact are visible via:
        • Azure portal notifications (Service Issue banner).
        • Help + support / Support + troubleshooting → View outage details.
        • Service Health → Service issues and Resource health for the specific resource.
      • When a service outage is active, the platform is already treating it as a high‑priority incident. The documentation does not describe any additional escalation path beyond opening a support case (which typically requires a support plan) and monitoring Service Health/Resource Health while the platform works to resolve the issue.
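Resource Health can also be queried programmatically via the ARM REST API, which is useful for tracking the incident without the portal. A sketch, assuming placeholder subscription/resource names; the `api-version` shown is an assumption and may need adjusting:

```shell
# Sketch: fetch the current Resource Health availability status for the
# server. Subscription ID, resource group, and server name are placeholders.
SUB=00000000-0000-0000-0000-000000000000
RG=my-resource-group
SERVER=my-flexible-server

az rest --method get --url \
  "https://management.azure.com/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.DBforPostgreSQL/flexibleServers/$SERVER/providers/Microsoft.ResourceHealth/availabilityStatuses/current?api-version=2020-05-01"
```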

    In this scenario, the only documented, supported actions are:

    • Monitor Azure Service Health and Resource Health for the server and region to track the ongoing incident and resolution.
    • After the platform completes or fails the Updating workflow and the outage is cleared, retry point‑in‑time restore to a new server (ensuring correct API version and a different server name if needed).
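Monitoring can be scripted by polling the server's state until it leaves Updating. A sketch with placeholder names:

```shell
# Sketch: poll the server state once a minute until it is no longer
# "Updating". Replace the resource group and server name with your own.
while true; do
  STATE=$(az postgres flexible-server show \
      --resource-group my-resource-group --name my-flexible-server \
      --query state -o tsv)
  echo "$(date -u +%H:%M:%SZ) state=$STATE"
  [ "$STATE" != "Updating" ] && break
  sleep 60
done
```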


