Service Fabric 8.2 Standalone 3-Node Cluster: Rolling Certificate Upgrade Fails

Andrew Crowley 1 Reputation point
2022-08-02T19:57:20.357+00:00

I am attempting to perform a rolling certificate update (aka cluster configuration upgrade) on my 3-node standalone cluster but it fails every time.

Service Fabric Runtime Version: 8.2.1486.9590
OS: Windows Server 2019 Datacenter
I start with a deployed cluster with one primary X509 certificate thumbprint.
I wrote a PowerShell script (provided below, run from an elevated PowerShell session) that creates a new cluster config file. The file marks the old certificate as secondary and makes the newly provided certificate thumbprint primary.
Both certificates I am working with are CA issued certificates installed into the "Trusted Root Certification Authorities" store on each node.
The script will run this command:

Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath 'path-to-new-config.json'

The script then periodically calls Get-ServiceFabricClusterUpgrade and shows the status of the rolling upgrade.

Eventually the process gets stuck, and Get-ServiceFabricClusterUpgrade continuously returns an exception:

   Get-ServiceFabricClusterUpgrade : FABRIC_E_SERVER_AUTHENTICATION_FAILED: 0x80092012  
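The status polling described above can loop forever once the status query itself starts failing. A minimal sketch of a bounded polling loop follows. `Wait-UpgradeState`, its parameters, and the `QueryFailed` and `TimedOut` return values are hypothetical names introduced here for illustration; the status query is passed in as a script block so the loop can be tried without a cluster.

```powershell
# Sketch only: Wait-UpgradeState and its parameters are hypothetical names.
function Wait-UpgradeState {
    param(
        # Script block that returns the current upgrade state as a string.
        [Parameter(Mandatory = $true)] [scriptblock] $GetState,
        [int] $PollSeconds = 30,
        [int] $MaxPolls = 120,
        [int] $MaxConsecutiveErrors = 5
    )

    $errors = 0
    for ($i = 0; $i -lt $MaxPolls; $i++) {
        try {
            $state = [string](& $GetState)
            $errors = 0   # a successful query resets the failure counter
            if ($state -eq 'RollingForwardCompleted' -or $state -eq 'RollingBackCompleted') {
                return $state
            }
        }
        catch {
            # The query itself failed (for example, an authentication error).
            $errors++
            Write-Warning "Status query failed ($errors/$MaxConsecutiveErrors): $($_.Exception.Message)"
            if ($errors -ge $MaxConsecutiveErrors) {
                return 'QueryFailed'
            }
        }
        Start-Sleep -Seconds $PollSeconds
    }
    return 'TimedOut'
}

# Possible usage against a connected cluster (not executed here):
# Wait-UpgradeState -GetState { (Get-ServiceFabricClusterUpgrade -ErrorAction Stop).UpgradeState }
```

Passing `-ErrorAction Stop` in the usage line is what turns a failed query into an exception that the `catch` block can count.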

Furthermore, in Service Fabric Explorer, the Image Store Service becomes stuck in quorum loss and does not recover.

Here are some error messages.

   'System.FM' reported Error for property 'State'.  
   Partition is in quorum loss. As the replicas come up, partition should recover from the quorum loss. Service Fabric will force recover partition from the quorum loss after QuorumLossWaitDuration (TimeSpan: infinite) expires.  
   If the partition has been in this state for more than expected time then please refer to the troubleshooting guide.  



   'System.RA' reported Warning for property 'ReplicaOpenStatus'.  
   Replica had multiple failures during open on 2019-dev-05. -2147017660  
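The replica failure above is reported as a signed decimal number, while the exception earlier in the post is given in hexadecimal. Converting the decimal value to hex makes it easier to search for and to compare with other error output. A small sketch; `ConvertTo-HexErrorCode` is a hypothetical helper name, not a Service Fabric cmdlet.

```powershell
# Hypothetical helper: format a signed 32-bit error code as 8 hex digits.
function ConvertTo-HexErrorCode {
    param([Parameter(Mandatory = $true)] [int] $Code)
    return ('0x{0:X8}' -f $Code)
}

$code = -2147017660
ConvertTo-HexErrorCode -Code $code   # 0x80071C44
```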

I will be happy to provide any more information if needed. Thank you in advance for helping!!

Update 1:

I should note that I have previously been using Service Fabric runtime version 6.5.676.9590, and the cluster configuration upgrade is successful every time if I follow the exact same procedure (which is the powershell script).

Update 2:

Perhaps I may ask a different question: What is the correct procedure for upgrading cluster certificates in newer versions of Service Fabric? Perhaps my current procedure is wrong.

Update 3:
I was curious to see what would happen if I installed the latest available version of SF (9.0.1048.9590 at the time of writing). I followed the same procedure as usual, and it worked! Given this, there may be a bug in SF version 8.2.

#'Make sure the new certificate is installed on all machines before running this script'  
#'This script is meant to be run locally on a machine where service fabric is present'  
  
param(  
    [Parameter(Mandatory = $true)]  
    [String] $clusterCertThumbprint,  
    [Parameter(Mandatory = $true)]  
    [String] $serverCertThumbprint,  
    [Parameter(Mandatory = $true)]  
    [String] $clientCertThumbprint  
)  
  
$clusterCertOnMachine = Get-ChildItem -Path Cert:\LocalMachine\My | Where-Object {$_.Thumbprint -eq $clusterCertThumbprint}  
$serverCertOnMachine = Get-ChildItem -Path Cert:\LocalMachine\My | Where-Object {$_.Thumbprint -eq $serverCertThumbprint}  
$clientCertOnMachine = Get-ChildItem -Path Cert:\LocalMachine\My | Where-Object {$_.Thumbprint -eq $clientCertThumbprint}  
  
if (!($clusterCertOnMachine -match "Thumbprint")){  
 "Given Cluster Certificate does not exist on the current machine"  
 exit  
}  
  
if (!($serverCertOnMachine -match "Thumbprint")){  
 "Given Server Certificate does not exist on the current machine"  
 exit  
}  
  
if (!($clientCertOnMachine  -match "Thumbprint")){  
 "Given Client Certificate does not exist on the current machine"  
 exit  
}  
  
Connect-ServiceFabricCluster  
$config = Get-ServiceFabricClusterConfiguration | ConvertFrom-Json  
$removeSecondaries = $false;  
  
#Check if secondary certs already exist  
if (($config.Properties.Security.CertificateInformation.ClusterCertificate -match "ThumbprintSecondary")){  
 $removeSecondaries = $true;  
}   
if (($config.Properties.Security.CertificateInformation.ServerCertificate -match "ThumbprintSecondary")){  
 $removeSecondaries = $true;  
}  
  
if ($removeSecondaries) {  
    $config.Properties.Security.CertificateInformation.ClusterCertificate = $config.Properties.Security.CertificateInformation.ClusterCertificate | Select-Object -Property * -ExcludeProperty ThumbprintSecondary  
    $config.Properties.Security.CertificateInformation.ServerCertificate = $config.Properties.Security.CertificateInformation.ServerCertificate | Select-Object -Property * -ExcludeProperty ThumbprintSecondary  
  
    #update config version  
    $version = ($config.ClusterConfigurationVersion | Select -First 1).Split(".")  
    $version[2] = [int]$version[2] + 1  
    $version = $version -join "."  
    $config.ClusterConfigurationVersion = $version  
  
    #get rid of junk PS added  
    $config.Properties.Security = $config.Properties.Security | Select-Object -Property * -ExcludeProperty WindowsIdentities  
    $config | ConvertTo-Json  -Depth 20 | Out-File 'c:\fabricConfig.json' -Force  
    Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath 'c:\fabricConfig.json'  
 ####### Update-ServiceFabricClusterUpgrade -UpgradeReplicaSetCheckTimeoutSec 30  
    $serviceFabricUpdated = $false
    "Old secondary certificates need to be removed. Starting rolling update."
    Start-Sleep -Seconds 30
    do {
        $progress = Get-ServiceFabricClusterUpgrade
        "Waiting for Service Fabric update to complete"
        $progress
        if ($progress.UpgradeState -eq "RollingForwardCompleted") {
            $configUpdated = Get-ServiceFabricClusterConfiguration | ConvertFrom-Json
            if ($configUpdated.ClusterConfigurationVersion -eq $version) {
                "Service Fabric Certificate Rollover completed"
                $serviceFabricUpdated = $true
            }
        }
        Start-Sleep -Seconds 30
    } while (!($serviceFabricUpdated))
}  
  
$config = Get-ServiceFabricClusterConfiguration | ConvertFrom-Json  
  
if (!($config.Properties.Security.CertificateInformation.ClusterCertificate -match "ThumbprintSecondary")){  
 $config.Properties.Security.CertificateInformation.ClusterCertificate | Add-Member -NotePropertyName thumbprintSecondary -NotePropertyValue 0  
}  
  
if (!($config.Properties.Security.CertificateInformation.ServerCertificate -match "ThumbprintSecondary")){  
 $config.Properties.Security.CertificateInformation.ServerCertificate | Add-Member -NotePropertyName thumbprintSecondary -NotePropertyValue 0  
}  
  
#update cluster certificate  
$config.Properties.Security.CertificateInformation.ClusterCertificate.thumbprintSecondary = $config.Properties.Security.CertificateInformation.ClusterCertificate.Thumbprint  
$config.Properties.Security.CertificateInformation.ClusterCertificate.Thumbprint = $clusterCertThumbprint  
  
#update server certificate  
$config.Properties.Security.CertificateInformation.ServerCertificate.thumbprintSecondary = $config.Properties.Security.CertificateInformation.ServerCertificate.Thumbprint  
$config.Properties.Security.CertificateInformation.ServerCertificate.Thumbprint = $serverCertThumbprint  
  
#update ClientCertificateThumbprints  
$newCert = New-Object -TypeName psobject  
$newCert | Add-Member -MemberType NoteProperty -Name CertificateThumbprint -Value $clientCertThumbprint  
$newCert | Add-Member -MemberType NoteProperty -Name IsAdmin -Value $true  
$config.Properties.Security.CertificateInformation.ClientCertificateThumbprints += $newCert  
  
#update config version  
$version = ($config.ClusterConfigurationVersion | Select -First 1).Split(".")  
$version[2] = [int]$version[2] + 1  
$version = $version -join "."  
$config.ClusterConfigurationVersion = $version  
  
#get rid of junk PS added  
$config.Properties.Security = $config.Properties.Security | Select-Object -Property * -ExcludeProperty WindowsIdentities  
  
$config | ConvertTo-Json  -Depth 20 | Out-File 'c:\fabricConfig.json' -Force  
  
Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath 'c:\fabricConfig.json'  
$serviceFabricUpdated = $false  
"Starting rolling update for new certificates"  
start-sleep -Seconds 30  
do {
    $progress = Get-ServiceFabricClusterUpgrade
    "Waiting for Service Fabric update to complete. Navigate to the Service Fabric webpage to monitor the status of the update."
    $progress
    if ($progress.UpgradeState -eq "RollingForwardCompleted") {
        $configUpdated = Get-ServiceFabricClusterConfiguration | ConvertFrom-Json
        if ($configUpdated.ClusterConfigurationVersion -eq $version) {
            "Service Fabric Certificate Rollover completed"
            $serviceFabricUpdated = $true
        }
    }
    elseif ($progress.UpgradeState -eq "RollingBackCompleted") {
        "Something went wrong, certificate rollover process was forced to roll back"
        $serviceFabricUpdated = $true
    }
    Start-Sleep -Seconds 30
} while (!($serviceFabricUpdated))
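The two config edits the script repeats (promote a new thumbprint while keeping the old one as secondary, then bump the config version) can be isolated into small functions and tried on a sample object before touching a cluster. A sketch under stated assumptions: `Step-ConfigVersion` and `Set-PrimaryThumbprint` are hypothetical names, the sample JSON only mirrors the property paths used in the script above, and `AAAA` / `BBBB` are placeholder thumbprints.

```powershell
# Sketch only: function names are hypothetical.
function Step-ConfigVersion {
    # Assumes a three-part version such as "1.0.0", as the script above does.
    param([Parameter(Mandatory = $true)] [string] $Version)
    $parts = $Version.Split('.')
    $parts[2] = [int]$parts[2] + 1
    return ($parts -join '.')
}

function Set-PrimaryThumbprint {
    param(
        [Parameter(Mandatory = $true)] [psobject] $Certificate,
        [Parameter(Mandatory = $true)] [string] $NewThumbprint
    )
    # Skip if the new thumbprint is already primary, so a rerun does not
    # write the same thumbprint into both slots.
    if ($Certificate.Thumbprint -eq $NewThumbprint) { return }
    $old = $Certificate.Thumbprint
    # -Force overwrites an existing ThumbprintSecondary property.
    $Certificate | Add-Member -NotePropertyName ThumbprintSecondary -NotePropertyValue $old -Force
    $Certificate.Thumbprint = $NewThumbprint
}

# Sample object with placeholder values; not real cluster output.
$sample = '{ "ClusterConfigurationVersion": "1.0.0", "Properties": { "Security": { "CertificateInformation": { "ClusterCertificate": { "Thumbprint": "AAAA", "X509StoreName": "My" } } } } }' | ConvertFrom-Json

$cert = $sample.Properties.Security.CertificateInformation.ClusterCertificate
Set-PrimaryThumbprint -Certificate $cert -NewThumbprint 'BBBB'
$sample.ClusterConfigurationVersion = Step-ConfigVersion $sample.ClusterConfigurationVersion
$sample | ConvertTo-Json -Depth 20
```

The early return is a design choice that differs from the script above, which always copies the current primary into the secondary slot.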
  
Azure Service Fabric
An Azure service that is used to develop microservices and orchestrate containers on Windows and Linux.

1 answer

  1. Amira Bedhiafi 41,121 Reputation points Volunteer Moderator
    2025-11-03T13:00:19.12+00:00

    Hello Andrew!

    Thank you for posting on Microsoft Learn Q&A.

    What I tried so far was to temporarily relax CRL enforcement during the upgrade. In the Security section I added either IgnoreCrlOfflineError = true or CrlCheckingFlag = 0x80000000; the default CrlCheckingFlag is 0x40000000, which checks revocation for the certificate chain excluding the root. Add only one of the two. As I understand it, these settings exist precisely to keep clusters available when PKI endpoints are unreachable.

    https://dori-uw-1.kuma-moon.com/en-us/azure/service-fabric/cluster-security-certificates
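One way to apply such a setting to the config object before saving it is sketched below. This is an illustration under assumptions, not a confirmed procedure: `Set-SecuritySetting` is a hypothetical helper, and the `FabricSettings` / `Name` / `Parameters` / `Value` property names and their casing are assumed from the standalone cluster config layout, so check them against the JSON your cluster actually returns.

```powershell
# Sketch only: Set-SecuritySetting is a hypothetical helper. Property names
# and casing for fabric settings are assumptions; verify against your config.
function Set-SecuritySetting {
    param(
        [Parameter(Mandatory = $true)] [psobject] $Config,
        [Parameter(Mandatory = $true)] [string] $Name,
        [Parameter(Mandatory = $true)] [string] $Value
    )
    $props = $Config.Properties
    if ($null -eq $props.FabricSettings) {
        $props | Add-Member -NotePropertyName FabricSettings -NotePropertyValue @() -Force
    }

    # Find the Security section, or create it if it is missing.
    $section = $props.FabricSettings | Where-Object { $_.Name -eq 'Security' } | Select-Object -First 1
    if (-not $section) {
        $section = [pscustomobject]@{ Name = 'Security'; Parameters = @() }
        $props.FabricSettings = @($props.FabricSettings) + $section
    }

    # Update the parameter if present, otherwise append it.
    $existing = $section.Parameters | Where-Object { $_.Name -eq $Name } | Select-Object -First 1
    if ($existing) {
        $existing.Value = $Value
    }
    else {
        $section.Parameters = @($section.Parameters) + [pscustomobject]@{ Name = $Name; Value = $Value }
    }
}

# Sample object; a real one would come from Get-ServiceFabricClusterConfiguration.
$sampleConfig = '{ "Properties": { "FabricSettings": [] } }' | ConvertFrom-Json
Set-SecuritySetting -Config $sampleConfig -Name 'IgnoreCrlOfflineError' -Value 'true'
$sampleConfig | ConvertTo-Json -Depth 20
```

The modified config would still need a version bump and a call to Start-ServiceFabricClusterConfigurationUpgrade, as in the question's script.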

    I also found that, when doing the swap, I was pointing to the wrong server certificate. Connect using the new one:

    Connect-ServiceFabricCluster -ServerCertThumbprint <new> -ServerCommonName <CN-if-used>

    https://github.com/microsoft/service-fabric/issues/1142
