Azure to Azure Stack site-to-site IPSec VPN tunnel failure... after 8 hours
We needed a site-to-site VPN tunnel from Azure Stack to Azure for a POC. It seemed pretty straightforward. Spoiler alert: obviously I'm writing this because it wasn't. The tunnel was created without trouble, but each morning it would no longer pass traffic. The connection showed as connected in both Azure and Azure Stack, yet nothing flowed across it: ping, SSH, RDP, DNS and AD all failed. After some tinkering we found a workaround: change the connection's shared key to a random value, save it, then change it back to the correct key. This only worked when done from the Azure Stack side of the connection, where it forced the tunnel to re-initiate and let traffic flow again (recreating the connection from scratch also worked). The tunnel would then work for another 8 hours before it stopped passing traffic again.
I suspected re-keying, since that would explain why the tunnel worked at first and failed by the next day (every day, for the last 5 days). I ran VPN diagnostics on the Azure side, because VPN diagnostics aren't currently supported on Azure Stack (we are on update 1805). The IKE log contained some errors, but it was hard to find anything that told me what was going wrong, let alone something I could do to fix it. Below is the IKE log file I collected through the VPN diagnostics from Azure.
I logged a case with Microsoft support. The first support engineer did their best, but while Microsoft can see from Azure which endpoints the gateway is connecting to, they don't have permission to dig any deeper into the contents of our subscriptions hosted on Azure Stack. I was asked to change the local network gateways from specific subnets to the entire VNet address space. It worked initially, but again it failed after 8 hours.
The support engineer collected some network traffic and other logs, then forwarded the case to an Azure Stack support engineer. Once the case was assigned, that engineer asked me to connect to the privileged endpoint (PEP), and we proceeded to break the glass on Azure Stack to troubleshoot. The engineer gave me a few PowerShell commands to run to investigate what was going on.
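# First, find out which of the VPN gateways is active.
Invoke-Command -ComputerName Azs-gwy01, Azs-gwy02 -ScriptBlock { Get-VpnS2SInterface }
# Check the Quick Mode (IPsec) security associations on the active gateway (Azs-gwy01 here).
Invoke-Command -ComputerName Azs-gwy01 -ScriptBlock { Get-NetIPsecQuickModeSA }
# Check the Main Mode (IKE) security associations on the same gateway.
Invoke-Command -ComputerName Azs-gwy01 -ScriptBlock { Get-NetIPsecMainModeSA }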
The Microsoft engineer had a hunch about exactly what to look for, and he was spot on. The output showed that the Quick Mode key exchange had failed to complete its refresh, while the Main Mode exchange had succeeded. That explained why the tunnel reported as up even though no traffic could flow across it.
We rebooted the active VPN gateway so the tunnels would fail over to the second gateway. Logging was on by default, so we just had to wait for the next timeout. When it happened, I was tasked with collecting the logs from the PEP and uploading them.
These logs are a series of ETL files that Microsoft has to process before they make sense. Fortunately, they turned up the following log entries.
As commented above, the root cause was that the PFS and CipherType settings on the Azure VPN gateway were incorrect. I was given a few PowerShell commands to run against the Azure subscription to reconfigure the IPsec policy for the connection on the Azure side, so that it matched the policy of the VPN gateway and connection on Azure Stack.
# Placeholders: substitute your own resource group and connection names.
$RG1 = 'RESOURCE GROUP NAME'
$CONN = 'CONNECTION NAME'
$GWYCONN = Get-AzureRmVirtualNetworkGatewayConnection -Name $CONN -ResourceGroupName $RG1
# IPsec/IKE parameters matching the Azure Stack side of the connection.
$newpolicy = New-AzureRmIpsecPolicy -IkeEncryption AES256 -IkeIntegrity SHA256 -DhGroup DHGroup2 -IpsecEncryption GCMAES256 -IpsecIntegrity GCMAES256 -PfsGroup PFS2048 -SALifeTimeSeconds 27000 -SADataSizeKilobytes 33553408
Set-AzureRmVirtualNetworkGatewayConnection -VirtualNetworkGatewayConnection $GWYCONN -IpsecPolicies $newpolicy
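Because the root cause came down to the two ends of the tunnel disagreeing on a few parameters, it may be worth diffing both sides' settings before reaching for support. The following is a minimal offline sketch, not something Microsoft support gave me: `Compare-IpsecPolicy` is a hypothetical helper, the Azure Stack values are copied from the policy above, and the Azure-side values are made-up placeholders for illustration. Substitute whatever your Azure gateway is actually configured with.

```powershell
# Hypothetical helper: returns the names of IPsec/IKE parameters whose values
# differ between two hand-transcribed policy hashtables.
function Compare-IpsecPolicy {
    param(
        [hashtable]$Azure,
        [hashtable]$AzureStack
    )
    # Union of parameter names from both sides, sorted and de-duplicated.
    $keys = @($Azure.Keys) + @($AzureStack.Keys) | Sort-Object -Unique
    foreach ($key in $keys) {
        # A key missing on one side returns $null, so it is reported as a mismatch too.
        if ($Azure[$key] -ne $AzureStack[$key]) {
            $key
        }
    }
}

# Azure Stack side: values taken from the policy applied above.
$azureStackPolicy = @{
    IkeEncryption   = 'AES256'
    IkeIntegrity    = 'SHA256'
    DhGroup         = 'DHGroup2'
    IpsecEncryption = 'GCMAES256'
    IpsecIntegrity  = 'GCMAES256'
    PfsGroup        = 'PFS2048'
}

# Azure side: placeholder values for illustration only.
$azurePolicy = @{
    IkeEncryption   = 'AES256'
    IkeIntegrity    = 'SHA256'
    DhGroup         = 'DHGroup2'
    IpsecEncryption = 'AES256'
    IpsecIntegrity  = 'SHA256'
    PfsGroup        = 'None'
}

Compare-IpsecPolicy -Azure $azurePolicy -AzureStack $azureStackPolicy
```

With these placeholder values it reports IpsecEncryption, IpsecIntegrity and PfsGroup, i.e. the same kind of cipher and PFS mismatch that broke the tunnel here.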
Almost there. When I ran Set-AzureRmVirtualNetworkGatewayConnection, it failed because the Basic SKU doesn't allow custom IPsec policies. Once I changed the gateway SKU from Basic to Standard, the command worked, and the tunnel has been up and stable since.
While this is an easy fix that anyone can apply to their Azure subscription without opening a support ticket, it does cost more, since the Standard SKU is priced higher than Basic. Hopefully, in the future these policies will match out of the box between Azure and Azure Stack, so every consumer can use the Basic VPN SKU to connect Azure Stack to Azure over a secure tunnel.