Bookkeeping Service Providers

  • Accounting
  • Bookkeeping
  • US Taxation
  • Financial Planning
  • Accounting Software
  • Small Business Finance
You are here: Home / CLOUD / Advancing no-impact and low-impact maintenance technologies

Advancing no-impact and low-impact maintenance technologies

January 3, 2020 by cbn Leave a Comment

“This post continues our reliability series kicked off by my July blog post highlighting several initiatives underway to keep improving platform availability, as part of our commitment to provide a trusted set of cloud services. Today I wanted to double-click on the investments we’ve made in no-impact and low-impact update technologies including hot patching, memory-preserving maintenance, and live migration. We’ve deployed dozens of security and reliability patches to host infrastructure in the past year, many of which were implemented with no customer impact or downtime. The post that follows was written by John Slack from our core operating systems team, who is the Program Manager for several of the update technologies discussed below.” – Mark Russinovich, CTO, Azure


This post was co-authored by Apurva Thanky, Cristina del Amo Casado, and Shantanu Srivastava from the engineering teams responsible for these technologies.

We regularly update Azure host infrastructure to improve the reliability, performance, and security of the platform. While the purposes of these ‘maintenance’ updates vary, they typically involve updating software components in the hosting environment or decommissioning hardware. If we go back five years, the only way to apply some of these updates was by fully rebooting the entire host. This approach took customer virtual machines (VMs) down for minutes at a time. Since then, we have invested in a variety of technologies to minimize customer impact when updating the fleet. Today, the vast majority of updates to the host operating system are deployed in place with absolute transparency and zero customer impact using hot patching. In infrequent cases in which the update cannot be hot patched, we typically utilize low-impact memory preserving update technologies to roll out the update.

Even with these technologies, there are still other rare cases in which we need to do more impactful maintenance (including evacuating faulty hardware or decommissioning old hardware). In such cases, we use a combination of live migration, in-VM notifications, and planned maintenance providing customer controls.

Thanks to continued investments in this space, we are at a point where the vast majority of host maintenance activities do not impact the VMs hosted on the affected infrastructure. We’re writing this post to be transparent about the different techniques that we use to ensure that Azure updates are minimally impactful.

Plan A: Hot patching

Function-level “hot” patching provides the ability to make targeted changes to running code without incurring any downtime for customer VMs. It does this by redirecting all new invocations of a function on the host to an updated version of that function, so it is considered a ‘no impact’ update technology. Wherever possible we use hot patching to apply host updates completely avoiding any impact to the VMs running on that host. We have been using hot patching in Azure since 2017. Since then, we have worked to broaden the scope of what we can hot patch. As an example, we updated the host operating system to allow the hypervisor to be hot patched in 2018. Looking forward, we are exploring firmware hot patches. This is a place where the industry typically hasn’t focused. Firmware has always been viewed as ‘if you need to update it, reboot the server,’ but we know that makes for a terrible customer experience. We’ve been working with hardware manufacturers to consider our own firmware to make them hot patchable and incrementally updatable.

Some large host updates contain changes that cannot be applied using function-level hot patching. For those updates, we endeavor to use memory-preserving maintenance.

Plan B: Memory-preserving maintenance

Memory-preserving maintenance involves ‘pausing’ the guest VMs (while preserving their memory in RAM), updating the host server, then resuming the VMs and automatically synchronizing their clocks. We first used memory-preserving maintenance for Azure in 2018. Since then we have improved the technology in three important ways. First, we have developed less impactful variants of memory-preserving maintenance targeted for host components that can be serviced without a host reboot. Second, we have reduced the duration of the customer experienced pause. Third, we have expanded the number of VM types that can be updated with memory preserving maintenance. While we continue to work in this space, some variants of memory-preserving maintenance are still incompatible with some specialized VM offerings like M, N, or H series VMs for a variety of technical reasons.

In the rare case we need to make more impactful maintenance (including host reboots, VM redeployment), customers are notified in advance and given the opportunity to perform the maintenance at a time suitable for their workload(s).

Plan C: Self-service maintenance

Self-service maintenance involves providing customers and partners a window of time, within which they can choose when to initiate impactful maintenance on their VM(s). This initial self-service phase typically lasts around a month and empowers organizations to perform the maintenance on their own schedules so it has no or minimal disruption to users. At the end of this self-service window, a scheduled maintenance phase begins—this is where Azure will perform the maintenance automatically. Throughout both phases, customers get full visibility of which VMs have or have not been updated—in Azure Service Health or by querying in PowerShell/CLI. Azure first offered self-service maintenance in 2018. We generally see that administrators take advantage of the self-service phase rather than wait for Azure to perform maintenance on their VMs automatically.

In addition to this, when the customer owns the full host machine, either using Azure Dedicated Hosts or Isolated virtual machines, we recently started to offer maintenance control over all non-zero impact platform updates. This includes rebootless updates which only cause a few seconds pause. It is useful for VMs running ultra-sensitive workloads which can’t sustain any interruption even if it lasts just for a few seconds. Customers can choose when to apply these non-zero impact updates in a 35-day rolling window. This feature is in public preview, and more information can be found in this dedicated blog post.

Sometimes in-place update technologies aren’t viable, like when a host shows signs of hardware degradation. In such cases, the best option is to initiate a move of the VM to another host, either through customer control via planned maintenance or through live migration.

Plan D: Live migration

Live migration involves moving a running customer VM from one “source” host to another “destination” host. Live migration starts by moving the VM’s local state (including RAM and local storage) from the source to the destination while the virtual machine is still running. Once most of the local state is moved, the guest VM experiences a short pause usually lasting five seconds or less. After that pause, the VM resumes running on the destination host. Azure first started using live migration for maintenance in 2018. Today, when Azure Machine Learning algorithms predict an impending hardware failure, live migration can be used to move guest VMs onto different hosts preemptively.

Amongst other topics, planned maintenance and AI Operations were covered in Igal Figlin’s recent Ignite 2019 session “Building resilient applications in Azure.” Watch the recording here for additional context on these, and to learn more about how to take advantage of the various resilient services Azure provides to help you build applications that are inherently resilient.

The future of Azure maintenance 

In summary, the way in which Azure performs maintenance varies significantly depending on the type of updates being applied. Regardless of the specifics, Azure always approaches maintenance with a view towards ensuring the smallest possible impact to customer workloads. This post has outlined several of the technologies that we use to achieve this, and we are working diligently to continue improving the customer experience. As we look toward the future, we are investing heavily in machine learning-based insights and automation to maintain availability and reliability. Eventually, this “AI Operations” model will carry out preventative maintenance, initiate automated mitigations, and identify contributing factors and dependencies during incidents more effectively than our human engineers can. We look forward to sharing more on these topics as we continue to learn and evolve.

Share on FacebookShare on TwitterShare on Google+Share on LinkedinShare on Pinterest

Filed Under: CLOUD

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Archives

  • September 2025
  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • August 2021
  • May 2021
  • April 2021
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • November 2019
  • October 2019
  • September 2019
  • August 2019
  • July 2019
  • June 2019
  • May 2019
  • April 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • March 2016

Recent Posts

  • How Azure Cobalt 100 VMs are powering real-world solutions, delivering performance and efficiency results
  • FabCon Vienna: Build data-rich agents on an enterprise-ready foundation
  • Agent Factory: Connecting agents, apps, and data with new open standards like MCP and A2A
  • Azure mandatory multifactor authentication: Phase 2 starting in October 2025
  • Microsoft Cost Management updates—July & August 2025

Recent Comments

    Categories

    • Accounting
    • Accounting Software
    • BlockChain
    • Bookkeeping
    • CLOUD
    • Data Center
    • Financial Planning
    • IOT
    • Machine Learning & AI
    • SECURITY
    • Uncategorized
    • US Taxation

    Categories

    • Accounting (145)
    • Accounting Software (27)
    • BlockChain (18)
    • Bookkeeping (205)
    • CLOUD (1,322)
    • Data Center (214)
    • Financial Planning (345)
    • IOT (260)
    • Machine Learning & AI (41)
    • SECURITY (620)
    • Uncategorized (1,284)
    • US Taxation (17)

    Subscribe Our Newsletter

     Subscribing I accept the privacy rules of this site

    Copyright © 2025 · News Pro Theme on Genesis Framework · WordPress · Log in