Windows Server Summit 2026 | Part 19: Failover clustering: heart of the private cloud and datacenter

21.05.2026

Krähe mit Hut

Reading time: 4 minutes

Series

Overview: Windows Server Summit 2026 - in a nutshell!
Previous article: Windows Server Summit 2026 | Part 18: Let's talk storage: NVMe, ReFS, and what's coming next

As a rule, servers are not deployed individually but in clusters to ensure high availability and to be prepared for the failure of individual systems. In addition, such clusters allow the load to be efficiently distributed across multiple systems.

The core of such clusters is typically the “failover clustering” server feature. This allows multiple servers with similar hardware to be combined into a logical cluster with relatively little effort. This cluster can then provide redundancy for roles and products such as virtualization (Hyper-V), file services, and SQL Server.

This article discusses features that failover clustering already supports today and provides an outlook on features that will be available in the future.

Live Migration

The following sections cover several new capabilities related to live migration. Live migration moves virtual machines between nodes while they are running, without the VM having to be shut down.

Workgroup clusters

Since Windows Server 2016, failover clusters can also be operated outside of Active Directory. Besides saving costs, this can be attractive for security reasons, for example to use a separate authentication platform for the virtualization layer.

Until now, however, it has not been possible to move running virtual machines between the nodes of such a cluster. This is now changing, which removes a significant limitation for workgroup clusters and makes them considerably easier to use.

GPU partitioning

Another new feature concerns virtual machines that have access to a physical graphics card. The “GPU partitioning” feature makes it possible to allocate the graphics card's resources across multiple virtual machines. These virtual machines can now also be moved between nodes while they are running and do not need to be shut down first.

AccelNet for highly available virtual machines

AccelNet enables a virtual machine to access the node's physical network adapter directly. This reduces latency and CPU load and, among other things, allows for the operation of additional VMs on the node compared to a configuration without AccelNet.

This feature is already in use in Azure for virtual machines and is now also being made available in Windows Server. A prerequisite for this is the use of Network ATC (Advanced Traffic Control).

New failover clustering scenarios

Windows Server 2025 supports new scenarios for setting up and operating failover clusters. These are described below.

S2D and SAN coexistence

Storage Spaces Direct (S2D) now supports the integration of existing SAN systems, allowing the two technologies to be combined. This means it is no longer necessary to purchase new server hardware to use S2D.

S2D Campus Cluster

With S2D Campus Cluster, storage devices from servers in two data centers can be combined into a single virtual pool. Previously, this was only possible with servers located at the same site.

Rack-local optimized reads

Until now, all copies of a data block have been treated as equivalent during a read operation. As a result, a read operation may access a copy other than the nearest one. This would be disadvantageous in an S2D Campus Cluster, as read operations could inadvertently be routed to the other site, potentially slowing them down.

To mitigate this behavior, read operation prioritization is now being introduced. The system evaluates where the nearest copy is located and routes the read request there. This is done according to the following hierarchy:

Same node
Same chassis
Same rack
Same site
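
The hierarchy above can be illustrated with a small sketch. It is not how S2D implements the feature; the `Location` type and the functions `proximity_rank` and `pick_copy` are hypothetical names chosen for this example. The sketch only shows the idea of ranking copies by how much of the node/chassis/rack/site path they share with the reader.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Location:
    """Hypothetical placement of a node or data copy in the topology."""
    site: str
    rack: str
    chassis: str
    node: str


def proximity_rank(reader: Location, copy: Location) -> int:
    """Lower is nearer: 0 same node, 1 same chassis, 2 same rack,
    3 same site, 4 different site."""
    if copy.site != reader.site:
        return 4
    if copy.rack != reader.rack:
        return 3
    if copy.chassis != reader.chassis:
        return 2
    if copy.node != reader.node:
        return 1
    return 0


def pick_copy(reader: Location, copies: Iterable[Location]) -> Location:
    """Choose the copy with the best (lowest) proximity rank."""
    return min(copies, key=lambda c: proximity_rank(reader, c))


reader = Location(site="A", rack="r1", chassis="c1", node="n1")
copies = [
    Location(site="B", rack="r9", chassis="c9", node="n9"),  # other site
    Location(site="A", rack="r2", chassis="c3", node="n5"),  # same site
    Location(site="A", rack="r1", chassis="c2", node="n3"),  # same rack
]
nearest = pick_copy(reader, copies)
```

In this example the copy in the same rack is chosen, because no copy exists on the same node or in the same chassis.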

Cloud witness with managed identity

A failover cluster is typically assigned a small storage area, the witness, that contains control information for the cluster. It enables a smooth restart after a complete failure and acts as a tie-breaker when the cluster has an even number of nodes.

Azure Blob Storage can also be used for this purpose. However, access to this storage has so far only been possible with an access key (similar to a password), which is stored in the failover cluster database.

To enhance security here, it is now possible to use a managed identity for access.
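
The role of the witness described at the start of this section can be illustrated with a simplified vote count. The sketch below assumes a plain static majority model (one vote per node, one optional vote for the witness); real cluster quorum handling is more involved, and the function name `has_quorum` is hypothetical.

```python
def has_quorum(nodes_total: int, nodes_reachable: int,
               use_witness: bool, witness_reachable: bool = False) -> bool:
    """Return True if the reachable votes form a strict majority.

    Simplified model: every node has one vote and the witness,
    if configured, contributes one additional vote.
    """
    total_votes = nodes_total + (1 if use_witness else 0)
    present_votes = nodes_reachable
    if use_witness and witness_reachable:
        present_votes += 1
    return 2 * present_votes > total_votes


# Four nodes split evenly into two halves of two nodes each.
without_witness = has_quorum(4, 2, use_witness=False)
with_witness = has_quorum(4, 2, use_witness=True, witness_reachable=True)
witness_unreachable = has_quorum(4, 2, use_witness=True, witness_reachable=False)
```

Without a witness, neither half of the split cluster reaches a majority. With a witness, the half that can still reach it holds three of five votes and can continue to run.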

Stretched S2D cluster with storage replica

Windows Server has supported distributed clusters since Windows Server 2008 R2. Until now, however, storage systems had to be replicated using separate mechanisms within the storage systems themselves.

Soon, it will be possible to replicate S2D-based storage between sites using the “Storage Replica” feature built into Windows Server.

Future improvements

Microsoft also provides a preview of features that are planned for the future. These will be discussed in more detail below.

Cluster native update

For failover clusters, there is a special role called “Cluster-Aware Updating” (CAU). This role automates the installation of updates in a failover cluster and prevents multiple nodes from being updated and/or restarted simultaneously. It uses various extensions (e.g., for hotfixes, updates, and custom requirements).

This functionality is now natively integrated into the cluster role and, as part of this, has been renamed “Cluster-Native Update” (CNU). Several improvements are also being made to the feature:

Improved control of nodes during an update with built-in remote management capabilities
Plugin-based architecture for better extensibility and easy development of custom plugins (including for CAU!)
Improved configuration interface and diagnostic functions
Creation and use of templates for update cycles
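
The rolling behavior that CAU automates, and that CNU is expected to continue, can be sketched as a simple loop. This is only an illustration of the one-node-at-a-time principle, not the actual CAU or CNU logic; `rolling_update` and the `apply_update` callback are hypothetical names.

```python
from typing import Callable, List, Tuple


def rolling_update(nodes: List[str],
                   apply_update: Callable[[str], None]) -> List[Tuple[str, str]]:
    """Update nodes strictly one after another and return an event log.

    Each node is drained, updated, and resumed before the next node
    is touched, so at most one node is out of service at any time.
    """
    events: List[Tuple[str, str]] = []
    for node in nodes:
        events.append(("drain", node))    # move clustered roles off the node
        apply_update(node)                # install updates, restart if needed
        events.append(("updated", node))
        events.append(("resume", node))   # node rejoins before the next starts
    return events


updated: List[str] = []
events = rolling_update(["n1", "n2", "n3"], updated.append)
```

Here `updated.append` stands in for the real update step. The event log makes it easy to check that a node is resumed before the next one is drained.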

Admission Control

The term is somewhat misleading, but VMware also uses it to describe the underlying functionality. The purpose of admission control is to ensure sufficient cluster capacity in the event of a node failure or maintenance. To this end, minimum reserves for CPU, RAM, and GPU can be configured. Cluster utilization can also be monitored.

Admission control can be operated in two modes:

Soft enforcement (a warning is sent, but the current action is not interrupted)
Hard enforcement (the current action is aborted as soon as the thresholds are exceeded)

Admission control applies, for example, in the following cases:

Adding/removing a virtual machine
Powering on/off a virtual machine
Resizing a virtual machine
Adding/removing a cluster node
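
The difference between the two enforcement modes can be sketched as follows. This is a minimal model under stated assumptions, not the real feature: the types `Capacity`, `Mode`, and `Decision`, the function `check_admission`, and all numbers are hypothetical. An action is checked against the configured minimum reserves for CPU, RAM, and GPU.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Mode(Enum):
    SOFT = "soft"   # warn only
    HARD = "hard"   # abort the action


@dataclass(frozen=True)
class Capacity:
    cpu_cores: float
    ram_gb: float
    gpus: float


@dataclass
class Decision:
    allowed: bool
    warnings: List[str] = field(default_factory=list)


def check_admission(free_after: Capacity, reserve: Capacity, mode: Mode) -> Decision:
    """Compare the capacity left after an action with the minimum reserve."""
    warnings = [
        f"{name} would fall below the configured reserve"
        for name in ("cpu_cores", "ram_gb", "gpus")
        if getattr(free_after, name) < getattr(reserve, name)
    ]
    if not warnings:
        return Decision(allowed=True)
    # Soft enforcement lets the action continue; hard enforcement aborts it.
    return Decision(allowed=(mode is Mode.SOFT), warnings=warnings)


reserve = Capacity(cpu_cores=8, ram_gb=64, gpus=1)
ok = check_admission(Capacity(16, 128, 2), reserve, Mode.HARD)
soft = check_admission(Capacity(4, 128, 2), reserve, Mode.SOFT)
hard = check_admission(Capacity(4, 32, 2), reserve, Mode.HARD)
```

In the soft case the action proceeds with one warning (CPU), while in the hard case the same kind of shortfall causes the action to be rejected.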

Categories: Hybrid infrastructure

Tags: Cluster Failover Clustering Private Cloud Server Roles Windows Server