Windows Server Summit 2026 | Part 11: AI workloads on Windows Server

Lesedauer 3 Minuten

AI is now everywhere, no matter where you look. Several large and many small providers have established themselves in the market, each with their own strengths and weaknesses. However, most providers share one major drawback: they are cloud-based.

Windows Server, on the other hand, allows you to deploy AI in your own data center. Sound unrealistic? Not for much longer!

Windows Server as AI platform

Two new Hyper-V features form the foundation for AI usage:

  • The “GPU partitioning” feature allows the resources of a physical graphics card to be distributed across multiple virtual machines.
  • Connection to NVMe storage dramatically increases storage performance.

In addition, Windows Server 2025 features the new “Local AI Inference” capability. This makes it possible to run trained AI models on your own hardware. Together, these features form the foundation for the local deployment of Foundry.

Foundry Local on Windows Server

Microsoft Foundry is a universal platform for building and maintaining AI applications and agents. It is already available natively in Azure.

Foundry Local is the counterpart for on-premises data centers. It can run on Windows Server 2025 and future versions of Windows Server. No special hardware is required. Any language models can be downloaded and integrated.

Installation and usage are demonstrated in a short demo (AI workloads on Windows Server - Windows Server Summit, from minute 8:38).

Typical scenarios for AI workloads on Windows Server

The following table shows some use cases for on-premises AI deployment:

ScenarioIndustryDetails
Use image processing models to identify quality issues in the production lineManufacturingLow latency, limited connectivity
Use local models to perform AI inference for preventive maintenance in completely isolated systemsSubmarineAir-gapped environments, AI accelerators
Knowledge workers can process intellectual property using agent-based workflowsHealthcareHigh data boundaries, data protection, sustainability, MCP tool calling
Enable the cross-application use of generative AIFinTechHigh data protection, model updates, security requirements
Manufacture AI-certified hardware for service providers or distributors serving SMBsHardware resellerEase access to local AI

New features in Foundry Local

A new feature is the deployment of so-called text embedding models in conjunction with SQL Server 2025. This enables the fully on-premises deployment of a platform for so-called retrieval-augmented generation, which until now was only possible with the involvement of cloud providers (see Retrieval-Augmented Generation – Wikipedia).

In addition, it is now possible to use solutions for accessing local AI models based on the Model Context Protocol (MCP). These enable context-based data processing based on user input, similar to the experience with Copilot and ChatGPT.

Known limitations in Foundry Local

Foundry Local has some limitations. It is not optimized for the following scenarios or environments:

  • Multi-GPU graphics cards
  • Distributed inference
  • Failover clustering
  • Limited concurrency and batch-based inference; sequential only
  • Not intended for very large enterprise environments

Other options

Microsoft also lists a few open-source alternatives to Foundry Local:

  • vLLM
  • SGLang
  • Ollama


Liked this article? Share it!