Although the initial launch of Microsoft's new Copilot+ AI PC category was done in partnership with Qualcomm to leverage the new Snapdragon X series chips, eventually any device that meets the necessary requirements will qualify as a Copilot+ AI PC, and that includes machines equipped with an NVIDIA RTX GPU.
We’ll cut to the chase and head straight to the developer side of things, because for gaming there isn’t much difference to talk about. It starts with Microsoft’s recent ONNX Runtime (ORT) Gen-AI extension, a cross-platform library for AI inference. With it, optimization techniques like quantization for LLMs including Phi-3, Llama 3, Gemma, and Mistral are now supported across different execution providers spanning both hardware and software stacks, such as DirectML.
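To make the quantization idea concrete, below is a minimal pure-Python sketch of per-group INT4 weight-only quantization, the general technique these runtimes apply to LLM weights. This is illustrative only, not ONNX Runtime's actual kernel: the `GROUP_SIZE` value and function names are assumptions made for clarity (real kernels typically use group sizes of 32 or 128 and pack two 4-bit codes per byte).

```python
# Illustrative sketch of symmetric per-group INT4 weight-only quantization.
# Weights are split into small groups; each group stores one float scale
# plus 4-bit integer codes in [-8, 7], dequantized on the fly at inference.
from typing import List, Tuple

GROUP_SIZE = 4  # assumption for readability; production kernels use 32/128

def quantize_int4(weights: List[float]) -> Tuple[List[int], List[float]]:
    """Return (codes, scales): INT4 codes plus one scale per group."""
    codes, scales = [], []
    for start in range(0, len(weights), GROUP_SIZE):
        group = weights[start:start + GROUP_SIZE]
        # Scale maps the largest magnitude in the group to code 7.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize_int4(codes: List[int], scales: List[float]) -> List[float]:
    """Recover approximate float weights from codes and per-group scales."""
    return [c * scales[i // GROUP_SIZE] for i, c in enumerate(codes)]

w = [0.12, -0.40, 0.33, 0.05, 1.2, -0.7, 0.01, 0.9]
codes, scales = quantize_int4(w)
w_hat = dequantize_int4(codes, scales)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The payoff is memory: each weight shrinks from 16 or 32 bits to roughly 4, at the cost of a small reconstruction error bounded by half a scale step per group.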
Since DirectML is also made by Microsoft, Windows-based AI developers can rely less on Linux-tailored environments and libraries and do everything within the Windows PC ecosystem. On its side, NVIDIA is delivering optimizations through the R555 drivers (GeForce Game Ready, Studio, and RTX Enterprise), with performance gains of up to 3x compared to previous driver versions. Take a look below.
There are also other benefits to the new R555 driver, including the following:
- Support for DQ-GEMM metacommand to handle INT4 weight-only quantization for LLMs
- New RMSNorm normalization methods for Llama 2, Llama 3, Mistral, and Phi-3 models
- Group and multi-query attention mechanisms, and sliding window attention to support Mistral
- In-place KV updates to improve attention performance
- Support for GEMM of non-multiple-of-8 tensors to improve context phase performance
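Of the items above, RMSNorm is the easiest to show in code. Here is a minimal pure-Python sketch of the normalization as used in Llama 2/3, Mistral, and Phi-3; the function and variable names are illustrative, and this is the math only, not the driver's accelerated metacommand. Unlike LayerNorm, RMSNorm skips mean subtraction: each element is divided by the root-mean-square of the vector, then scaled by a learned per-channel gain.

```python
# Illustrative RMSNorm: y_i = gain_i * x_i / sqrt(mean(x^2) + eps)
import math
from typing import List

def rms_norm(x: List[float], gain: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize x by its root-mean-square, then apply per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

hidden = [3.0, -4.0, 0.0, 0.0]   # toy hidden state
gains = [1.0, 1.0, 1.0, 1.0]     # identity gain for the demo
normed = rms_norm(hidden, gains)
```

Because it avoids computing and subtracting the mean, RMSNorm is cheaper than LayerNorm, which is one reason these model families adopted it and why a dedicated driver path pays off.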
As for AI workflows running inside the browser via WebNN, performance has been improved across the board too, and these improvements are now accessible via Developer Preview builds for testing and familiarization.