Tuesday, February 3, 2026

Stockfish 18 chess engine accelerated with CUDA on NVIDIA DGX SPARK

Stockfish chess engine accelerated with CUDA on DGX SPARK, leveraging 128GB unified memory.

https://forums.developer.nvidia.com/t/stockfish-cuda-gpu-accelerated-chess-engine-for-dgx-spark/359551 

This project accelerates Stockfish 18, the world’s strongest open-source chess engine, using NVIDIA CUDA on DGX SPARK. It uses the Blackwell GPU architecture and the platform's 128GB of unified memory to reach a reported 3.3x speedup over the CPU-only baseline. The key techniques are GPU-accelerated NNUE neural network evaluation on Tensor Cores, transposition tables of up to 96GB held in unified memory, and batch evaluation of 256 positions at a time. The implementation allocates shared CPU/GPU memory with cudaMallocManaged and uses WMMA for int8 matrix operations.
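To make the batched int8 evaluation more concrete, here is a minimal CPU reference sketch in C++ of the arithmetic that one batched affine layer performs: int8 inputs times int8 weights, accumulated into int32, which is the same product-and-accumulate pattern that int8 Tensor Core operations compute per tile. It is not the project's code. The function name `affineBatchInt8`, the constant `kBatchSize`, and the layer dimensions are assumptions made for illustration, and the real NNUE network layout is not described in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Batch size quoted in the post ("batch evaluation of 256 positions").
// The constant name is hypothetical.
constexpr std::size_t kBatchSize = 256;

// CPU reference for one batched int8 affine layer: out = in * W^T + bias.
//   in:      batch x inDim,  row-major, int8
//   weights: outDim x inDim, row-major, int8
//   bias:    outDim, int32
//   returns: batch x outDim, row-major, int32
// Assumption: this mirrors the int8 -> int32 accumulation a GPU kernel
// would perform; it is not taken from the project's source.
std::vector<std::int32_t> affineBatchInt8(const std::vector<std::int8_t>& in,
                                          const std::vector<std::int8_t>& weights,
                                          const std::vector<std::int32_t>& bias,
                                          std::size_t batch,
                                          std::size_t inDim,
                                          std::size_t outDim) {
    assert(in.size() == batch * inDim);
    assert(weights.size() == outDim * inDim);
    assert(bias.size() == outDim);

    std::vector<std::int32_t> out(batch * outDim);
    for (std::size_t b = 0; b < batch; ++b) {
        for (std::size_t o = 0; o < outDim; ++o) {
            // Start from the bias and accumulate in 32 bits so that
            // sums of int8 products cannot overflow the int8 range.
            std::int32_t acc = bias[o];
            for (std::size_t i = 0; i < inDim; ++i) {
                acc += static_cast<std::int32_t>(in[b * inDim + i]) *
                       static_cast<std::int32_t>(weights[o * inDim + i]);
            }
            out[b * outDim + o] = acc;
        }
    }
    return out;
}
```

Calling `affineBatchInt8(in, weights, bias, kBatchSize, inDim, outDim)` evaluates all positions in the batch with one call. A reference like this is mainly useful for checking a GPU kernel's output against known-good values, since both should produce identical int32 results.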

Technologies Used:

  • CUDA 12.0+
  • Unified Memory (cudaMallocManaged)
  • Tensor Cores (WMMA)
  • Blackwell Architecture (SM 12.x)
  • DGX SPARK Platform

Industry/Application:
Artificial Intelligence, Gaming, High Performance Computing, Game Tree Search

Performance Metrics:

  • 400M+ nodes/second with 64 threads
  • 3.3x speedup vs CPU-only
  • Up to 96GB of unified memory used for transposition tables
  • 119GB total memory available
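
The figures above can be put in perspective with some quick arithmetic, sketched here in C++. The helper names are hypothetical, and each calculation rests on an assumption the post does not confirm: that the node rate is spread evenly across threads, that the 3.3x speedup is measured in nodes/second, and that the table uses fixed-size clusters (upstream Stockfish uses 32-byte clusters, but this project's layout is not stated).

```cpp
#include <cassert>
#include <cstdint>

// Average nodes/second per thread, assuming an even split across threads.
inline double nodesPerThread(double totalNps, int threads) {
    assert(threads > 0);
    return totalNps / threads;
}

// CPU-only rate implied by a speedup factor, assuming the speedup
// is measured in nodes/second.
inline double impliedBaselineNps(double acceleratedNps, double speedup) {
    assert(speedup > 0.0);
    return acceleratedNps / speedup;
}

// Number of fixed-size clusters that fit in a table of the given size.
// clusterBytes is an assumption supplied by the caller.
inline std::uint64_t clusterCount(std::uint64_t tableBytes,
                                  std::uint64_t clusterBytes) {
    assert(clusterBytes > 0);
    return tableBytes / clusterBytes;
}
```

Under those assumptions, `nodesPerThread(400e6, 64)` gives 6.25 million nodes/second per thread, `impliedBaselineNps(400e6, 3.3)` gives roughly 121 million nodes/second for the CPU-only run, and `clusterCount(96ULL << 30, 32)` gives 3,221,225,472 clusters if 96GB is read as 96 GiB.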

Links:
GitHub: Release stockfish-cuda-full (109MB) ⭐ Full version · EquaCoin/stockfish-dgx-spark
