Stockfish chess engine accelerated with CUDA on DGX SPARK, leveraging 128GB unified memory.
This project accelerates Stockfish 18, the world's strongest open-source chess engine, with NVIDIA CUDA on DGX SPARK. It uses the Blackwell GPU architecture and 128GB of unified memory to reach a 3.3x speedup over the CPU-only baseline. Key innovations include GPU-accelerated NNUE neural network evaluation on Tensor Cores, unified-memory transposition tables of up to 96GB, and batch evaluation of 256 positions at a time. The implementation uses cudaMallocManaged to share memory between CPU and GPU without explicit copies, and WMMA for int8 matrix operations.
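To make the batching idea concrete, here is a minimal host-side sketch in C++ of how positions might be collected and handed to an evaluator 256 at a time. It is not taken from the project: `PositionFeatures`, `EvalBatcher`, and the callback signature are hypothetical names chosen for illustration, and the callback stands in for whatever launches the GPU evaluation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an encoded position; the project's real
// feature layout is not described in this post.
struct PositionFeatures {
    std::uint64_t key;
};

// Collects positions and passes them to an evaluator in fixed-size batches.
class EvalBatcher {
public:
    using BatchFn = std::function<void(const std::vector<PositionFeatures>&)>;

    EvalBatcher(std::size_t batch_size, BatchFn evaluate)
        : batch_size_(batch_size), evaluate_(std::move(evaluate)) {
        assert(batch_size_ > 0);
        pending_.reserve(batch_size_);
    }

    // Queue one position. Returns true if this call dispatched a full batch.
    bool push(const PositionFeatures& p) {
        pending_.push_back(p);
        if (pending_.size() < batch_size_) return false;
        flush();
        return true;
    }

    // Dispatch whatever is pending, even if the batch is only partly full.
    void flush() {
        if (pending_.empty()) return;
        evaluate_(pending_);
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t batch_size_;
    BatchFn evaluate_;
    std::vector<PositionFeatures> pending_;
};
```

A search thread would call `push` for each leaf it wants scored and `flush` before it needs the results. A fixed batch size trades a little latency for better GPU occupancy, which is presumably why a figure such as 256 was chosen.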
Technologies Used:
- CUDA 12.0+
- Unified Memory (cudaMallocManaged)
- Tensor Cores (WMMA)
- Blackwell Architecture (SM 12.x)
- DGX SPARK Platform
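For readers unfamiliar with int8 Tensor Core math, the arithmetic can be sketched on the CPU: inputs are 8-bit integers, while products are accumulated in 32 bits so they do not overflow. The C++ reference below is an illustrative assumption, not the project's kernel. The function name `gemm_int8_reference` and the row-major layout are hypothetical, and a real WMMA kernel would compute the same result on small fixed-size tiles on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for C += A * B with int8 inputs and int32 accumulation.
// A is m x k, B is k x n, C is m x n, all row-major.
void gemm_int8_reference(const std::vector<std::int8_t>& a,
                         const std::vector<std::int8_t>& b,
                         std::vector<std::int32_t>& c,
                         std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    assert(c.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            std::int32_t acc = c[i * n + j];
            for (std::size_t p = 0; p < k; ++p) {
                // Widen to 32 bits before multiplying so the product cannot overflow int8.
                acc += static_cast<std::int32_t>(a[i * k + p]) *
                       static_cast<std::int32_t>(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
}
```

A reference like this is mainly useful for checking a GPU kernel: run both on the same quantized weights and compare the int32 outputs exactly.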
Industry/Application:
Artificial Intelligence, Gaming, High Performance Computing, Game Tree Search
Performance Metrics:
- 400M+ nodes/second with 64 threads
- 3.3x speedup vs CPU-only
- 96GB unified memory utilization
- 119GB total memory available
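As a rough illustration of what a 96GB table budget means in entries, the C++ sketch below converts a byte budget into an entry count and a slot index. Everything here is an assumption made for the example: the 16-byte entry size, the reading of "GB" as GiB, and the power-of-two rounding with mask indexing are simplifications, and the names `TableLayout`, `layout_for_budget`, and `slot_for_key` are hypothetical. The project's actual table layout is not described in this post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a hash table sized to a byte budget.
struct TableLayout {
    std::uint64_t entries;  // number of slots, always a power of two
    std::uint64_t mask;     // entries - 1, used to map a key to a slot
};

// Largest power-of-two entry count that fits in budget_bytes.
TableLayout layout_for_budget(std::uint64_t budget_bytes, std::uint64_t entry_bytes) {
    assert(entry_bytes > 0);
    assert(budget_bytes >= entry_bytes);
    const std::uint64_t max_entries = budget_bytes / entry_bytes;
    std::uint64_t entries = 1;
    while (entries <= max_entries / 2) entries *= 2;
    return TableLayout{entries, entries - 1};
}

// Map a 64-bit position hash to a slot index.
inline std::uint64_t slot_for_key(std::uint64_t key, const TableLayout& t) {
    return key & t.mask;
}
```

Under these assumptions a 96 GiB budget holds about 6.4 billion 16-byte entries, which rounds down to 2^32 slots. Power-of-two rounding leaves part of the budget unused, so an implementation that wants the full 96GB would need a different indexing scheme.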
Links:
GitHub: EquaCoin/stockfish-dgx-spark, release stockfish-cuda-full (109MB, full version)