Since the AI revolution, the so-called CUDA moat, and how to overcome it, has been even more on my mind.
The "CUDA moat" is significant due to the extensive legacy code spanning many use cases in HPC and AI. There must be millions of lines of CUDA code out there. Much of this code, written by researchers, companies, and individuals, will persist because sticking with Nvidia requires less effort compared to rewriting, however compelling alternatives are. This is similar to other long-standing platforms like
x86 and
Win32.
That said, while the workloads within the CUDA ecosystem are broad, the most profitable workloads in AI training and inference are relatively narrow and rely on cutting-edge open frameworks like PyTorch, DeepSpeed, and OpenAI's Triton. Notably, PyTorch has supported AMD out of the box since last year, and all open-source models at Hugging Face, the central repository for these models, are continuously tested for AMD compatibility.
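To make "out of the box" concrete: PyTorch's ROCm build exposes AMD GPUs through the same torch.cuda interface that the CUDA build uses, so typical device-agnostic code runs unchanged on either vendor. A minimal sketch (the tiny model is purely illustrative):

```python
import torch

# On a ROCm build of PyTorch, AMD GPUs show up through the same
# torch.cuda interface used on Nvidia, so this common pattern
# needs no vendor-specific branching.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# torch.version.hip is set on ROCm builds and None on CUDA builds,
# which is a cheap way to tell which backend you actually got.
backend = "ROCm/HIP" if getattr(torch.version, "hip", None) else "CUDA"
print(f"Running on {device} via {backend}")

# A typical forward/backward step looks identical either way.
model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)
loss = model(x).sum()
loss.backward()
```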
AMD's open ROCm software has improved by leaps and bounds over the last year. It seems perfectly feasible to me, especially for a company with AMD's resources, to provide the functionality and optimizations needed to overcome the CUDA moat for AI training and inference. It is already happening.
So, while the entire CUDA ecosystem can't be replaced wholesale, the high-value workloads can certainly be contested, as I see it.
Further reading: