In Section 2.1.6, where the author discusses processor-level parallelism, he gives the Nvidia Fermi GPU as an example:
Modern graphics processing units (GPUs) heavily rely on SIMD processing to provide massive computational power with few transistors. Graphics processing lends itself to SIMD processors because most of the algorithms are highly regular, with repeated operations on pixels, vertices, textures, and edges. Fig. 2-7 shows the SIMD processor at the core of the Nvidia Fermi GPU. A Fermi GPU contains up to 16 SIMD stream multiprocessors (SM), with each SM containing 32 SIMD processors. Each cycle, the scheduler selects two threads to execute on the SIMD processor. The next instruction from each thread then executes on up to 16 SIMD processors, although possibly fewer if there is not enough data parallelism. If each thread is able to perform 16 operations per cycle, a fully loaded Fermi GPU core with 32 SMs will perform a whopping 512 operations per cycle. This is an impressive feat considering that a similar-sized general purpose quad-core CPU would struggle to achieve 1/32 as much processing.
I asked ChatGPT about this passage, and it said the book likely means 2 warps rather than 2 threads, where a single warp contains 32 threads. The Wikipedia article on Fermi also says the scheduler operates on warps. Is the book wrong, or are "thread" and "warp" referring to the same thing here?
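For anyone else puzzling over this, here is a minimal CUDA sketch (my own illustration, not from the book) of the kind of regular, per-pixel work the excerpt describes. The kernel name `brighten` and the launch parameters are just assumptions for the example; the point is that all 32 threads of a warp execute the same instruction stream, which is the SIMD regularity the book is talking about.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread brightens one pixel: every thread of a warp runs this same
// instruction stream on its own data element. (Kernel name and constants
// are illustrative, not from the book.)
__global__ void brighten(unsigned char *pixels, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard for the tail warp
        pixels[i] = (unsigned char)min(pixels[i] + 16, 255);
}

int main() {
    const int n = 1 << 20;                 // 1M pixels
    unsigned char *d_pixels;
    cudaMalloc(&d_pixels, n);
    cudaMemset(d_pixels, 100, n);          // fill with a mid-gray value

    // 256 threads per block = 8 warps of 32 threads each.
    brighten<<<(n + 255) / 256, 256>>>(d_pixels, n);
    cudaDeviceSynchronize();

    // The "32 threads per warp" figure is queryable from the runtime.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("warp size: %d\n", prop.warpSize);  // prints 32 on Nvidia GPUs

    cudaFree(d_pixels);
    return 0;
}
```

As far as I understand the Fermi architecture, each of a warp's 32 threads is mapped onto a group of 16 CUDA cores over two clock cycles, which seems to be what the book compresses into "executes on up to 16 SIMD processors".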
-
If ChatGPT says one thing, and your book says another thing, stop using ChatGPT. — gnasher729, May 30, 2025
-
In this case, both are correct, just using different terminology. — Bulat, May 31, 2025