7 posts • Page 1 of 1
simmania
Posts: 59
Joined: Fri Sep 16, 2022 8:41 am

What do we know about the VideoCoreVII GPU?

Thu Dec 19, 2024 12:29 pm

I would like to get the most out of the GPU of the Raspberry Pi 5, so I would like to know as much as possible about the GPU it uses: the VideoCoreVII.

So what do we currently know about this GPU? I found a GitHub project with some architectural information: https://github.com/wimrijnders/V3DLib/b ... /Basics.md. But this is for older GPUs. How much of this document is still valid? I read somewhere that the VideoCoreVII has the same architecture as the VideoCoreVI used in the Raspberry Pi 4, but with more QPUs.

What I could find out so far (please correct me if I'm wrong):

The VideoCoreVII has 3 slices, each with 4 QPUs, so a total of 12 QPUs. Each QPU has a register file in which every register is 512 bits wide, holding 16 32-bit values. The QPU performs its operations on all 16 values at the same time (software view), but it does this using 4 hardware ALUs in a time-multiplexed manner. Each ALU has a multiplier (with accumulator?) and an adder, so an ALU can do 2 operations at the same time.
Each slice also has a TMU for memory reads and writes.
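To make the time-multiplexing described above concrete, here is a minimal conceptual model in Python. It assumes the poster's description (16-lane registers, 4 physical ALU lanes, one multiply and one independent add per lane per clock); it is not a hardware-accurate simulation, and the function name `qpu_dual_issue` is hypothetical.

```python
# Conceptual model only, based on the description above (assumed, not
# hardware-accurate): a QPU instruction works on 16-lane registers, but only
# 4 physical ALU lanes exist, so the 16 lanes are processed 4 per clock.

SIMD_WIDTH = 16      # 32-bit values per register (software view)
PHYSICAL_LANES = 4   # hardware ALU lanes per QPU (assumed)


def qpu_dual_issue(a, b, c):
    """Hypothetical model of one instruction that issues a multiply on the
    mul ALU (a * b) and an independent add on the add ALU (a + c).

    Returns (products, sums, cycles_used)."""
    if not (len(a) == len(b) == len(c) == SIMD_WIDTH):
        raise ValueError("registers must have 16 lanes")
    products = [0] * SIMD_WIDTH
    sums = [0] * SIMD_WIDTH
    cycles = 0
    for first in range(0, SIMD_WIDTH, PHYSICAL_LANES):
        # One clock: each physical lane handles one element, and the mul and
        # add units work side by side, so 2 * 4 = 8 operations per clock.
        for lane in range(first, first + PHYSICAL_LANES):
            products[lane] = a[lane] * b[lane]
            sums[lane] = a[lane] + c[lane]
        cycles += 1
    return products, sums, cycles


products, sums, cycles = qpu_dual_issue(list(range(16)), [2] * 16, [1] * 16)
print(cycles)  # 4 clocks for one 16-wide instruction
```

The point of the model is the ratio: under these assumptions one 16-wide instruction takes 4 clocks but retires 16 multiplies and 16 adds, i.e. 8 operations per clock per QPU.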

Questions I have:
Is the architecture described above correct for the VideoCoreVII?
How big is the register file in each QPU? Is it 64 registers of 512 bits each (each register holding 16 32-bit values)?
Are there also accumulator registers (also 512 bits each)?
How many threads can run on a QPU?
Is it correct that when n threads run on a QPU, the number of registers available to each thread is divided by n? And what about the accumulator registers?
Do all QPUs need to run the same threads? Or can we run 12 x n threads at the same time (n being the maximum number of threads per QPU)?
Is it known which operations a TMU can perform? Can it do value swapping, for instance? Or read/write each of the 16 values in a register from/to a different memory location (slow but handy)?
Is there also a GPU cache or other memory support?
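The register-file questions above reduce to simple arithmetic once the figures are fixed. The sketch below uses the numbers from the questions (64 registers, 16 lanes of 32 bits) and the "divide by n" hypothesis; none of these are confirmed VideoCore VII values, and the thread counts 1, 2 and 4 are only illustrative.

```python
# Register-budget arithmetic using the figures from the questions above
# (64 registers per QPU, 16 lanes x 32 bits). These are the poster's working
# assumptions, not confirmed VideoCore VII numbers.

REGISTERS_PER_QPU = 64
LANES = 16
BITS_PER_LANE = 32


def register_file_bytes(registers=REGISTERS_PER_QPU):
    """Total register-file size for one QPU, in bytes."""
    return registers * LANES * BITS_PER_LANE // 8


def registers_per_thread(threads, registers=REGISTERS_PER_QPU):
    """Registers each thread sees if the file is split evenly between
    `threads` hardware threads (the hypothesis in the question)."""
    if threads <= 0 or registers % threads:
        raise ValueError("register file does not split evenly")
    return registers // threads


print(register_file_bytes())  # 4096 bytes = 4 KiB per QPU
for n in (1, 2, 4):  # illustrative thread counts (assumed)
    print(n, "thread(s):", registers_per_thread(n), "registers each")
```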

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: What do we know about the VideoCoreVII GPU?

Thu Dec 19, 2024 1:59 pm

Start with the VideoCore IV 3D spec.
There have been incremental updates in newer versions, but the changes will be minor.

simmania
Posts: 59
Joined: Fri Sep 16, 2022 8:41 am

Re: What do we know about the VideoCoreVII GPU?

Thu Dec 19, 2024 2:55 pm

dom wrote:
Thu Dec 19, 2024 1:59 pm
Start with the VideoCore IV 3D spec.
There have been incremental updates in newer versions, but the changes will be minor.
Thanks. It will give me a starting point. But this document is more than 10 years old! I guess a lot can change in 10 years.
So how do I know what has been changed since then?

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: What do we know about the VideoCoreVII GPU?

Thu Dec 19, 2024 3:11 pm

simmania wrote:
Thu Dec 19, 2024 2:55 pm
Thanks. It will give me a starting point. But this document is more than 10 years old! I guess a lot can change in 10 years.
So how do I know what has been changed since then?
The Mesa library is probably the best source.
It supports all the versions of VideoCore, and you can see which features are only enabled for newer versions.

Be aware that the 3D hardware is still rather dated (even with the bumps for Pi 4 and Pi 5), and for general-purpose compute you'll probably find that the ARM on a Pi 5 (with NEON) has more performance.

simmania
Posts: 59
Joined: Fri Sep 16, 2022 8:41 am

Re: What do we know about the VideoCoreVII GPU?

Thu Dec 19, 2024 5:55 pm

dom wrote:
Thu Dec 19, 2024 3:11 pm
Be aware that the 3D hardware is still rather dated (even with the bumps for Pi 4 and Pi 5), and for general-purpose compute you'll probably find that the ARM on a Pi 5 (with NEON) has more performance.
Can you elaborate on that?
If NEON can do 4 multiply-accumulates (8 flops) per clock, that would be 2.4 GHz * 8 = 19.2 GFLOPS per core. With 4 cores this would be 76.8 GFLOPS.
To my knowledge, the VideoCoreVII has 12 QPUs, each with 4 ALUs that can each do two operations per clock. At 800 MHz this gives 0.8 * 12 * 4 * 2 = 76.8 GFLOPS!

Is it that, in practice, the CPUs get closer to their theoretical FLOPS than the GPU does?
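The peak-throughput comparison above can be reproduced with a few lines of Python. All inputs are the poster's assumptions (8 flops per clock per core for NEON, 12 QPUs x 4 ALUs x 2 operations for the GPU); clocks are given in MHz as integers so the arithmetic is exact until the final division. The helper `peak_gflops` is hypothetical.

```python
# Back-of-the-envelope peak throughput, using the poster's assumptions.
# Clocks are in MHz (integers) so the arithmetic is exact before the final
# division. Peak figures ignore memory bandwidth, instruction mix and how
# easily the units can be kept busy.


def peak_gflops(clock_mhz, units, flops_per_unit_per_clock):
    """Peak GFLOPS = clock * parallel units * flops each unit does per clock."""
    return clock_mhz * units * flops_per_unit_per_clock / 1000


# CPU: assumed 4 multiply-accumulates (8 flops) per clock per core via NEON.
cpu_per_core = peak_gflops(2400, units=1, flops_per_unit_per_clock=8)
cpu_total = peak_gflops(2400, units=4, flops_per_unit_per_clock=8)

# GPU: assumed 12 QPUs x 4 ALUs, each ALU doing 2 operations per clock.
gpu_total = peak_gflops(800, units=12 * 4, flops_per_unit_per_clock=2)

print(cpu_per_core, cpu_total, gpu_total)  # 19.2 76.8 76.8
```

Since the two peaks are identical under these assumptions, any practical difference comes from how close each side can get to its peak.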

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 8472
Joined: Wed Aug 17, 2011 7:41 pm

Re: What do we know about the VideoCoreVII GPU?

Fri Dec 20, 2024 11:14 am

simmania wrote:
Thu Dec 19, 2024 5:55 pm
If NEON can do 4 multiply-accumulates (8 flops) per clock, that would be 2.4 GHz * 8 = 19.2 GFLOPS per core. With 4 cores this would be 76.8 GFLOPS.
To my knowledge, the VideoCoreVII has 12 QPUs, each with 4 ALUs that can each do two operations per clock. At 800 MHz this gives 0.8 * 12 * 4 * 2 = 76.8 GFLOPS!

Is it that, in practice, the CPUs get closer to their theoretical FLOPS than the GPU does?
It will be easier to use ARM NEON for compute, as it's closely coupled with the ARM code and has convenient access to memory.
It's more awkward to access memory from the GPU (DMA-style operations rather than random access).

But which works best will depend on the algorithm. Using both would obviously be best (assuming the ARM and GPU are not otherwise needed).
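One simple way to use both, sketched under loud assumptions: give each side a share of the work proportional to its measured throughput so both finish at about the same time. The rates below are made up, the helper `split_work` is hypothetical, and the sketch ignores the cost of moving data to and from the GPU, which the reply above points out is the awkward part.

```python
# Minimal static work split between the ARM cores and the GPU, assuming each
# side has a fixed, measured throughput (items per second) and ignoring the
# cost of moving data to and from the GPU. The rates used below are made up.


def split_work(n_items, cpu_rate, gpu_rate):
    """Return (cpu_items, gpu_items) so both sides finish at about the same
    time: each side gets a share proportional to its throughput."""
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("rates must be positive")
    gpu_items = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    return n_items - gpu_items, gpu_items


# Hypothetical example: the ARM cores process 3x as many items per second.
cpu_items, gpu_items = split_work(1000, cpu_rate=3.0, gpu_rate=1.0)
print(cpu_items, gpu_items)  # 750 250
```

In practice the rates would have to be measured for each algorithm, and transfer overhead would shift the balance toward the ARM side.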

Gavinmc42
Posts: 8346
Joined: Wed Aug 28, 2013 3:31 am

Re: What do we know about the VideoCoreVII GPU?

Mon Jun 16, 2025 7:54 am

Idein has updated their Python interface to the GPU/QPU for the VideoCore VII:
https://github.com/Idein/py-videocore7

Handy for matrix math?
