Discussion - Zen 7 speculation thread | Page 20 | AnandTech Forums: Technology, Hardware, Software, and Deals

Senior member

: Dec 15, 2023

: 340

: 374

: 96

Yesterday at 7:59 AM

#476

Joe NYC said:

If there were overclocking tools for Mac and you raised your voltage limit and raised the clock, you could get into a very power-inefficient zone too. But Apple just prevents you from doing it.

Click to expand...

Yes, but that is all irrelevant when Apple is already much faster in 1T while consuming less. Who cares if they would double or even triple the power consumption if clocked, say, 7% higher? A good way to measure efficiency is either performance at the same power or power at the same performance.

AMD is at a worse point in the curve I think from an efficiency standpoint. But if you clock it like 10% lower to be at a more optimal point, the deficit is still way too high.

Reactions: Tlh97, BorisTheBlade82 and 511

Meteor Late

Senior member

: Dec 15, 2023

: 340

: 374

: 96

Yesterday at 8:04 AM

#477

mikegg said:

Why do you think I don't understand this? When did I ever disagree with this statement?

All I'm asking you is when you lower the clocks to achieve the same perf/watt as an M4 Pro, what is the speed of Strix Halo?

Click to expand...

It would be better to simply lower the frequency so that it consumes a similar amount of power to the M4 Pro. That would be more apples to apples: performance at the same power.

StefanR5R

Elite Member

: Dec 10, 2016

: 6,730

: 10,697

: 136

Yesterday at 8:44 AM

#478

Wow. So much talk about power consumption figures which were completely fictional.

inquiss

Senior member

: Oct 13, 2010

: 543

: 802

: 136

Yesterday at 8:46 AM

#479

Meteor Late said:

Yes, but that is all irrelevant when Apple is already much faster in 1T while consuming less. Who cares if they would double or even triple the power consumption if clocked, say, 7% higher? A good way to measure efficiency is either performance at the same power or power at the same performance.

AMD is at a worse point in the curve I think from an efficiency standpoint. But if you clock it like 10% lower to be at a more optimal point, the deficit is still way too high.

Click to expand...

The point you're missing is that Apple has much better idle consumption, and because AMD has a lot of cores and a chunk of idle consumption, it really throws the single-thread efficiency calc completely out the window. It's true that if you run one thread Apple is much more efficient, but that's more about the other factors at play than about single-thread efficiency alone, regardless of where each chip sits on its own V/f curve and its own performance. It looks so skewed because of the idle power.
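To make the idle-power argument concrete, here is a rough sketch. Every wattage and score in it is made up for illustration (none of it is measured data): two hypothetical chips share an identical core, and only the idle floor of the rest of the package differs.

```python
# Sketch: how a fixed idle/uncore power floor skews package-level
# single-thread efficiency. All wattages and scores are hypothetical.

def package_efficiency(score: float, core_watts: float, idle_watts: float) -> float:
    """Points per watt when the whole package (core + idle floor) is measured."""
    return score / (core_watts + idle_watts)


def core_efficiency(score: float, core_watts: float) -> float:
    """Points per watt when only the active core's power is counted."""
    return score / core_watts


# Two made-up chips with the SAME core: same score, same core power,
# differing only in the idle power of the rest of the package.
chip_low_idle = {"score": 100.0, "core_watts": 10.0, "idle_watts": 2.0}
chip_high_idle = {"score": 100.0, "core_watts": 10.0, "idle_watts": 20.0}

for name, chip in (("low idle", chip_low_idle), ("high idle", chip_high_idle)):
    pkg = package_efficiency(**chip)
    core = core_efficiency(chip["score"], chip["core_watts"])
    print(f"{name}: core-only {core:.2f} pts/W, package {pkg:.2f} pts/W")
```

With these invented inputs the core-only efficiency is identical, yet the package-level figure differs by 2.5x, purely because of the idle floor.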

Reactions: Tlh97, Covfefe and Josh128

Josh128

Golden Member

: Oct 14, 2022

: 1,434

: 2,178

: 106

Yesterday at 8:57 AM

#480

mikegg said:

So what is Strix Halo's ST speed if you lower the clocks to get the same perf/watt as M4 Pro?

At stock clocks, M4 Pro is already 52% faster than Strix Halo ST.

Click to expand...

You do realize M4 is on 3nm and Strix Halo is on 4nm, right? Let's circle back when they are on the same process; otherwise the discussion is pretty pointless.

Joe NYC

Diamond Member

: Jun 26, 2021

: 3,815

: 5,363

: 136

Yesterday at 9:59 AM

#481

Meteor Late said:

AMD is at a worse point in the curve I think from an efficiency standpoint. But if you clock it like 10% lower to be at a more optimal point, the deficit is still way too high.

Click to expand...

Yes, but it is not 1 to 1.

You can lower power consumption by 50% and lose 5% of performance - as a hypothetical scenario.
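A quick sketch of what that hypothetical would mean for perf/watt. The 50% and 5% figures are the hypothetical from the post, not measurements, and the function name is just for illustration.

```python
# Sketch: efficiency change when power and performance scale by different factors.

def efficiency_gain(perf_factor: float, power_factor: float) -> float:
    """Ratio of new perf/watt to old perf/watt."""
    if power_factor <= 0:
        raise ValueError("power_factor must be positive")
    return perf_factor / power_factor


# Hypothetical scenario from the post: 5% less performance at 50% less power.
gain = efficiency_gain(perf_factor=0.95, power_factor=0.50)
print(f"perf/watt changes by a factor of {gain:.2f}")
```

In that scenario perf/watt improves by a factor of 0.95 / 0.50 = 1.9, which is why the relationship is "not 1 to 1".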

Meteor Late

Senior member

: Dec 15, 2023

: 340

: 374

: 96

Yesterday at 10:07 AM

#482

Joe NYC said:

Yes, but it is not 1 to 1.

You can lower power consumption by 50% and lose 5% of performance - as a hypothetical scenario.

Click to expand...

Yes, AMD would definitely improve its efficiency more than Apple by lowering the clock speed 5 or 10%, because they are at a higher point on the curve. But the gap would still be way too high.

Meteor Late

Senior member

: Dec 15, 2023

: 340

: 374

: 96

Yesterday at 10:07 AM

#483

Josh128 said:

You do realize M4 is on 3nm and Strix Halo is on 4nm, right? Let's circle back when they are on the same process; otherwise the discussion is pretty pointless.

Click to expand...

The gap is too big to be explained by process node alone, though the node no doubt accounts for a lot of it.

Markfw

Moderator Emeritus, Elite Member

: May 16, 2002

: 27,304

: 16,134

: 136

Yesterday at 10:08 AM

#484

mikegg said:

You keep repeating this but that's exactly what would happen if Intel puts enough atoms into a single chip.

What you're missing is area efficiency, which is a factor when it comes to MT scaling. Ultimately what matters for a server chip is area x efficiency x performance of one core x # of cores. Area directly correlates to cost of the chip.

So you put enough atoms into a single chip and it can win nT race by a mile. It's likely terribly inefficient area wise for the performance.

Click to expand...

NO, as the added Mod comment says, no trolling or baiting. We have heard enough about Intel.

Second, small cores without SMT and AVX-512 are very lacking. SMT alone doubles (or greatly increases) the number of logical cores. Once you add the ability to do other things like AVX-512, they are left in the dust.

Reactions: Tlh97 and marees

Meteor Late

Senior member

: Dec 15, 2023

: 340

: 374

: 96

Yesterday at 10:11 AM

#485

inquiss said:

The point you're missing is that Apple has much better idle consumption, and because AMD has a lot of cores and a chunk of idle consumption, it really throws the single-thread efficiency calc completely out the window. It's true that if you run one thread Apple is much more efficient, but that's more about the other factors at play than about single-thread efficiency alone, regardless of where each chip sits on its own V/f curve and its own performance. It looks so skewed because of the idle power.

Click to expand...

AMD has other parts with fewer cores if you want to compare, for example the lower variants of Strix Point, Krackan Point...

Joe NYC

Diamond Member

: Jun 26, 2021

: 3,815

: 5,363

: 136

Yesterday at 10:14 AM

#486

mikegg said:

Why do you think I don't understand this? When did I ever disagree with this statement?

All I'm asking you is when you lower the clocks to achieve the same perf/watt as an M4 Pro, what is the speed of Strix Halo?

Click to expand...

You don't achieve the same perf/watt. But what you will achieve is a similar perf/watt ratio in ST as you get in MT. Meaning, not a 350% - 400% difference but more like a 35% to 40% difference.

In other words, the wet dream (below) of 360% gap in efficiency growing to 720% gap in efficiency is just not living in reality:

mikegg said:

Can you estimate how much more performance and efficiency gains AMD needs to overtake Apple's M8?

Here's a baseline for you via Notebookcheck. M4 Pro is roughly 52% faster in Cinebench ST and 3.6x more efficient than Strix Halo.

Let's suppose M8 doubles M4 perf/watt to 19 pts/W. Let's suppose ST is increased by 46% over 4 generations to 260 points.

Will Zen7 increase Strix Halo efficiency by 7.2x while also increasing ST performance by 2.2x?

Benchmark	Strix Halo 395+	M4 Pro Mini	M4 Max	% Difference (M4 Max vs Strix Halo)
Memory Bandwidth	256GB/s	273GB/s	546GB/s	+113.3%
Cinebench 2024 ST	116.8	178	178	+52.4%
Cinebench 2024 MT	1648	1729	2069	+25.6%
Geekbench ST	2978	3836	3880	+30.3%
Geekbench MT	21269	22509	25760	+21.1%
3DMark Wildlife (GPU)	19615	19345	37434	+90.8%
GFX Bench (fps) (GPU)	114	125.8	232	+103.5%
Blender GPU Party Tug (GPU)	55 sec	43 sec	—	—
Cinebench ST Power Efficiency	2.62 pts/W	9.52 pts/W	—	—
Cinebench MT Power Efficiency	14.7 pts/W	20.2 pts/W	—	—

Click to expand...
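For reference, the ratios being argued about can be reproduced from the quoted table with a few lines of arithmetic. This is only a sketch: the inputs are the table's Cinebench figures, the M8 doubling and the 46% ST uplift are the quoted post's own hypotheticals, and the variable names are mine.

```python
# Sketch: recompute the efficiency gaps from the quoted Notebookcheck-derived table.

# Cinebench 2024 figures from the table (Strix Halo 395+ vs M4 Pro Mini).
strix_st_score, m4pro_st_score = 116.8, 178.0
strix_st_eff, m4pro_st_eff = 2.62, 9.52      # pts/W, single thread
strix_mt_eff, m4pro_mt_eff = 14.7, 20.2      # pts/W, multi thread

st_gap = m4pro_st_eff / strix_st_eff         # the "3.6x" single-thread gap
mt_gap = m4pro_mt_eff / strix_mt_eff         # the much smaller multi-thread gap

# Hypotheticals taken from the quoted post, not data:
m8_st_eff = 2.0 * m4pro_st_eff               # "M8 doubles M4 perf/watt"
m8_st_score = 1.46 * m4pro_st_score          # "ST is increased by 46%"

needed_eff_factor = m8_st_eff / strix_st_eff
needed_perf_factor = m8_st_score / strix_st_score

print(f"ST efficiency gap: {st_gap:.2f}x")
print(f"MT efficiency gap: {mt_gap:.2f}x")
print(f"Efficiency factor needed vs hypothetical M8: {needed_eff_factor:.1f}x")
print(f"ST performance factor needed vs hypothetical M8: {needed_perf_factor:.1f}x")
```

The single-thread gap comes out around 3.6x while the multi-thread gap is under 1.4x, which is the 35% to 40% range mentioned above.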

Reactions: Tlh97 and Covfefe

Joe NYC

Diamond Member

: Jun 26, 2021

: 3,815

: 5,363

: 136

Yesterday at 10:23 AM

#487

Meteor Late said:

Yes, AMD would definitely improve its efficiency more than Apple by lowering the clock speed 5 or 10%, because they are at a higher point on the curve. But the gap would still be way too high.

Click to expand...

I posted repeatedly that the Mac M line has an advantage in performance and efficiency, which I did not once dispute.

All I am disputing is the 360% efficiency advantage going on to 720% efficiency advantage.

Calling BS on it.

Meteor Late

Senior member

: Dec 15, 2023

: 340

: 374

: 96

Yesterday at 10:43 AM

#488

Joe NYC said:

I posted repeatedly that the Mac M line has an advantage in performance and efficiency, which I did not once dispute.

All I am disputing is the 360% efficiency advantage going on to 720% efficiency advantage.

Calling BS on it.

Click to expand...

Yeah, usually what you want to compare is two metrics between two processors:
- Performance at the same power: Possible to do. Just downclock Zen 5 laptop parts to match M4 or M5 power consumption in 1T, then assess the difference in performance.
- Power at the same performance: Not possible, I think, because you cannot manually downclock or power-limit an Apple CPU.

The difference in the first scenario will always be much lower than in the second scenario, because power scales roughly quadratically with performance when the performance comes from clock speed. Of course it depends on the process node, on which part of the curve we are on, etc. So for example, 50% more performance at the same power is more impressive than 50% less power at the same performance.
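A small sketch of that relationship, assuming a simple power-law model where power is proportional to performance raised to an exponent k. The value k = 2 is the "quadratic" rule of thumb from this post, not a measured value; real V/f curves differ per chip and per node, and the function names are only illustrative.

```python
# Sketch: convert between "more performance at the same power" and
# "less power at the same performance", assuming power ~ perf ** k.

def perf_gain_at_same_power(power_saving: float, k: float = 2.0) -> float:
    """Given a fractional power saving at equal performance, return the
    fractional performance gain at equal power under power ~ perf ** k."""
    if not 0.0 <= power_saving < 1.0:
        raise ValueError("power_saving must be in [0, 1)")
    return (1.0 / (1.0 - power_saving)) ** (1.0 / k) - 1.0


def power_saving_at_same_perf(perf_gain: float, k: float = 2.0) -> float:
    """Given a fractional performance gain at equal power, return the
    fractional power saving at equal performance under power ~ perf ** k."""
    if perf_gain < 0.0:
        raise ValueError("perf_gain must be non-negative")
    return 1.0 - (1.0 + perf_gain) ** (-k)


# 50% less power at the same performance is worth about 41% more
# performance at the same power under the quadratic assumption.
print(f"{perf_gain_at_same_power(0.50):.1%}")

# 50% more performance at the same power is worth about 56% less power
# at the same performance under the same assumption.
print(f"{power_saving_at_same_perf(0.50):.1%}")
```

Under this model the same-power performance gap is always the smaller number, which is why a 50% performance lead at equal power is the stronger result.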

Reactions: Joe NYC

Kepler_L2

Golden Member

: Sep 6, 2020

: 1,015

: 4,339

: 136

Yesterday at 11:38 AM

#489

Joe NYC said:

FP512 - I wonder what that might be. AVX-512 equivalent synchronized with Intel

Click to expand...

That's what they already have with Zen5, it just means 512-bit execution units (unlike Zen4 with AVX-512 on FP256)

Joe NYC said:

1/2 ACE - ACE seems to come from Advanced Matrix Extensions from AMD / Intel collaboration, and presumably, since this referred to 1/2 CCD (8 cores) it could be 1/2 of AMD's planned ACE unit.

Click to expand...

Again it just means 512-bit execution instead of 1024-bit (double pumped)

Joe NYC said:

It mentions 4x FP8 performance and 2x Int8 performance. I am assuming that these will be new datatypes for AVX-512. Zen 6 is already adding 2x FP16 performance, so Zen 7 seems to be extending it further to 8xFP8. I presume this will also be part of the AVX-512 definition.

Click to expand...

FP8 support (2x) plus 2x FMA/iFMA execution ports

Also IMO this is why they are going very aggressively to A14, such a massive FPU plus AMX/ACE support would just be too big and power hungry even on N2P.
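A back-of-the-envelope sketch of how those multipliers could stack. It assumes the baseline is FP16 and INT8 on a 512-bit datapath with port count normalized to 1; those baselines and the function names are my assumptions for illustration, not anything from the leak.

```python
# Sketch: vector lanes, double pumping, and how "2x FP8" plus "2x ports"
# could combine. Baselines are assumptions, not confirmed specs.

def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def passes(op_bits: int, unit_bits: int) -> int:
    """Passes needed to execute an op on a narrower unit (double pumping)."""
    return -(-op_bits // unit_bits)  # ceiling division


def relative_throughput(vector_bits: int, element_bits: int, ports: int) -> int:
    """Elements processed per cycle, in arbitrary units."""
    return lanes(vector_bits, element_bits) * ports


# AVX-512 op on a 256-bit unit (Zen 4 style) vs a 512-bit unit (Zen 5 style).
print(passes(512, 256), passes(512, 512))

# Assumed baseline: FP16 and INT8 at 512 bits with 1 normalized port.
fp16_base = relative_throughput(512, 16, ports=1)
int8_base = relative_throughput(512, 8, ports=1)

# With FP8 support and twice the FMA/iFMA ports:
fp8_new = relative_throughput(512, 8, ports=2)
int8_new = relative_throughput(512, 8, ports=2)

print(f"FP8 vs FP16 baseline: {fp8_new // fp16_base}x")
print(f"INT8 vs INT8 baseline: {int8_new // int8_base}x")
```

Under those assumptions the halved element size and the doubled ports multiply to 4x for FP8, while INT8 only benefits from the extra ports and gets 2x, which lines up with the quoted figures.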

Reactions: marees, Tlh97, Joe NYC and 1 other person

511

Diamond Member

: Jul 12, 2024

: 4,822

: 4,389

: 106

Yesterday at 11:41 AM

#490

Markfw said:

Second, small cores without SMT and AVX-512 are very lacking. SMT alone doubles (or greatly increases) the number of logical cores. Once you add the ability to do other things like AVX-512, they are left in the dust.

Click to expand...

Funny you say that: E cores had 4-way SMT and AMX-512 at one point, but the program bit the dust thanks to amazing Intel decisions.

511

Diamond Member

: Jul 12, 2024

: 4,822

: 4,389

: 106

Yesterday at 11:42 AM

#491

Kepler_L2 said:

Also IMO this is why they are going very aggressively to A14, such a massive FPU plus AMX/ACE support would just be too big and power hungry even on N2P.

Click to expand...

AMX is way too much of a die hog.

Reactions: Tlh97

Thunder 57

Diamond Member

: Aug 19, 2007

: 4,117

: 6,867

: 136

Yesterday at 11:59 AM

#492

511 said:

Funny you say that: E cores had 4-way SMT and AMX-512 at one point, but the program bit the dust thanks to amazing Intel decisions.

Click to expand...

The market said no thanks.

511

Diamond Member

: Jul 12, 2024

: 4,822

: 4,389

: 106

Yesterday at 12:14 PM

#493

Thunder 57 said:

The market said no thanks.

Click to expand...

The market didn't understand it, just like Optane, and it got killed.

Joe NYC

Diamond Member

: Jun 26, 2021

: 3,815

: 5,363

: 136

Yesterday at 12:40 PM

#494

Kepler_L2 said:

That's what they already have with Zen5, it just means 512-bit execution units (unlike Zen4 with AVX-512 on FP256)

Again it just means 512-bit execution instead of 1024-bit (double pumped)

FP8 support (2x) plus 2x FMA/iFMA execution ports

Also IMO this is why they are going very aggressively to A14, such a massive FPU plus AMX/ACE support would just be too big and power hungry even on N2P.

Click to expand...

Thanks.

BTW, this Tweet seems to indicate that on the Zen 6 side, the mobile cores will have 256b vectors. Does that mean AMD is going back to a Zen 4 type implementation in mobile?

If so, IMO, it is a smart decision.

https://twitter.com/x/status/1986795247335276695

Last edited: Yesterday at 12:50 PM

gdansk

Diamond Member

: Feb 8, 2011

: 4,663

: 7,875

: 136

Yesterday at 12:52 PM

#495

Joe NYC said:

Does that mean AMD is going back to a Zen 4 type implementation in mobile?

Click to expand...

What do you mean back? Mainstream mobile never exceeded 256-bit SIMD width

Reactions: booklib28, Josh128, MrMPFR and 5 others

Joe NYC

Diamond Member

: Jun 26, 2021

: 3,815

: 5,363

: 136

Yesterday at 1:02 PM

#496

511 said:

AMX is way too much of a die hog.

Click to expand...

But if it moves the processing from per core to per CCD/CCX/die, then it should be worth it.

It would be great if it could somehow supersede the far bigger waste of die space, AKA the NPU.

511

Diamond Member

: Jul 12, 2024

: 4,822

: 4,389

: 106

Yesterday at 1:03 PM

#497

Joe NYC said:

But if it moves the processing from per core to per CCD/CCX/die, then it should be worth it.

It would be great if it could somehow supersede the far bigger waste of die space, AKA the NPU.

Click to expand...

Definitely better than the NPU, yes.

Joe NYC

Diamond Member

: Jun 26, 2021

: 3,815

: 5,363

: 136

Yesterday at 1:04 PM

#498

gdansk said:

What do you mean back? Mainstream mobile never exceeded 256-bit SIMD width

Click to expand...

Do you mean Strix Point and Krackan have 256-bit width? I did not know that...

gdansk

Diamond Member

: Feb 8, 2011

: 4,663

: 7,875

: 136

Yesterday at 1:04 PM

#499

Joe NYC said:

Do you mean Strix Point and Krackan have 256-bit width? I did not know that...

Click to expand...

Yes

Reactions: yuri69 and Joe NYC

yuri69

Senior member

: Jul 16, 2013

: 684

: 1,224

: 136

Yesterday at 1:28 PM

#500

Joe NYC said:

If die size is going up by 40% (from 70 mm2 to 98 mm2), the core count is going up by 33%, and there is some density increase from the new node, there should be some extra transistors and die size per core with Zen 7.

Click to expand...

Oh boy, not the "IPC power of extra transistors" again.

Compared to Zen 6, this MLID Zen 7 has 33% more cores, twice the L2 per core, and likely features support for many nearly fixed-use vector/FP ISA extensions... Throw in some structural increases, and add that "invisible" stuff like security, RAS, profiling, or QoS, which will surely eat more transistors in the 2029 timeframe...
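To put a number on how little slack that leaves, here is a rough sketch using only the rumored figures quoted above (70 mm2 to 98 mm2, 33% more cores). It naively spreads the whole die evenly across cores, and the density gain in it is a made-up placeholder, not a known node figure.

```python
# Sketch: die area and transistor budget per core from the rumored figures.
# Naive model: whole die area is divided evenly across cores.

old_die_mm2, new_die_mm2 = 70.0, 98.0     # rumored die sizes from the quote
core_count_factor = 4.0 / 3.0             # "33% more cores"

die_area_factor = new_die_mm2 / old_die_mm2
area_per_core_factor = die_area_factor / core_count_factor

# HYPOTHETICAL density improvement from the new node, for illustration only.
assumed_density_gain = 1.2
transistors_per_core_factor = area_per_core_factor * assumed_density_gain

print(f"Die area: {die_area_factor:.2f}x")
print(f"Area per core: {area_per_core_factor:.2f}x")
print(f"Transistors per core (with assumed density gain): "
      f"{transistors_per_core_factor:.2f}x")
```

Area per core grows only about 5% in this naive model, so nearly all of any per-core transistor increase would have to come from density, and the doubled L2 and new extensions draw on that same budget.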

Reactions: MrMPFR
