How to waste a few CPU cycles without causing any memory accesses

Asked 10 months ago

Viewed 135 times

I'm looking for a low-overhead method for my program to stall a few cycles on an Intel CPU, without causing memory accesses or side effects that could alter the CPU components' data (e.g. no usleep()).

Which instruction has a consistent cycle count and predictable behavior, so that I could use it once or many times depending on how many cycles I'd like my program to stall (e.g. 5, 10, or 1000)? I can't trust nop, as I've read that it does not guarantee a 1-cycle execution time and could be optimized away (0 cycles) somewhere in the pipeline.

asked Mar 2, 2025 at 2:05

Mani

_mm_pause will stall (the front-end?) for about 5 cycles before Skylake and about 100 cycles from Skylake onward on Intel, or for a BIOS-configurable amount on Zen. _mm_lfence() will block the front-end until the back-end drains. Spinning on rdtsc can be viable if you want to wait for more than about 40 core clock cycles (see "How to calculate time for an asm delay loop on x86 linux?").

– Peter Cordes, commented Mar 2, 2025 at 2:37
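
The rdtsc spinning idea from the comment above could be sketched roughly as follows. This is only a minimal sketch, assuming GCC or Clang on x86-64 with `<x86intrin.h>`; the helper name `spin_tsc_ticks` is hypothetical and not from the thread. Note that the TSC on recent Intel CPUs counts fixed-rate reference ticks rather than core clock cycles, so the tick count is only a proxy for core cycles.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc (GCC/Clang) */

/* Hypothetical helper: spin until at least `ticks` TSC ticks have elapsed.
   Returns the number of ticks actually observed, which is always >= ticks
   because the loop only exits once the difference reaches `ticks`.
   With optimization enabled, start/now should live in registers, so the
   loop body itself performs no loads or stores (an assumption about the
   compiler's output, worth checking in the generated asm). */
static inline uint64_t spin_tsc_ticks(uint64_t ticks)
{
    uint64_t start = __rdtsc();
    uint64_t now;
    do {
        now = __rdtsc();
    } while (now - start < ticks);
    return now - start;
}
```

For example, `spin_tsc_ticks(1000)` waits for at least 1000 reference ticks. The overshoot depends on the cost of one rdtsc, which is why this approach only makes sense for waits longer than a few tens of cycles.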
Thank you! Unfortunately I can't rely on memory fence instructions, since their execution time may vary based on what's in the processor's pipeline. Follow-up question: I am using a first-gen Intel Xeon Scalable (Skylake-SP) processor, so would that mean pause would always take ≈100 cycles?

– Mani, commented Mar 2, 2025 at 3:42
Yes, on Skylake it will always pause the front-end for 100 cycles while the back-end keeps running, if I understand it correctly. If there's already a cache-miss load in the back-end that will soon stall, then pause probably doesn't make things any slower (except maybe by delaying independent work that will also stall and could have been running in parallel, e.g. another load from a separate address). If the pause is not in the shadow of a stall that would happen anyway, then it does cost its full delay. Either way, I can't think of a mechanism that would make it slower by more than 100 core cycles.

– Peter Cordes, commented Mar 2, 2025 at 4:23
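
One way to check the per-pause cost on a particular machine is to time a run of back-to-back pause instructions. The following is a rough sketch under stated assumptions (GCC or Clang on x86-64, SSE2 available for `_mm_lfence`); the helper name `time_pauses` is hypothetical. The lfence calls are only there to keep rdtsc from being reordered with the pause loop, and the result is in TSC reference ticks, not core clock cycles.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_pause, _mm_lfence (GCC/Clang) */

/* Hypothetical helper: execute `n` pause instructions back to back and
   return the elapsed TSC ticks. Dividing the result by `n` gives a rough
   per-pause cost for this CPU model. */
static inline uint64_t time_pauses(unsigned n)
{
    _mm_lfence();                 /* let earlier work finish first */
    uint64_t start = __rdtsc();
    _mm_lfence();                 /* do not start pausing before rdtsc */

    for (unsigned i = 0; i < n; i++)
        _mm_pause();

    _mm_lfence();                 /* wait for the loop to finish */
    uint64_t end = __rdtsc();
    _mm_lfence();
    return end - start;
}
```

Timing a large `n` (say 1000) and dividing averages out the loop and rdtsc overhead. The measured value is an observation for one machine, not a guarantee about the instruction.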
Thank you! Now, how is pause different from rep; nop? I recently came across this combination of instructions in a code base.

– Mani, commented Mar 5, 2025 at 15:29
See: What does "rep; nop;" mean in x86 assembly? Is it the same as the "pause" instruction?

– Peter Cordes, commented Mar 5, 2025 at 21:12
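
For the rep; nop question, the two spellings can be compared at the byte level. The sketch below assumes a GNU-style assembler and an ELF target (for example GCC or Clang on x86-64 Linux); the label names `enc_rep_nop` and `enc_pause` and the helper `same_encoding` are hypothetical. It assembles both spellings into read-only data and compares the resulting bytes, which on x86 should both be `F3 90`.

```c
#include <string.h>

/* Assemble both spellings into .rodata so their machine-code encodings
   can be read back as plain bytes. The code is never executed here. */
__asm__(".pushsection .rodata\n"
        "enc_rep_nop:\n\trep; nop\n"
        "enc_pause:\n\tpause\n"
        ".popsection\n");

extern const unsigned char enc_rep_nop[2];
extern const unsigned char enc_pause[2];

/* Returns nonzero if "rep; nop" and "pause" assembled to the same bytes. */
static int same_encoding(void)
{
    return memcmp(enc_rep_nop, enc_pause, 2) == 0;
}
```

Because the encodings match, CPUs that predate the pause instruction simply execute it as a nop with an ignored rep prefix, which is why older code bases spell it `rep; nop`.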
