Reducing Latency in Pixel Scanning for Input Simulation

Question 1

I am developing a Python script to pass a challenging minigame in a single-player title. My goal is to unlock a specific achievement within this game, a task I've spent over 10 hours attempting without success. This is also an opportunity for me to learn coding for the first time. So I thought, why not?

This is the script:

from pyautogui import *
import pyautogui
import time
import keyboard
import win32api, win32con

def press_spacebar():
    win32api.keybd_event(0x20, 0, 0, 0)
    time.sleep(0.05)
    win32api.keybd_event(0x20, 0, win32con.KEYEVENTF_KEYUP, 0)
    time.sleep(0.01)

while keyboard.is_pressed('q')==False:

    if pyautogui.pixel(2400, 690)[0] >= 150:
        press_spacebar()
    if pyautogui.pixel(2400, 761)[0] >= 150:
        press_spacebar()

And this is the minigame I was talking about. minigame

The game requires precise timing: a fast-moving red bar oscillates vertically, and I need to stop it within a green zone by pressing spacebar (or clicking the mouse). My approach is to scan pixels at the upper and lower boundaries of the green zone; when the red bar reaches one of these points, the script presses spacebar. My initial approach was pyautogui.press("space"), but it failed the reaction-time challenge: every single time, it was off by 200 to 250 ms. I searched the internet for a better solution and came across a win32api tutorial for piano tiles claiming it is a lot faster than the most common method. I tried it in the script shown above, but it is still off by about 150 ms. Now my problem is to reduce the latency to under 100 ms, if possible.
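One way to find out where the remaining ~150 ms goes is to time each building block (pixel read, key press) on its own rather than the whole loop. The sketch below is a minimal harness that uses only the standard library (time.perf_counter and statistics.median); the name median_call_ms is hypothetical, and the lambdas are placeholders for real calls such as pyautogui.pixel(2400, 690).

```python
import statistics
import time
from typing import Callable


def median_call_ms(fn: Callable[[], object], repeats: int = 50) -> float:
    """Call fn() `repeats` times and return the median duration in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholders: swap in the real calls, e.g. lambda: pyautogui.pixel(2400, 690)
    print(f"empty call: {median_call_ms(lambda: None):.3f} ms")
    print(f"5 ms sleep: {median_call_ms(lambda: time.sleep(0.005), repeats=10):.3f} ms")
```

The median is used instead of the mean so that an occasional scheduler hiccup does not dominate the result.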

Question 2

Why do you call sleep() in your press_spacebar function? How did you pick those values?

Question 3

This answer suggests disabling Windows Desktop Composition to speed up pixel access.

Question 4

@nabulator To prevent some complications within the script, but with or without it, my problem remains.

Question 5

@importrandom I will try that out, thanks for directing me to that thread.

Question 6

Maybe you can try pressing space a few pixels early, so that by the time the press registers, the bar has reached the green zone.
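The idea in this comment can be made concrete: if the bar moves at a roughly constant speed, it keeps travelling for the whole input latency after the script decides to press, so the trigger point should sit that distance before the target. A minimal sketch, assuming constant speed between direction changes; trigger_y and the example speed and latency values are hypothetical, not measured from the game.

```python
def trigger_y(target_y: float, velocity_px_per_s: float, latency_s: float) -> float:
    """Screen row at which to fire so that the press lands when the bar reaches target_y.

    velocity_px_per_s is signed: positive while the bar moves down the screen
    (y increasing), negative while it moves up.
    """
    return target_y - velocity_px_per_s * latency_s


# Hypothetical numbers: zone centred at y=725, bar speed 1200 px/s, 150 ms latency
print(trigger_y(725, 1200, 0.150))   # bar moving down: fire above the zone
print(trigger_y(725, -1200, 0.150))  # bar moving up: fire below the zone
```

The speed and its sign could be estimated from two consecutive bar positions and the time between them; with an oscillating bar, the direction matters as much as the magnitude.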

Question 7

Final edit at the top for visibility: thanks to IInspectable for the suggestion. Targeting the specific window DC is much faster, taking less than a millisecond.

import win32gui

def get_win_px(x=0, y=0, name="Device Manager"):
    hwnd = win32gui.FindWindow(None, name)  # warning: no error handling
    hdc = win32gui.GetWindowDC(hwnd)
    color = win32gui.GetPixel(hdc, x, y)
    win32gui.ReleaseDC(hwnd, hdc)  # release with the same hwnd the DC was obtained for
    return color
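Every DC obtained with GetWindowDC or GetDC has to be released with ReleaseDC, passing the same window handle. One option is to wrap that pair in a context manager so the release also happens when an exception is raised. A sketch assuming pywin32 on Windows; scoped_dc is a hypothetical helper, and the Win32 functions are passed in so the logic can be exercised without Windows.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def scoped_dc(hwnd: int,
              get_dc: Callable[[int], Any],
              release_dc: Callable[[int, Any], Any]) -> Iterator[Any]:
    """Get a DC for hwnd and always release it with the same hwnd, even on errors."""
    hdc = get_dc(hwnd)
    if not hdc:
        raise OSError(f"could not get a device context for window {hwnd:#x}")
    try:
        yield hdc
    finally:
        release_dc(hwnd, hdc)


# Possible use on Windows with pywin32 (not run here):
# import win32gui
# hwnd = win32gui.FindWindow(None, "Device Manager")
# with scoped_dc(hwnd, win32gui.GetWindowDC, win32gui.ReleaseDC) as hdc:
#     color = win32gui.GetPixel(hdc, x, y)
```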

I used timeit to get an idea of how long each approach takes to execute.

Testing shows win32 is fastest on my PC, as you've also found:

get_px_ctypes: 16.759 ms
get_px_win32: 16.689 ms
get_px_pya: 16.798 ms
get_win_px: 0.970 ms
press_win32: 4.701 ms
press_pya: 104.358 ms
dpress_win32: 54.339 ms
pressed_win32: 0.031 ms
pressed_kbd: 5.729 ms

Note that press_win32 has no delay between the key-down and key-up events, whereas dpress_win32 has a 50 ms delay after the key-down event. A delay after the key-down event probably doesn't affect key detection in your game, and it might prevent duplicate key presses for the same colour bar.
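As an alternative to relying on a sleep to avoid duplicate presses, the loop could fire only on the rising edge, i.e. the first reading in which the red value reaches the threshold, and re-arm once it drops back below. This is a swapped-in technique, not what the code in this answer does; EdgeTrigger is a hypothetical name, and the threshold of 150 matches the scripts in this thread.

```python
class EdgeTrigger:
    """Report True once when a value rises to the threshold, then wait until it falls below again."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._was_above = False

    def update(self, value: int) -> bool:
        is_above = value >= self.threshold
        fired = is_above and not self._was_above
        self._was_above = is_above
        return fired


# Simulated red-channel readings from one pixel: the bar passes twice
trigger = EdgeTrigger(150)
print([trigger.update(red) for red in (10, 200, 210, 40, 180)])
# [False, True, False, False, True]
```

In the detection loop, the key press would then be sent only when update() returns True, so a bar that stays over the pixel for several iterations produces a single press.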

Interestingly, the three desktop get-pixel functions are comparable (all about 16.7 ms).

So I would recommend dropping the check for Q to exit and using Ctrl+C instead: keyboard.is_pressed() adds ~6 ms per loop.
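Note that Ctrl+C only reaches the script while its console window has focus, which it won't have while the game is in the foreground. If a hotkey is still wanted, the timing table suggests polling it with GetAsyncKeyState (pressed_win32, ~0.03 ms) instead of keyboard.is_pressed(), optionally only every N iterations. A sketch under those assumptions; run_loop and check_every are hypothetical names, and the callables are injected so the loop logic runs anywhere.

```python
from typing import Callable


def run_loop(step: Callable[[], None],
             should_quit: Callable[[], bool],
             check_every: int = 50) -> int:
    """Run step() repeatedly, checking should_quit() every `check_every` iterations.

    Returns the number of completed step() calls.
    """
    iterations = 0
    while True:
        if iterations % check_every == 0 and should_quit():
            return iterations
        step()
        iterations += 1


# Possible use on Windows with pywin32 (not run here); detect_once is a
# hypothetical single-iteration version of the detection loop:
# import win32api
# run_loop(detect_once, lambda: bool(win32api.GetAsyncKeyState(0x51) & 0x8000))
```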

It's not clear what benefit looking at multiple pixels provides; if the bar moves fast enough, it can pass the first location while the second one is being checked.
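The risk described here can be estimated: a single sampled pixel is covered by the bar only for (bar thickness ÷ bar speed), so if one loop iteration takes longer than that, the bar can pass without any sample seeing it. Each desktop GetPixel call costs about 16.7 ms in the table above, and checking two pixels one after the other doubles the iteration time. A sketch with hypothetical thickness and speed values:

```python
def pixel_covered_ms(bar_thickness_px: float, bar_speed_px_per_ms: float) -> float:
    """How long the moving bar covers one fixed pixel, in milliseconds."""
    return bar_thickness_px / bar_speed_px_per_ms


def can_miss(bar_thickness_px: float, bar_speed_px_per_ms: float, loop_period_ms: float) -> bool:
    """True if one loop iteration is long enough for the bar to slip past a sampled pixel."""
    return loop_period_ms > pixel_covered_ms(bar_thickness_px, bar_speed_px_per_ms)


# Hypothetical bar: 8 px thick, moving 1.2 px/ms (1200 px/s)
print(can_miss(8, 1.2, 2 * 16.7))  # True: two desktop GetPixel calls per iteration
print(can_miss(8, 1.2, 0.97))      # False: one window-DC read per iteration
```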

Profiling indicates that most of the time is spent in threads waiting for a lock or for a response to be added to a queue. In the output below, I've excluded calls whose total time is less than a millisecond. This is not exciting or actionable information, other than guiding us to seek an alternative that avoids the full desktop.

>>> import cProfile
>>> cProfile.run("detect_and_press()", sort='time')
 47528 function calls (46822 primitive calls) in 657.508 seconds
 Ordered by: internal time
 ncalls tottime percall cumtime percall filename:lineno(function)
4941/4236 637.206 0.129 637.384 0.150 {method 'acquire' of '_thread.lock' objects}
 705 19.867 0.028 657.260 0.932 threading.py:323(wait)
 2/1 0.180 0.090 0.017 0.017 {built-in method builtins.exec}
 706 0.096 0.000 0.160 0.000 _winkeyboard.py:498(process_key)
 705 0.020 0.000 677.175 0.961 queue.py:154(get)
 706 0.019 0.000 0.162 0.000 _winkeyboard.py:531(low_level_keyboard_handler)
 1 0.017 0.017 0.017 0.017 test_px_access.py:33(detect_and_press)
 706 0.015 0.000 0.051 0.000 __init__.py:222(direct_callback)
 1412 0.010 0.000 0.010 0.000 {method 'release' of '_thread.lock' objects}
 706 0.008 0.000 0.008 0.000 {built-in method _thread.allocate_lock}
 706 0.007 0.000 0.010 0.000 __init__.py:211(pre_process_event)
 706 0.007 0.000 0.027 0.000 queue.py:122(put)
 1412 0.006 0.000 0.006 0.000 {built-in method builtins.sorted}
 2118 0.005 0.000 0.020 0.000 threading.py:394(notify)
 706 0.004 0.000 0.013 0.000 _keyboard_event.py:24(__init__)
 706 0.004 0.000 0.008 0.000 _canonical_names.py:1233(normalize_name)
 706 0.003 0.000 0.009 0.000 queue.py:57(task_done)
 2118 0.003 0.000 0.005 0.000 threading.py:302(__exit__)
 2824 0.003 0.000 0.003 0.000 {built-in method builtins.len}
 3530 0.002 0.000 0.002 0.000 {method '__exit__' of '_thread.lock' objects}
 2824 0.002 0.000 0.006 0.000 threading.py:314(_is_owned)
 706 0.002 0.000 0.002 0.000 {built-in method builtins.all}
 2118 0.002 0.000 0.003 0.000 threading.py:299(__enter__)
 1412 0.002 0.000 0.003 0.000 queue.py:209(_qsize)
 2118 0.002 0.000 0.002 0.000 {method '__enter__' of '_thread.lock' objects}
 706 0.002 0.000 0.002 0.000 {method 'get' of 'dict' objects}
 706 0.002 0.000 0.003 0.000 threading.py:311(_acquire_restore)
 706 0.001 0.000 0.001 0.000 {built-in method time.time}
 706 0.001 0.000 0.003 0.000 threading.py:424(notify_all)
 1190 0.001 0.000 0.001 0.000 {built-in method builtins.isinstance}
 484 0.001 0.000 0.002 0.000 __init__.py:135(is_modifier)
 706 0.001 0.000 0.002 0.000 queue.py:217(_get)
 706 0.001 0.000 0.001 0.000 queue.py:213(_put)
 1412 0.001 0.000 0.001 0.000 {method 'append' of 'collections.deque' objects}
 372 0.001 0.000 0.001 0.000 {method 'lower' of 'str' objects}
 706 0.001 0.000 0.001 0.000 {method 'popleft' of 'collections.deque' objects}
 706 0.001 0.000 0.001 0.000 threading.py:308(_release_save)
 706 0.001 0.000 0.001 0.000 {method 'remove' of 'collections.deque' objects}

Full code listing, edited to include release of device context handle:

import ctypes
import time
import win32api, win32con, win32gui
import pyautogui
import keyboard

def get_px_ctypes(x=0, y=0):
    hdc = ctypes.windll.user32.GetDC(0)
    color = ctypes.windll.gdi32.GetPixel(hdc, x, y)  # & 0xFF # AND to isolate red portion
    ctypes.windll.user32.ReleaseDC(0, hdc)
    return color

def get_px_win32(x=0, y=0):
    hdc = win32gui.GetDC(0)
    color = win32gui.GetPixel(hdc, x, y)  # & 0xFF # AND to isolate red portion
    win32gui.ReleaseDC(0, hdc)
    return color

def get_win_px(x=0, y=0, name="Device Manager"):
    hwnd = win32gui.FindWindow(None, name)  # warning: no error handling
    hdc = win32gui.GetWindowDC(hwnd)
    color = win32gui.GetPixel(hdc, x, y)
    win32gui.ReleaseDC(hwnd, hdc)  # release with the same hwnd the DC was obtained for
    return color

def press_win32(key_code=0x20):
    win32api.keybd_event(key_code, 0, 0, 0)
    win32api.keybd_event(key_code, 0, win32con.KEYEVENTF_KEYUP, 0)

def dpress_win32(key_code=0x20):
    """Include a delay between key down and key up events"""
    win32api.keybd_event(key_code, 0, 0, 0)
    time.sleep(0.05)
    win32api.keybd_event(key_code, 0, win32con.KEYEVENTF_KEYUP, 0)

def pressed_win32(key_code=0x51):  # 0x51 == q
    return win32api.GetAsyncKeyState(key_code) & 0x8000

def get_px_pya(x=0, y=0):
    return pyautogui.pixel(x, y)  # [0] # (R, G, B)

def press_pya(key="space"):
    pyautogui.press(key)

def pressed_kbd(key="q"):
    return keyboard.is_pressed(key)

def detect_and_press(x=0, y=0):
    # Acquire the DC before the try block so that finally never sees an unbound hdc
    hdc = ctypes.windll.user32.GetDC(0)
    try:
        while True:
            if ctypes.windll.gdi32.GetPixel(hdc, x, y) & 0xFF > 150:  # compare red value
                win32api.keybd_event(0x20, 0, 0, 0)
                win32api.keybd_event(0x20, 0, win32con.KEYEVENTF_KEYUP, 0)
    except KeyboardInterrupt:
        return
    finally:
        ctypes.windll.user32.ReleaseDC(0, hdc)

if __name__ == "__main__":
    import timeit
    loops = 100  # how many times to repeat the function call
    multiplier = 1_000 / loops  # convert total seconds to ms per call
    for function in (get_px_ctypes, get_px_win32, get_px_pya, get_win_px, press_win32,
                     press_pya, dpress_win32, pressed_win32, pressed_kbd):
        fn = function.__name__
        elapsed = timeit.timeit(f"{fn}()", setup=f"from __main__ import {fn}", number=loops)
        print(f"{fn}:\t{elapsed * multiplier:8,.3f} ms")

Question 8

Your program contains a serious bug and leaks resources: after running for a while, the screen becomes unresponsive, and from the 10,001st call onwards GetDC returns NULL (0).

Question 9

16.7ms is very close to 1/60th of a second, the default display refresh rate. This seems to support the assumption that a call to GetPixel incurs a full DWM composition pass. At least in theory, reading out of a window DC (as opposed to the desktop DC) could avoid the composition pass. Profiling this approach might make sense.

Question 10

It looks like you've found and fixed your bug. By the way, leaking an HDC does more than leak a little memory; more seriously, a large number of leaked HDCs can slow down the system's drawing speed (as you've probably experienced). Generally, GetDC is very fast unless there are many unreleased HDCs in the system, so, in principle, HDCs should be released immediately after use. (There are many types of DCs; I'm referring specifically to display device contexts here.)

Question 11

The cProfile dump doesn't provide helpful insights. It just restates that Windows is event-driven and that applications spend most of their time waiting for events. We've known this for decades. The timeit table is vastly more useful. Just add a test run that calls FindWindow and passes the returned HWND to GetDC (or GetWindowDC).
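Along those lines, get_win_px above calls FindWindow on every read; the window handle could instead be looked up once and reused, so that each read costs only GetWindowDC, GetPixel and ReleaseDC. A sketch assuming pywin32; WindowPixelReader is a hypothetical class, and the Win32 functions are injected so the logic can be checked without Windows.

```python
from typing import Any, Callable


class WindowPixelReader:
    """Find the target window once, then read pixels through its window DC."""

    def __init__(self, title: str,
                 find_window: Callable[[Any, str], int],
                 get_window_dc: Callable[[int], Any],
                 get_pixel: Callable[[Any, int, int], int],
                 release_dc: Callable[[int, Any], Any]) -> None:
        self.hwnd = find_window(None, title)
        if not self.hwnd:
            raise LookupError(f"no window titled {title!r}")
        self._get_window_dc = get_window_dc
        self._get_pixel = get_pixel
        self._release_dc = release_dc

    def red(self, x: int, y: int) -> int:
        """Red channel of (x, y) in window coordinates (a COLORREF is 0x00BBGGRR)."""
        hdc = self._get_window_dc(self.hwnd)
        try:
            return self._get_pixel(hdc, x, y) & 0xFF
        finally:
            self._release_dc(self.hwnd, hdc)


# Possible use on Windows with pywin32 (not run here):
# import win32gui
# reader = WindowPixelReader("Device Manager", win32gui.FindWindow,
#                            win32gui.GetWindowDC, win32gui.GetPixel, win32gui.ReleaseDC)
# print(reader.red(10, 10))
```

Timing reader.red() with timeit next to get_win_px would show how much of the ~1 ms is spent in FindWindow.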

Question 12

Keep in mind that the x/y coordinates are relative to the upper-left corner of the DC's backing surface. When you change your code from GetDC(0) to GetWindowDC(hwnd), you have to adjust the coordinates passed to GetPixel() accordingly. Also, if you find yourself writing two or more calls to keybd_event, it's a pretty safe bet that you should be calling SendInput instead (see this Q&A for rationale).
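For GetWindowDC, the origin is the top-left corner of the whole window (including the title bar and borders), so screen coordinates such as (2400, 690) can be translated by subtracting the window's left and top edges as reported by GetWindowRect. A minimal sketch; screen_to_window_dc is a hypothetical helper, and it assumes the script and the game use the same DPI scaling, otherwise the rectangle and the screen coordinates may not be in the same units.

```python
def screen_to_window_dc(x: int, y: int,
                        window_rect: tuple[int, int, int, int]) -> tuple[int, int]:
    """Convert screen coordinates to GetWindowDC coordinates.

    window_rect is (left, top, right, bottom), the shape returned by
    win32gui.GetWindowRect(hwnd).
    """
    left, top, right, bottom = window_rect
    if not (left <= x < right and top <= y < bottom):
        raise ValueError(f"({x}, {y}) is outside the window {window_rect}")
    return x - left, y - top


# Hypothetical game window filling a second 1920x1080 monitor to the right
print(screen_to_window_dc(2400, 690, (1920, 0, 3840, 1080)))  # (480, 690)
```

For GetDC(hwnd), whose origin is the client area rather than the whole window, pywin32 also exposes win32gui.ScreenToClient(hwnd, (x, y)) for the equivalent conversion.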

Question 13

Without having the minigame ourselves, it's hard to tell whether the proposed solutions will work. Anyway, here is something to explore.

pyautogui is slow

As you can see in the documentation, a call to pyautogui.pixel() is actually just a wrapper. It hides the ugly fact that even though you specify the pixel of interest, the code takes a screenshot of the whole screen first. According to the documentation, this takes roughly 100 ms, which matches your results.

To avoid this, you can define a region of interest (see here):

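import time

import keyboard
import numpy as np
import pyautogui
import win32api, win32con

REGION = (2400, 690, 1, 85)  # x, y, width, height covering both pixels
# Pixel coordinates relative to the region
PIXEL_1_REL = (0, 0)  # 2400, 690
PIXEL_2_REL = (0, 71)  # 2400, 761

def press_spacebar_fast():
    # Not defined in the original answer; assumed to be press_spacebar() without the sleeps
    win32api.keybd_event(0x20, 0, 0, 0)
    win32api.keybd_event(0x20, 0, win32con.KEYEVENTF_KEYUP, 0)

while not keyboard.is_pressed('q'):  # This is considered more elegant than `==False`
    screenshot = pyautogui.screenshot(region=REGION)

    # Convert to numpy array for fast pixel access, indexed as [row, column, channel]
    img_array = np.array(screenshot)

    # Check the red channel of both pixels
    pixel1_r = img_array[PIXEL_1_REL[1], PIXEL_1_REL[0], 0]
    pixel2_r = img_array[PIXEL_2_REL[1], PIXEL_2_REL[0], 0]

    if pixel1_r >= 150 or pixel2_r >= 150:
        press_spacebar_fast()

    time.sleep(0.001)  # may sleep noticeably longer than 1 ms on Windows with Python < 3.11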

Please note the final line, which adds a small delay before the loop restarts. You should avoid spinning through a loop with no pause, as it can keep a CPU core at 100% usage.

Additionally, if, as you say in a comment, having time.sleep() in your press_spacebar() function doesn't change anything, you should remove it.

Question 14

It’s easy to get data into the GPU, but harder to get it out

Capturing the screen image is very slow because the CPU needs to perform a full rendering process to compose the screen image.

The solution is to capture the image of the window instead of the screen. (On my computer, a call takes 7 ms vs 0.2 ms, using C.)

Code: (using C and winapi)

HWND hwnd = FindWindowW(ClassName,WindowName);
while (true) {
 HDC hdc = GetDC(hwnd);
 COLORREF color = GetPixel(hdc, x, y);
 ReleaseDC(hwnd, hdc);
 int red = GetRValue(color);
 int green = GetGValue(color);
 int blue = GetBValue(color);
 Sleep(1);
}

Unfortunately, I don't know how to translate this into Python, but I'm sure AI can help you with this step.
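One possible ctypes translation of the C loop above follows, as an untested sketch rather than a definitive port. It assumes Windows; the window title, the coordinates and the on_color callback are placeholders, and argtypes/restype are declared so that handles are passed as pointer-sized values instead of ctypes' default C int. rgb_from_colorref does what the GetRValue/GetGValue/GetBValue macros do, since a COLORREF is laid out as 0x00BBGGRR.

```python
import ctypes
import time
from typing import Callable

CLR_INVALID = 0xFFFFFFFF  # value GetPixel returns on failure


def rgb_from_colorref(color: int) -> tuple[int, int, int]:
    """Split a COLORREF (0x00BBGGRR) into (red, green, blue)."""
    return color & 0xFF, (color >> 8) & 0xFF, (color >> 16) & 0xFF


def poll_window_pixel(window_title: str, x: int, y: int,
                      on_color: Callable[[int, int, int], None]) -> None:
    """Windows only: read one pixel of a window's client area in an endless loop."""
    from ctypes import wintypes  # imported here so the module also loads elsewhere

    user32 = ctypes.WinDLL("user32")
    gdi32 = ctypes.WinDLL("gdi32")
    user32.FindWindowW.argtypes = (wintypes.LPCWSTR, wintypes.LPCWSTR)
    user32.FindWindowW.restype = wintypes.HWND
    user32.GetDC.argtypes = (wintypes.HWND,)
    user32.GetDC.restype = wintypes.HDC
    user32.ReleaseDC.argtypes = (wintypes.HWND, wintypes.HDC)
    gdi32.GetPixel.argtypes = (wintypes.HDC, ctypes.c_int, ctypes.c_int)
    gdi32.GetPixel.restype = wintypes.DWORD

    hwnd = user32.FindWindowW(None, window_title)
    if not hwnd:
        raise LookupError(f"no window titled {window_title!r}")
    while True:
        hdc = user32.GetDC(hwnd)
        color = gdi32.GetPixel(hdc, x, y)
        user32.ReleaseDC(hwnd, hdc)  # release right away, as in the C version
        if color != CLR_INVALID:
            on_color(*rgb_from_colorref(color))
        time.sleep(0.001)  # counterpart of Sleep(1)


print(rgb_from_colorref(0x00336699))  # (153, 102, 51)
```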

If you need to retrieve pixels from multiple points simultaneously, you should use BitBlt instead of GetPixel; otherwise, you'll waste significant performance. Of course, this will be slightly more complex.
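A sketch of the BitBlt route with pywin32 (win32gui, win32ui and win32con), assuming a 32-bit-per-pixel display whose bitmap rows come back top-down: one BitBlt copies a small strip into a memory bitmap, and GetBitmapBits returns its pixels as BGRA bytes, from which any number of points can be read. For the question's two points, a strip 1 pixel wide and 72 rows tall would cover rows 690 to 761, once those are converted to window coordinates. grab_region and red_at are hypothetical helpers; only red_at can be exercised without Windows.

```python
def red_at(bgra: bytes, width: int, x: int, y: int) -> int:
    """Red channel of (x, y) in a top-down, 32-bit BGRA pixel buffer."""
    return bgra[(y * width + x) * 4 + 2]


def grab_region(hwnd: int, left: int, top: int, width: int, height: int) -> bytes:
    """Windows + pywin32 only: copy a width x height block of the window with one BitBlt."""
    import win32con
    import win32gui
    import win32ui

    window_dc = win32gui.GetWindowDC(hwnd)
    src_dc = win32ui.CreateDCFromHandle(window_dc)
    mem_dc = src_dc.CreateCompatibleDC()
    bitmap = win32ui.CreateBitmap()
    bitmap.CreateCompatibleBitmap(src_dc, width, height)
    try:
        mem_dc.SelectObject(bitmap)
        mem_dc.BitBlt((0, 0), (width, height), src_dc, (left, top), win32con.SRCCOPY)
        return bitmap.GetBitmapBits(True)  # raw bytes, 4 per pixel on a 32 bpp display
    finally:
        # Delete the DCs first so the bitmap is no longer selected when it is deleted
        mem_dc.DeleteDC()
        src_dc.DeleteDC()
        win32gui.ReleaseDC(hwnd, window_dc)
        win32gui.DeleteObject(bitmap.GetHandle())


# Possible use (not run here): one copy, then both points read from memory
# strip = grab_region(hwnd, left, top, 1, 72)  # left/top must be window-relative
# hit = red_at(strip, 1, 0, 0) >= 150 or red_at(strip, 1, 0, 71) >= 150
```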

If you can't get the window image by using GetPixel/BitBlt (the pixels set by SetPixel won't be displayed on the screen), the simple solution is to disable hardware acceleration or the dGPU. As for a more complex solution, I personally use Windows.Graphics.Capture in C++, but I don't know what to use in Python.

import random · Accepted Answer · 2025-08-26 06:49:48Z