I am developing a Python script to pass a challenging minigame in a single-player title. My goal is to unlock a specific achievement within this game, a task I've spent over 10 hours attempting without success. This is also an opportunity for me to learn coding for the first time. So I thought, why not?
This is the script:
from pyautogui import *
import pyautogui
import time
import keyboard
import win32api, win32con

def press_spacebar():
    win32api.keybd_event(0x20, 0, 0, 0)
    time.sleep(0.05)
    win32api.keybd_event(0x20, 0, win32con.KEYEVENTF_KEYUP, 0)
    time.sleep(0.01)

while keyboard.is_pressed('q')==False:
    if pyautogui.pixel(2400,690)[0] >= 150:
        press_spacebar()
    if pyautogui.pixel(2400,761)[0] >= 150:
        press_spacebar()
And this is the minigame I was talking about.
The game requires precise timing: a fast-moving red bar oscillates vertically, and I need to stop it within a green zone by pressing the spacebar (or clicking the mouse). My approach is to scan pixels at the upper and lower boundaries of the green zone; when the red bar reaches either scanned point, the script triggers a spacebar press. My initial attempt used pyautogui.press("space"), but it was too slow for the reaction-time challenge: every single time, it's off by 200 to 250 ms. Searching the internet for a better solution, I came across a win32api tutorial for Piano Tiles claiming it is a lot faster than the most common methods. I tried using it in the script shown above, but it's still off by about 150 ms. Now my problem is to reduce the latency to under 100 ms if possible.
3 Answers
Final edit at the top for visibility: thanks to IInspectable for the suggestion. Targeting the specific window's DC is much faster, taking less than a millisecond per call (see get_win_px in the timings below).
def get_win_px(x=0, y=0, name="Device Manager"):
    hwnd = win32gui.FindWindow(None, name)  # warning: no error handling
    hdc = win32gui.GetWindowDC(hwnd)
    color = win32gui.GetPixel(hdc, x, y)  # x, y are relative to the window, not the screen
    win32gui.ReleaseDC(hwnd, hdc)  # release against the window the DC was obtained from
    return color
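Because GetWindowDC measures coordinates from the window's top-left corner rather than the screen's, screen coordinates such as (2400, 690) need translating before they are passed to GetPixel. A minimal sketch, assuming the window rectangle is the (left, top, right, bottom) tuple in screen coordinates that win32gui.GetWindowRect(hwnd) returns; screen_to_window is a hypothetical helper name and the rectangle in the usage line is made up for illustration:

```python
def screen_to_window(x, y, window_rect):
    """Translate screen coordinates to coordinates relative to a window's top-left corner.

    window_rect is (left, top, right, bottom) in screen coordinates, the shape
    returned by win32gui.GetWindowRect(hwnd).
    """
    left, top, right, bottom = window_rect
    if not (left <= x < right and top <= y < bottom):
        raise ValueError(f"({x}, {y}) lies outside the window {window_rect}")
    return x - left, y - top


# Made-up rectangle: a window whose top-left corner sits at screen position (1920, 0)
print(screen_to_window(2400, 690, (1920, 0, 3840, 1080)))  # (480, 690)
```

The bounds check is there because a point outside the window would otherwise be read from the wrong place or fail silently. The reported rectangle may include borders, so the offsets are worth verifying against a known pixel in the actual game window.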
Using timeit to get an idea of how long each approach takes to execute.
Testing shows win32 is fastest on my PC, as you've also found:
get_px_ctypes: 16.759 ms
get_px_win32: 16.689 ms
get_px_pya: 16.798 ms
get_win_px: 0.970 ms
press_win32: 4.701 ms
press_pya: 104.358 ms
dpress_win32: 54.339 ms
pressed_win32: 0.031 ms
pressed_kbd: 5.729 ms
Note that press_win32 has no delay between key down and up events, dpress_win32 has a 50ms delay after the key down event. The delays after the key down event are probably not impacting the key detection in your game and might prevent duplicate key presses for the same colour bar.
Interestingly, the three full-desktop get-pixel functions are comparable, at roughly 16.7 ms each.
So I would recommend an approach without the check for Q to exit and use Ctrl+C instead; keyboard.is_pressed() adds ~6 ms per loop.
It's not clear what benefit looking at multiple pixels provides; it opens the possibility of missing the first location check when checking the second if the bar moves fast enough.
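On the duplicate key press point: one alternative to a fixed delay is to press only on the sample where the red value first crosses the threshold, and to stay quiet until it has dropped below again. This is a sketch of that idea rather than something measured in the tests above; RisingEdge is a hypothetical helper name and the readings are made up:

```python
class RisingEdge:
    """Report True only on the sample where a value first reaches the threshold."""

    def __init__(self, threshold=150):
        self.threshold = threshold
        self.was_above = False  # state of the previous sample

    def update(self, value):
        is_above = value >= self.threshold
        fired = is_above and not self.was_above
        self.was_above = is_above
        return fired


# Simulated red-channel readings at one pixel as the bar passes over it twice
detector = RisingEdge(threshold=150)
readings = [20, 30, 200, 210, 205, 40, 25, 180]
print([detector.update(r) for r in readings])
# [False, False, True, False, False, False, False, True]
```

In the polling loop, the key press would be sent only when update() returns True, so a bar that stays over the pixel for several samples produces a single press.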
Profiling indicates most of the time is spent in threads waiting for a lock or the response to be added to a queue. I've excluded calls below whose total time is less than a millisecond. This is not exciting or actionable information, other than to guide us to seek an alternative avoiding the full desktop.
>>> import cProfile
>>> cProfile.run("detect_and_press()", sort='time')
47528 function calls (46822 primitive calls) in 657.508 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
4941/4236 637.206 0.129 637.384 0.150 {method 'acquire' of '_thread.lock' objects}
705 19.867 0.028 657.260 0.932 threading.py:323(wait)
2/1 0.180 0.090 0.017 0.017 {built-in method builtins.exec}
706 0.096 0.000 0.160 0.000 _winkeyboard.py:498(process_key)
705 0.020 0.000 677.175 0.961 queue.py:154(get)
706 0.019 0.000 0.162 0.000 _winkeyboard.py:531(low_level_keyboard_handler)
1 0.017 0.017 0.017 0.017 test_px_access.py:33(detect_and_press)
706 0.015 0.000 0.051 0.000 __init__.py:222(direct_callback)
1412 0.010 0.000 0.010 0.000 {method 'release' of '_thread.lock' objects}
706 0.008 0.000 0.008 0.000 {built-in method _thread.allocate_lock}
706 0.007 0.000 0.010 0.000 __init__.py:211(pre_process_event)
706 0.007 0.000 0.027 0.000 queue.py:122(put)
1412 0.006 0.000 0.006 0.000 {built-in method builtins.sorted}
2118 0.005 0.000 0.020 0.000 threading.py:394(notify)
706 0.004 0.000 0.013 0.000 _keyboard_event.py:24(__init__)
706 0.004 0.000 0.008 0.000 _canonical_names.py:1233(normalize_name)
706 0.003 0.000 0.009 0.000 queue.py:57(task_done)
2118 0.003 0.000 0.005 0.000 threading.py:302(__exit__)
2824 0.003 0.000 0.003 0.000 {built-in method builtins.len}
3530 0.002 0.000 0.002 0.000 {method '__exit__' of '_thread.lock' objects}
2824 0.002 0.000 0.006 0.000 threading.py:314(_is_owned)
706 0.002 0.000 0.002 0.000 {built-in method builtins.all}
2118 0.002 0.000 0.003 0.000 threading.py:299(__enter__)
1412 0.002 0.000 0.003 0.000 queue.py:209(_qsize)
2118 0.002 0.000 0.002 0.000 {method '__enter__' of '_thread.lock' objects}
706 0.002 0.000 0.002 0.000 {method 'get' of 'dict' objects}
706 0.002 0.000 0.003 0.000 threading.py:311(_acquire_restore)
706 0.001 0.000 0.001 0.000 {built-in method time.time}
706 0.001 0.000 0.003 0.000 threading.py:424(notify_all)
1190 0.001 0.000 0.001 0.000 {built-in method builtins.isinstance}
484 0.001 0.000 0.002 0.000 __init__.py:135(is_modifier)
706 0.001 0.000 0.002 0.000 queue.py:217(_get)
706 0.001 0.000 0.001 0.000 queue.py:213(_put)
1412 0.001 0.000 0.001 0.000 {method 'append' of 'collections.deque' objects}
372 0.001 0.000 0.001 0.000 {method 'lower' of 'str' objects}
706 0.001 0.000 0.001 0.000 {method 'popleft' of 'collections.deque' objects}
706 0.001 0.000 0.001 0.000 threading.py:308(_release_save)
706 0.001 0.000 0.001 0.000 {method 'remove' of 'collections.deque' objects}
Full Code listing, edited to include release of device context handle:
import ctypes
import time
import win32api, win32con, win32gui
import pyautogui
import keyboard


def get_px_ctypes(x=0, y=0):
    hdc = ctypes.windll.user32.GetDC(0)
    color = ctypes.windll.gdi32.GetPixel(hdc, x, y)  # & 0xFF  # AND to isolate red portion
    ctypes.windll.user32.ReleaseDC(0, hdc)
    return color


def get_px_win32(x=0, y=0):
    hdc = win32gui.GetDC(0)
    color = win32gui.GetPixel(hdc, x, y)  # & 0xFF  # AND to isolate red portion
    win32gui.ReleaseDC(0, hdc)
    return color


def get_win_px(x=0, y=0, name="Device Manager"):
    hwnd = win32gui.FindWindow(None, name)  # warning: no error handling
    hdc = win32gui.GetWindowDC(hwnd)
    color = win32gui.GetPixel(hdc, x, y)  # x, y are relative to the window
    win32gui.ReleaseDC(hwnd, hdc)  # release against the window the DC was obtained from
    return color


def press_win32(key_code=0x20):
    win32api.keybd_event(key_code, 0, 0, 0)
    win32api.keybd_event(key_code, 0, win32con.KEYEVENTF_KEYUP, 0)


def dpress_win32(key_code=0x20):
    """Include a delay between key down and key up events"""
    win32api.keybd_event(key_code, 0, 0, 0)
    time.sleep(0.05)
    win32api.keybd_event(key_code, 0, win32con.KEYEVENTF_KEYUP, 0)


def pressed_win32(key_code=0x51):  # 0x51 == q
    return win32api.GetAsyncKeyState(key_code) & 0x8000


def get_px_pya(x=0, y=0):
    return pyautogui.pixel(x, y)  # [0]  # (R, G, B)


def press_pya(key="space"):
    pyautogui.press(key)


def pressed_kbd(key="q"):
    return keyboard.is_pressed(key)  # use the argument rather than a hard-coded "q"


def detect_and_press(x=0, y=0):
    # Acquire the DC before the try block so the finally clause never sees an unbound name
    hdc = ctypes.windll.user32.GetDC(0)
    try:
        while True:
            if ctypes.windll.gdi32.GetPixel(hdc, x, y) & 0xFF > 150:  # compare red value
                win32api.keybd_event(0x20, 0, 0, 0)
                win32api.keybd_event(0x20, 0, win32con.KEYEVENTF_KEYUP, 0)
    except KeyboardInterrupt:
        return
    finally:
        ctypes.windll.user32.ReleaseDC(0, hdc)


if __name__ == "__main__":
    import timeit

    loops = 100  # how many times to repeat the function call
    multiplier = 1_000 / loops  # convert total seconds to ms per call
    functions = (get_px_ctypes, get_px_win32, get_px_pya, get_win_px, press_win32,
                 press_pya, dpress_win32, pressed_win32, pressed_kbd)
    for function in functions:
        fn = function.__name__
        # Timed separately: nesting same-quote f-strings is a syntax error before Python 3.12
        elapsed = timeit.timeit(f"{fn}()", setup=f"from __main__ import {fn}", number=loops)
        print(f"{fn}:\t{elapsed * multiplier:8,.3f} ms")
6 Comments
GetPixel incurs a full DWM composition pass. At least in theory, reading out of a window DC (as opposed to the desktop DC) could avoid the composition pass. Profiling this approach might make sense.
Leaked HDCs can also cause minor memory leaks, but this is much more serious: a large number of leaked HDCs can slow down the system's drawing speed (as you've probably experienced). Generally, GetDC is very fast unless there are a large number of unreleased HDCs in the system. Therefore, in principle, HDCs should be released immediately after use. (There are many types of DCs; I'm referring specifically to Display Device Contexts here.)
The cProfile dump doesn't produce helpful insights. It just restates that Windows is event-driven, and applications spend most of their time waiting for events. We've known this for decades. The timeit table is vastly more useful. Just add a test run that calls FindWindow with the HWND returned passed into GetDC (or GetWindowDC).
When switching from GetDC(0) to GetWindowDC(hwnd), you have to adjust the coordinates passed to GetPixel() appropriately. Also, if you find yourself writing two or more calls to keybd_event, it's pretty safe to say that you should be calling SendInput instead (see this Q&A for rationale).
Without having the minigame ourselves, it's hard to tell if the proposed solutions will eventually work. Anyway, here is something to explore.
pyautogui is slow
As you can see in the documentation, a call to pyautogui.pixel() is actually just a wrapper. It's hiding the hideous fact that even though you are specifying the pixel of interest, the code takes a screenshot of the whole screen first. According to the documentation, this takes roughly 100ms, so it matches your results.
To avoid this, you can define a region of interest (see here):
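import time

import keyboard
import numpy as np
import pyautogui
import win32api, win32con

REGION = (2400, 690, 1, 85)  # x, y, width, height covering both pixels
# Pixel coordinates relative to the region
PIXEL_1_REL = (0, 0)   # 2400, 690
PIXEL_2_REL = (0, 71)  # 2400, 761


def press_spacebar_fast():
    # Assumption: the question's key press with the sleeps removed
    win32api.keybd_event(0x20, 0, 0, 0)
    win32api.keybd_event(0x20, 0, win32con.KEYEVENTF_KEYUP, 0)


while not keyboard.is_pressed('q'):  # This is considered more elegant than `==False`
    screenshot = pyautogui.screenshot(region=REGION)
    # Convert to numpy array for fast pixel access; indexing is [row (y), column (x), channel]
    img_array = np.array(screenshot)
    # Check both pixels
    pixel1_r = img_array[PIXEL_1_REL[1], PIXEL_1_REL[0], 0]
    pixel2_r = img_array[PIXEL_2_REL[1], PIXEL_2_REL[0], 0]
    if pixel1_r >= 150 or pixel2_r >= 150:
        press_spacebar_fast()
    time.sleep(0.001)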
Please note the final line, which adds a short delay before the loop restarts. You should avoid spinning in a loop with no pause, as it can cause 100% CPU usage.
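The region and the relative offsets used in the snippet above can be derived from the absolute pixel coordinates instead of being worked out by hand. A small sketch in plain Python; bounding_region is a hypothetical helper name, not part of pyautogui:

```python
def bounding_region(points):
    """Return (x, y, width, height) of the smallest region containing all points,
    plus each point's (x, y) offset relative to that region's top-left corner."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left, top = min(xs), min(ys)
    region = (left, top, max(xs) - left + 1, max(ys) - top + 1)
    offsets = [(x - left, y - top) for x, y in points]
    return region, offsets


region, offsets = bounding_region([(2400, 690), (2400, 761)])
print(region)   # (2400, 690, 1, 72)
print(offsets)  # [(0, 0), (0, 71)]
```

A height of 72 is the minimum that covers both pixels; the REGION above uses 85, which also works and leaves the offsets unchanged.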
Additionally, if, as you say in a comment, having time.sleep() in your press_spacebar() function doesn't change anything, you should remove it.
It’s easy to get data into the GPU, but harder to get it out
Capturing the screen image is very slow because the CPU needs to perform a full rendering process to compose the screen image.
The solution is to capture the image of the window instead of the screen. (On my computer, the speed is 7 ms vs 0.2 ms, using C.)
Code: (using C and winapi)
// ClassName, WindowName, x and y must be supplied by the caller;
// x and y are relative to the window's client area.
HWND hwnd = FindWindowW(ClassName, WindowName);
while (true) {
    HDC hdc = GetDC(hwnd);
    COLORREF color = GetPixel(hdc, x, y);
    ReleaseDC(hwnd, hdc);
    int red = GetRValue(color);
    int green = GetGValue(color);
    int blue = GetBValue(color);
    Sleep(1);
}
Unfortunately, I don't know how to translate this into Python, but I'm sure AI can help you with this step.
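For the colour-splitting part at least, the GetRValue, GetGValue and GetBValue macros amount to bit masks on the COLORREF value, which is laid out as 0x00BBGGRR. A sketch in Python, assuming the colour comes from a GetPixel call made through ctypes or pywin32; split_colorref is a hypothetical helper name:

```python
CLR_INVALID = 0xFFFFFFFF  # value GetPixel returns on failure


def split_colorref(color):
    """Split a COLORREF (0x00BBGGRR) into (red, green, blue),
    mirroring the GetRValue/GetGValue/GetBValue macros."""
    color &= 0xFFFFFFFF  # a signed -1 (possible with ctypes' default int return type) becomes CLR_INVALID
    if color == CLR_INVALID:
        raise ValueError("GetPixel failed (CLR_INVALID), e.g. the point is outside the clipping region")
    return color & 0xFF, (color >> 8) & 0xFF, (color >> 16) & 0xFF


print(split_colorref(0x003264C8))  # (200, 100, 50)
```

Checking for CLR_INVALID matters here because masking a failed read with 0xFF yields 255, which would look like a fully red pixel and trigger a key press.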
If you need to retrieve pixels from multiple points simultaneously, you should use BitBlt instead of GetPixel; otherwise, you'll waste significant performance. Of course, this will be slightly more complex.
If you can't get the window image by using GetPixel/BitBlt (the pixels set by SetPixel won't be displayed on the screen), the simple solution is to disable hardware acceleration or dGPU. As for a more complex solution... I personally use WindowsGraphicsCapture in C++, but I don't know what to use in Python.
Why the sleep() calls in your press_spacebar function? How did you pick those values?