InputContainer.seek(backward=True, any_frame=False) overshoots #1982
I am trying to fetch the frame at a specific index from a video as fast as possible. I follow the strategy that the docs of InputContainer.seek suggest: I call InputContainer.seek(backward=True, any_frame=False) to land on the closest preceding keyframe, then decode forward sequentially to the desired frame. I am using the following code:
import av
import PIL.Image


def decode_frame_from_video(video_path: str, frame_index: int) -> PIL.Image.Image:
    with av.open(video_path) as container:  # av.container.input.InputContainer
        stream: av.video.stream.VideoStream = container.streams.video[0]

        # Calculate timestamp from frame index for seeking
        fps = float(stream.average_rate)
        target_timestamp_sec: float = frame_index / fps
        container.seek(int(target_timestamp_sec / stream.time_base), backward=True, any_frame=False, stream=stream)

        # Starting from the keyframe found by seek, decode frames until we reach the desired frame index
        for frame in container.decode(stream):  # av.video.frame.VideoFrame
            index = round(frame.pts * stream.time_base * fps)
            if index == frame_index:
                image: PIL.Image.Image = frame.to_image()
                return image

        raise ValueError(
            f"Could not find frame at index {frame_index} in video of length "
            f"{stream.frames} frames at {video_path}"
        )
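To rule out float rounding in the seek offset, the timestamp arithmetic can be checked in isolation with exact fractions. This is only a sketch under assumptions: it presumes a constant frame rate and a stream whose first frame has pts 0, the helper names are hypothetical, and `AV_TIME_BASE` is hard-coded to FFmpeg's 1,000,000 (the value PyAV exposes as `av.time_base`) so the block runs without PyAV.

```python
from fractions import Fraction

# FFmpeg's AV_TIME_BASE (microseconds per second); PyAV exposes it as av.time_base.
# Hard-coded here so the sketch does not need PyAV installed.
AV_TIME_BASE = 1_000_000


def frame_index_to_stream_pts(frame_index: int, fps: Fraction, time_base: Fraction) -> int:
    """Seek offset in stream.time_base units (hypothetical helper).

    Assumes constant frame rate and a first frame at pts 0.
    """
    # Exact rational arithmetic: no float error before truncation.
    return int(Fraction(frame_index) / fps / time_base)


def frame_index_to_av_time_base(frame_index: int, fps: Fraction) -> int:
    """Seek offset in av.time_base units, for seek() calls without stream= (hypothetical helper)."""
    return int(Fraction(frame_index) / fps * AV_TIME_BASE)


if __name__ == "__main__":
    fps = Fraction(5)
    time_base = Fraction(1, 10240)
    for i in (19, 20, 36, 40):
        print(i, frame_index_to_stream_pts(i, fps, time_base), frame_index_to_av_time_base(i, fps))
```

If the offsets computed this way match the offsets the original snippet passes to seek and the overshoot still occurs, float rounding in the offset is probably not the cause.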
I am testing this with a 5 FPS video that has keyframes at indices 0, 20, 40, and 60. Frame indices 0 to 35 all work fine, but with frame_index=36 the first frame that container.decode(stream) yields in the for loop after the call to container.seek is the keyframe at index 40, not the one at index 20. If I instead seek with container.seek(int(target_timestamp_sec * av.time_base), backward=True, any_frame=False), the error occurs at index 19 and the keyframe returned is 20. The video is encoded with x265 and has stream.time_base = 1 / 10240. The frames also have the expected time and pts:
ipdb> for frame in container.decode(stream):
          index = round(frame.pts * stream.time_base * fps)
          print(index, frame.time, frame.pts)
0 0.0 0
1 0.2 2048
2 0.4 4096
3 0.6 6144
4 0.8 8192
5 1.0 10240
6 1.2 12288
7 1.4 14336
8 1.6 16384
9 1.8 18432
10 2.0 20480
11 2.2 22528
12 2.4 24576
13 2.6 26624
14 2.8 28672
15 3.0 30720
16 3.2 32768
17 3.4 34816
18 3.6 36864
19 3.8 38912
20 4.0 40960
21 4.2 43008
22 4.4 45056
23 4.6 47104
24 4.8 49152
25 5.0 51200
26 5.2 53248
27 5.4 55296
28 5.6 57344
29 5.8 59392
30 6.0 61440
31 6.2 63488
32 6.4 65536
33 6.6 67584
34 6.8 69632
35 7.0 71680
36 7.2 73728
37 7.4 75776
38 7.6 77824
39 7.8 79872
40 8.0 81920
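The expectation behind the report can be written down as a small check: for a target index, a backward seek should land on the largest keyframe index that does not exceed it. The sketch below uses only the numbers from this post (5 FPS, time_base 1/10240, keyframes at 0, 20, 40, 60). Both helper names are hypothetical and nothing here calls PyAV.

```python
from bisect import bisect_right
from fractions import Fraction


def pts_to_index(pts: int, fps: Fraction, time_base: Fraction) -> int:
    """Frame index for a pts, mirroring round(frame.pts * stream.time_base * fps)."""
    return round(Fraction(pts) * time_base * fps)


def expected_keyframe(frame_index: int, keyframe_indices: list[int]) -> int:
    """Largest keyframe index <= frame_index; keyframe_indices must be sorted ascending."""
    pos = bisect_right(keyframe_indices, frame_index)
    if pos == 0:
        raise ValueError(f"no keyframe at or before index {frame_index}")
    return keyframe_indices[pos - 1]


if __name__ == "__main__":
    fps = Fraction(5)
    time_base = Fraction(1, 10240)
    keyframes = [0, 20, 40, 60]
    for target in (19, 35, 36):
        print(target, "-> expected keyframe", expected_keyframe(target, keyframes))
    # The pts of index 36 in the listing above maps back to index 36.
    print(pts_to_index(73728, fps, time_base))
```

Comparing `expected_keyframe(target, keyframes)` with the index of the first frame decoded after the seek gives a direct way to flag an overshoot for any target.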