Exact vs. approximate seek mode: performance and accuracy comparison

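In this example, we describe the seek_mode parameter of the VideoDecoder class. This parameter trades off the speed of VideoDecoder creation against the seeking accuracy of the retrieved frames (i.e. in approximate mode, requesting the i-th frame may not necessarily return frame i).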

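First, some boilerplate: we download a short video from the web and use the ffmpeg CLI to repeat it 100 times. We end up with two videos: a short one of about 14 seconds and a long one of about 23 minutes. You can skip this part and jump straight to "Performance: VideoDecoder creation".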

import torch
import requests
import tempfile
from pathlib import Path
import shutil
import subprocess
from time import perf_counter_ns


# Video source: https://www.pexels.com/video/dog-eating-854132/
# License: CC0. Author: Coverr.
url = "https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4"
response = requests.get(url, headers={"User-Agent": ""})
if response.status_code != 200:
    raise RuntimeError(f"Failed to download video. {response.status_code = }.")

temp_dir = tempfile.mkdtemp()
short_video_path = Path(temp_dir) / "short_video.mp4"
with open(short_video_path, 'wb') as f:
    for chunk in response.iter_content():
        f.write(chunk)

long_video_path = Path(temp_dir) / "long_video.mp4"
ffmpeg_command = [
    "ffmpeg",
    "-stream_loop", "99",  # repeat video 100 times
    "-i", f"{short_video_path}",
    "-c", "copy",
    f"{long_video_path}"
]
subprocess.run(ffmpeg_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

from torchcodec.decoders import VideoDecoder
print(f"Short video duration: {VideoDecoder(short_video_path).metadata.duration_seconds} seconds")
print(f"Long video duration: {VideoDecoder(long_video_path).metadata.duration_seconds / 60} minutes")

Short video duration: 13.8 seconds
Long video duration: 23.0 minutes

Performance: VideoDecoder creation

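In terms of performance, the seek_mode parameter ultimately affects the **creation** of a VideoDecoder object. The longer the video, the larger the performance gain from approximate mode.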

def bench(f, average_over=50, warmup=2, **f_kwargs):

    for _ in range(warmup):
        f(**f_kwargs)

    times = []
    for _ in range(average_over):
        start = perf_counter_ns()
        f(**f_kwargs)
        end = perf_counter_ns()
        times.append(end - start)

    times = torch.tensor(times) * 1e-6  # ns to ms
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}ms +- {std:.2f}")


print("Creating a VideoDecoder object with seek_mode='exact' on a short video:")
bench(VideoDecoder, source=short_video_path, seek_mode="exact")
print("Creating a VideoDecoder object with seek_mode='approximate' on a short video:")
bench(VideoDecoder, source=short_video_path, seek_mode="approximate")
print()
print("Creating a VideoDecoder object with seek_mode='exact' on a long video:")
bench(VideoDecoder, source=long_video_path, seek_mode="exact")
print("Creating a VideoDecoder object with seek_mode='approximate' on a long video:")
bench(VideoDecoder, source=long_video_path, seek_mode="approximate")

Creating a VideoDecoder object with seek_mode='exact' on a short video:
med = 8.06ms +- 0.02
Creating a VideoDecoder object with seek_mode='approximate' on a short video:
med = 7.08ms +- 0.02

Creating a VideoDecoder object with seek_mode='exact' on a long video:
med = 114.17ms +- 1.21
Creating a VideoDecoder object with seek_mode='approximate' on a long video:
med = 10.50ms +- 0.03
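To put the numbers above in perspective, the short sketch below computes the exact-to-approximate ratio of the median creation times. The values are copied from the output of this particular run, so they are only illustrative; the `creation_ms` dictionary and the `speedup` helper are names introduced here for the example.

```python
# Median creation times in milliseconds, copied from the benchmark output above.
creation_ms = {
    ("short", "exact"): 8.06,
    ("short", "approximate"): 7.08,
    ("long", "exact"): 114.17,
    ("long", "approximate"): 10.50,
}


def speedup(video):
    """Ratio of exact to approximate creation time for the given video."""
    return creation_ms[(video, "exact")] / creation_ms[(video, "approximate")]


for video in ("short", "long"):
    print(f"{video} video: approximate mode is {speedup(video):.1f}x faster to create")
```

On this run the ratio is roughly 1.1x for the short video and roughly 10.9x for the long one, consistent with the gain growing with video length.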

Performance: frame decoding and clip sampling

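Strictly speaking, the seek_mode parameter only affects the performance of VideoDecoder creation. It has no direct effect on the performance of frame decoding or sampling. **However**, because frame decoding and sampling patterns typically involve creating a VideoDecoder (one per video), seek_mode may well end up affecting the performance of decoders and samplers. For example: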

from torchcodec import samplers


def sample_clips(seek_mode):
    return samplers.clips_at_random_indices(
        decoder=VideoDecoder(
            source=long_video_path,
            seek_mode=seek_mode
        ),
        num_clips=5,
        num_frames_per_clip=2,
    )


print("Sampling clips with seek_mode='exact':")
bench(sample_clips, seek_mode="exact")
print("Sampling clips with seek_mode='approximate':")
bench(sample_clips, seek_mode="approximate")

Sampling clips with seek_mode='exact':
med = 302.87ms +- 35.44
Sampling clips with seek_mode='approximate':
med = 182.62ms +- 54.41
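As a rough sanity check, the sketch below compares the gap between the two sampling medians with the gap between the two creation medians measured earlier on the long video. The numbers are copied from the outputs of this run; given the reported spread of the sampling benchmark (about 35 to 54 ms), the comparison should be read as approximate only.

```python
# Medians in milliseconds, copied from the benchmark outputs above (long video).
creation_exact, creation_approx = 114.17, 10.50
sampling_exact, sampling_approx = 302.87, 182.62

# Time saved at creation vs. time saved over the whole sampling call.
creation_gap = creation_exact - creation_approx
sampling_gap = sampling_exact - sampling_approx

print(f"creation gap: {creation_gap:.2f} ms")
print(f"sampling gap: {sampling_gap:.2f} ms")
```

The two gaps are of the same order, which is what one would expect if most of the sampling difference comes from decoder creation.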

Accuracy: metadata and frame retrieval

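We have seen that seek_mode="approximate" can significantly speed up VideoDecoder creation. The price to pay is that seeking may not be as accurate as with seek_mode="exact". It can also affect the exactness of the metadata.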

However, in many cases you will find no accuracy difference between the two modes, which makes seek_mode="approximate" a net win:

print("Metadata of short video with seek_mode='exact':")
print(VideoDecoder(short_video_path, seek_mode="exact").metadata)
print("Metadata of short video with seek_mode='approximate':")
print(VideoDecoder(short_video_path, seek_mode="approximate").metadata)

exact_decoder = VideoDecoder(short_video_path, seek_mode="exact")
approx_decoder = VideoDecoder(short_video_path, seek_mode="approximate")
for i in range(len(exact_decoder)):
    torch.testing.assert_close(
        exact_decoder.get_frame_at(i).data,
        approx_decoder.get_frame_at(i).data,
        atol=0, rtol=0,
    )
print("Frame seeking is the same for this video!")

Metadata of short video with seek_mode='exact':
VideoStreamMetadata:
  duration_seconds_from_header: 13.8
  begin_stream_seconds_from_header: 0.0
  bit_rate: 505790.0
  codec: h264
  stream_index: 0
  begin_stream_seconds_from_content: 0.0
  end_stream_seconds_from_content: 13.8
  width: 640
  height: 360
  num_frames_from_header: 345
  num_frames_from_content: 345
  average_fps_from_header: 25.0
  pixel_aspect_ratio: 1
  duration_seconds: 13.8
  begin_stream_seconds: 0.0
  end_stream_seconds: 13.8
  num_frames: 345
  average_fps: 25.0

Metadata of short video with seek_mode='approximate':
VideoStreamMetadata:
  duration_seconds_from_header: 13.8
  begin_stream_seconds_from_header: 0.0
  bit_rate: 505790.0
  codec: h264
  stream_index: 0
  begin_stream_seconds_from_content: None
  end_stream_seconds_from_content: None
  width: 640
  height: 360
  num_frames_from_header: 345
  num_frames_from_content: None
  average_fps_from_header: 25.0
  pixel_aspect_ratio: 1
  duration_seconds: 13.8
  begin_stream_seconds: 0
  end_stream_seconds: 13.8
  num_frames: 345
  average_fps: 25.0

Frame seeking is the same for this video!
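Reading the two metadata dumps side by side is error-prone, so here is a minimal sketch of a helper that lists only the fields that differ. `differing_fields` is a hypothetical helper written for this example, not part of TorchCodec, and the `SimpleNamespace` objects are stand-ins filled with a few values copied from the output above; with real decoders you would pass the two `.metadata` objects instead.

```python
from types import SimpleNamespace


def differing_fields(meta_a, meta_b, fields):
    """Return {field: (value_a, value_b)} for the fields whose values differ."""
    return {
        name: (getattr(meta_a, name), getattr(meta_b, name))
        for name in fields
        if getattr(meta_a, name) != getattr(meta_b, name)
    }


# Stand-ins for the two metadata objects, using values from the output above.
exact_meta = SimpleNamespace(
    num_frames=345,
    duration_seconds=13.8,
    num_frames_from_content=345,
    end_stream_seconds_from_content=13.8,
)
approx_meta = SimpleNamespace(
    num_frames=345,
    duration_seconds=13.8,
    num_frames_from_content=None,
    end_stream_seconds_from_content=None,
)

fields = [
    "num_frames",
    "duration_seconds",
    "num_frames_from_content",
    "end_stream_seconds_from_content",
]
diff = differing_fields(exact_meta, approx_meta, fields)
print(diff)
```

For this video only the `*_from_content` fields differ: they are `None` in approximate mode, while the resolved values such as `num_frames` and `duration_seconds` agree.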

What is this doing under the hood?

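With seek_mode="exact", the VideoDecoder performs a scan when it is instantiated. The scan does not involve decoding, but it processes the entire file to infer more accurate metadata (such as duration) and to build an internal index of frames and key frames. This internal index may be more accurate than the one in the file's headers, which leads to more accurate seeking behavior. Without the scan, TorchCodec relies only on the metadata contained in the file, which may not always be as accurate.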
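To make the difference concrete, here is a toy model in pure Python. It is not TorchCodec's implementation: it simply assumes that, without a scanned index, the timestamp of frame i is guessed from the duration and frame count found in the header, and that the frame displayed at that guessed time is returned. All names and timestamps below are hypothetical.

```python
from bisect import bisect_right


def approximate_lookup(i, pts_ms, duration_ms):
    """Toy model: guess frame i's timestamp from header-level duration and
    frame count, then return the index of the frame shown at that time."""
    guessed_ms = i * duration_ms // len(pts_ms)
    return max(bisect_right(pts_ms, guessed_ms) - 1, 0)


def mismatches(pts_ms, duration_ms):
    """List (requested, returned) index pairs where the toy lookup is wrong.
    A scanned index would always return the requested frame."""
    return [
        (i, approximate_lookup(i, pts_ms, duration_ms))
        for i in range(len(pts_ms))
        if approximate_lookup(i, pts_ms, duration_ms) != i
    ]


# Hypothetical timestamps in milliseconds.
constant_rate = [0, 40, 80, 120, 160, 200, 240, 280]   # steady 25 fps
variable_rate = [0, 40, 80, 500, 540, 580, 1000, 1040]  # bursts of frames

cfr_errors = mismatches(constant_rate, duration_ms=320)
vfr_errors = mismatches(variable_rate, duration_ms=1080)
print("constant frame rate:", cfr_errors)
print("variable frame rate:", vfr_errors)
```

In this toy model the constant-frame-rate stream produces no mismatches, while the variable-frame-rate stream returns the wrong frame for several indices. That is the kind of situation in which a scanned index helps.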

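Which mode should I use?

The general rule of thumb is as follows:

If you really care about the exactness of frame seeking, use "exact".
If you can sacrifice seeking exactness for speed, which is usually the case when doing clip sampling, use "approximate".
If your videos don't have variable frame rate and their metadata is correct, then "approximate" mode is a net win: it will be just as accurate as "exact" mode while being faster.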
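The rule of thumb above can be written down as a small decision helper. This is only a sketch: `choose_seek_mode` and its three boolean parameters are hypothetical names introduced here, and whether a video has a constant frame rate or trustworthy metadata is something you have to know or check yourself.

```python
def choose_seek_mode(needs_exact_frames, constant_frame_rate, metadata_trusted):
    """Return "exact" or "approximate" following the rule of thumb above."""
    # Constant frame rate and correct metadata: approximate is as accurate
    # as exact, and faster to create.
    if constant_frame_rate and metadata_trusted:
        return "approximate"
    # Otherwise, pay for the scan only when exact seeking matters.
    if needs_exact_frames:
        return "exact"
    return "approximate"


print(choose_seek_mode(needs_exact_frames=True, constant_frame_rate=False, metadata_trusted=True))
print(choose_seek_mode(needs_exact_frames=False, constant_frame_rate=False, metadata_trusted=False))
```

The returned string can be passed directly as the `seek_mode` argument of `VideoDecoder`.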

shutil.rmtree(temp_dir)

Total running time of the script: (0 minutes 35.314 seconds)
