torchtnt.utils.oom.attach_oom_observer
torchtnt.utils.oom.attach_oom_observer(output_dir: str, trace_max_entries: int = 1000000) -> None

Attaches a function to record the PyTorch memory snapshot when an out of memory error occurs.
For more information, see this blog post.
Parameters:
- output_dir (str) – The directory to save the memory snapshot.
- trace_max_entries (int, optional) – The maximum number of trace entries to record. Defaults to 1000000.
Note
Outputs are only saved if running on a host with CUDA devices available.
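
A minimal usage sketch, assuming torchtnt is installed and the observer is attached once near the start of a training script. The wrapper name `enable_oom_snapshots` is hypothetical and not part of torchtnt. It guards the import so the script still runs where torchtnt is missing. Creating the output directory up front is a precaution, not a documented requirement.

```python
import os
import tempfile


def enable_oom_snapshots(output_dir: str, trace_max_entries: int = 1_000_000) -> bool:
    """Hypothetical helper: attach the OOM observer if torchtnt is importable.

    Returns True if attach_oom_observer was called, False if torchtnt
    could not be imported.
    """
    # Precaution (an assumption, not documented behavior): make sure the
    # target directory exists before any snapshot is written to it.
    os.makedirs(output_dir, exist_ok=True)
    try:
        from torchtnt.utils.oom import attach_oom_observer
    except ImportError:
        return False
    # Per the note above, outputs are only saved on hosts with CUDA devices.
    attach_oom_observer(output_dir=output_dir, trace_max_entries=trace_max_entries)
    return True


if __name__ == "__main__":
    snapshot_dir = os.path.join(tempfile.gettempdir(), "oom_snapshots")
    attached = enable_oom_snapshots(snapshot_dir, trace_max_entries=100_000)
    print(f"observer attached: {attached}; snapshots would go to {snapshot_dir}")
```

Lowering `trace_max_entries` below the default, as in the call above, is only an illustration of passing the second parameter, not a recommended value.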