setup_torch_profiler¶
- torchtune.training.setup_torch_profiler(enabled: bool = False, cpu: bool = True, cuda: bool = True, profile_memory: bool = False, with_stack: bool = False, record_shapes: bool = True, with_flops: bool = False, wait_steps: Optional[int] = None, warmup_steps: Optional[int] = None, active_steps: Optional[int] = None, num_cycles: Optional[int] = None, output_dir: Optional[str] = None) → Tuple[profile, DictConfig] [source]¶
Sets up torch.profiler.profile and returns the profiler config with post-setup updates.

The profiler config can be provided in configs under the profiler key with the following layout:

    profiler:
      _component_: torchtune.training.setup_torch_profiler
      enabled: bool

      # Output directory of trace artifacts
      output_dir: str

      # torch.profiler.ProfilerActivity types to trace
      cpu: bool
      cuda: bool

      # Trace options
      profile_memory: bool
      with_stack: bool
      record_shapes: bool
      with_flops: bool

      # torch.profiler.schedule args
      wait_steps: int
      warmup_steps: int
      active_steps: int
      num_cycles: int
The profiler schedule updates with respect to an optimizer step (e.g., if gradient_accumulation = 2, then the profiler will step every 2 batches).

Sensible defaults will be chosen if the config is missing options:

- If no activities are specified, the profiler will default to CPU + CUDA.
- If no schedule is specified, the profiler will default to DEFAULT_SCHEDULE (the schedule arguments map onto torch.profiler.schedule; see the sketch after this list).
- Certain options (with_stack and record_shapes) will be overridden depending on the requirements of other options (e.g., profile_memory requires with_stack and record_shapes).
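For reference, a short sketch of how the schedule options map onto torch.profiler.schedule; the numbers are arbitrary illustration values, not the actual DEFAULT_SCHEDULE:

    from torch.profiler import schedule

    # wait_steps / warmup_steps / active_steps / num_cycles correspond to the
    # wait / warmup / active / repeat kwargs of torch.profiler.schedule
    example_schedule = schedule(wait=5, warmup=3, active=2, repeat=1)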
Note

Enabling the profiler will reduce training speed.

Setting profile_memory: True will generate large trace files.

The profiler schedule is context dependent. Calling profiler.step() at each batch iteration but outside the gradient accumulation scope will step the profiler on each forward / backward pass, whereas calling profiler.step() at each batch iteration but within the gradient accumulation scope will step the profiler on each optimizer update step, so that each step covers multiple forward / backward passes (see the sketch below).
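The placement described in the note can be sketched as follows; the training loop, model, dataloader, optimizer, and gradient_accumulation_steps are hypothetical stand-ins for a real recipe:

    # `profiler` is the torch.profiler.profile object returned by setup_torch_profiler.
    with profiler:
        for idx, batch in enumerate(dataloader):
            loss = model(batch)
            loss.backward()

            if (idx + 1) % gradient_accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()
                # Within the gradient accumulation scope: the profiler steps once
                # per optimizer update, covering multiple forward/backward passes.
                profiler.step()

            # Alternatively, stepping here (outside the gradient accumulation
            # scope) advances the profiler once per forward/backward pass:
            # profiler.step()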
- Parameters:
enabled (bool) – Enable pytorch profiler. Default is False.
cpu (bool) – Enable cpu profiling. Default is True.
cuda (bool) – Enable cuda profiling. Default is True.
profile_memory (bool) – Profile memory usage. Default is False.
with_stack (bool) – Profile stack. Default is False.
record_shapes (bool) – Record shapes. Default is True.
with_flops (bool) – Profile flops. Default is False.
wait_steps (Optional[int]) – Wait time in steps. Maps to wait kwarg of torch.profiler.schedule.
warmup_steps (Optional[int]) – Warmup time in steps. Maps to warmup kwarg of torch.profiler.schedule.
active_steps (Optional[int]) – Active time in steps. Maps to active kwarg of torch.profiler.schedule.
num_cycles (Optional[int]) – Number of profiling cycles. Maps to repeat kwarg of torch.profiler.schedule.
output_dir (Optional[str]) – Tracing file output path.
- Returns:
Tuple[torch.profiler.profile, DictConfig]
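As a usage sketch for the return value (assuming only the arguments shown; the output path is a placeholder), unspecified options fall back to the defaults described above and the returned DictConfig reflects them:

    from torchtune.training import setup_torch_profiler

    # Enable the profiler and rely on the documented defaults for everything else.
    profiler, profiler_cfg = setup_torch_profiler(
        enabled=True,
        output_dir="/tmp/profiler_output",  # hypothetical path
    )

    # The returned DictConfig contains the post-setup updates
    # (e.g., the resolved activities and schedule settings).
    print(profiler_cfg)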