torch.cuda

This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but utilize GPUs for computation.

It is lazily initialized, so you can always import it and use is_available() to determine whether your system supports CUDA.

CUDA semantics has more details about working with CUDA.
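
For example, a minimal sketch of the usual availability check:

```python
import torch

# torch.cuda can be imported even on machines without CUDA;
# guard any GPU work behind is_available().
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(3, device=device)
```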

StreamContext

Context-manager that selects a given stream.

can_device_access_peer

Checks if peer access between two devices is possible.

current_blas_handle

Returns a cublasHandle_t pointer to the current cuBLAS handle.

current_device

Returns the index of the currently selected device.

current_stream

Returns the currently selected Stream for a given device.

default_stream

Returns the default Stream for a given device.

device

Context-manager that changes the selected device.
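
As a quick illustration, a sketch assuming at least two visible GPUs — tensors created inside the block land on the selected device:

```python
import torch

x = torch.empty(2, device="cuda")      # allocated on the current device, cuda:0
with torch.cuda.device(1):             # temporarily select GPU 1
    y = torch.empty(2, device="cuda")  # "cuda" now resolves to cuda:1
print(x.device, y.device)              # cuda:0 cuda:1
```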

device_count

Returns the number of GPUs available.

device_of

Context-manager that changes the current device to that of a given object.

get_arch_list

Returns the list of CUDA architectures this library was compiled for.

get_device_capability

Gets the CUDA capability of a device.

get_device_name

Gets the name of a device.

get_device_properties

Gets the properties of a device.
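
A small sketch combining the query helpers above (the example name and capability in the comments are illustrative):

```python
import torch

if torch.cuda.is_available():
    print(torch.cuda.device_count())             # number of visible GPUs
    print(torch.cuda.current_device())           # index of the selected device
    print(torch.cuda.get_device_name(0))         # e.g. "NVIDIA A100-SXM4-40GB"
    print(torch.cuda.get_device_capability(0))   # e.g. (8, 0)
    props = torch.cuda.get_device_properties(0)
    print(props.total_memory)                    # total device memory in bytes
```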

get_gencode_flags

Returns NVCC gencode flags this library was compiled with.

get_sync_debug_mode

Returns the current value of the debug mode for CUDA synchronizing operations.

init

Initialize PyTorch’s CUDA state.

ipc_collect

Force collects GPU memory after it has been released by CUDA IPC.

is_available

Returns a bool indicating if CUDA is currently available.

is_initialized

Returns whether PyTorch’s CUDA state has been initialized.

set_device

Sets the current device.

set_stream

Sets the current stream. This is a wrapper API for setting the stream.

set_sync_debug_mode

Sets the debug mode for CUDA synchronizing operations.

stream

Wrapper around the Context-manager StreamContext that selects a given stream.

synchronize

Waits for all kernels in all streams on a CUDA device to complete.
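
A hedged sketch of enqueueing work on a side stream and then waiting for the device:

```python
import torch

s = torch.cuda.Stream()          # a new stream on the current device
with torch.cuda.stream(s):       # kernels below are enqueued on s
    y = torch.randn(1000, device="cuda").sum()
torch.cuda.synchronize()         # block until every stream on the device is done
print(y.item())
```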

Random Number Generator

get_rng_state

Returns the random number generator state of the specified GPU as a ByteTensor.

get_rng_state_all

Returns a list of ByteTensors representing the random number generator states of all devices.

set_rng_state

Sets the random number generator state of the specified GPU.

set_rng_state_all

Sets the random number generator state of all devices.

manual_seed

Sets the seed for generating random numbers for the current GPU.

manual_seed_all

Sets the seed for generating random numbers on all GPUs.

seed

Sets the seed for generating random numbers to a random number for the current GPU.

seed_all

Sets the seed for generating random numbers to a random number on all GPUs.

initial_seed

Returns the current random seed of the current GPU.
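
A minimal sketch of seeding and of saving/restoring the generator state:

```python
import torch

torch.cuda.manual_seed_all(42)          # seed every visible GPU
state = torch.cuda.get_rng_state()      # ByteTensor for the current GPU
a = torch.rand(3, device="cuda")
torch.cuda.set_rng_state(state)         # rewind the generator
b = torch.rand(3, device="cuda")
assert torch.equal(a, b)                # same draws after restoring the state
```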

Communication collectives

comm.broadcast

Broadcasts a tensor to specified GPU devices.

comm.broadcast_coalesced

Broadcasts a sequence of tensors to the specified GPUs.

comm.reduce_add

Sums tensors from multiple GPUs.

comm.scatter

Scatters a tensor across multiple GPUs.

comm.gather

Gathers tensors from multiple GPU devices.
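
A sketch of the collectives above, assuming at least two visible GPUs:

```python
import torch
import torch.cuda.comm as comm

t = torch.arange(4.0, device="cuda:0")
copies = comm.broadcast(t, devices=[0, 1])   # one replica per listed GPU
total = comm.reduce_add(copies)              # element-wise sum of the replicas
chunks = comm.scatter(t, devices=[0, 1])     # split t along dim 0 across GPUs
back = comm.gather(chunks, destination=0)    # reassemble the chunks on GPU 0
```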

Streams and events

Stream

Wrapper around a CUDA stream.

Event

Wrapper around a CUDA event.
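
For example, a sketch of timing a kernel with a pair of events:

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()                                  # stamp the current stream
torch.mm(torch.randn(1024, 1024, device="cuda"),
         torch.randn(1024, 1024, device="cuda"))
end.record()

torch.cuda.synchronize()                        # wait until both events complete
print(start.elapsed_time(end))                  # elapsed milliseconds
```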

Graphs (beta)

graph_pool_handle

Returns an opaque token representing the id of a graph memory pool.

CUDAGraph

Wrapper around a CUDA graph.

graph

Context-manager that captures CUDA work into a torch.cuda.CUDAGraph object for later replay.

make_graphed_callables

Accepts callables (functions or nn.Modules) and returns graphed versions.
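
A hedged sketch of the capture-and-replay flow; the side-stream warmup follows the pattern the beta capture API expects:

```python
import torch

static_in = torch.randn(32, 64, device="cuda")

# Warm up on a side stream before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_out = static_in * 2
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):                 # capture, don't execute
    static_out = static_in * 2

static_in.copy_(torch.randn(32, 64, device="cuda"))
g.replay()                                # rerun the captured kernels on new data
print(static_out[0, 0])
```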

Memory management

empty_cache

Releases all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and becomes visible in nvidia-smi.

list_gpu_processes

Returns a human-readable printout of the running processes and their GPU memory use for a given device.

memory_stats

Returns a dictionary of CUDA memory allocator statistics for a given device.

memory_summary

Returns a human-readable printout of the current memory allocator statistics for a given device.

memory_snapshot

Returns a snapshot of the CUDA memory allocator state across all devices.

memory_allocated

Returns the current GPU memory occupied by tensors in bytes for a given device.

max_memory_allocated

Returns the maximum GPU memory occupied by tensors in bytes for a given device.

reset_max_memory_allocated

Resets the starting point in tracking maximum GPU memory occupied by tensors for a given device.

memory_reserved

Returns the current GPU memory managed by the caching allocator in bytes for a given device.

max_memory_reserved

Returns the maximum GPU memory managed by the caching allocator in bytes for a given device.

set_per_process_memory_fraction

Sets the memory fraction for a process.

memory_cached

Deprecated; see memory_reserved().

max_memory_cached

Deprecated; see max_memory_reserved().

reset_max_memory_cached

Resets the starting point in tracking maximum GPU memory managed by the caching allocator for a given device.

reset_peak_memory_stats

Resets the “peak” stats tracked by the CUDA memory allocator.
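
A small sketch of inspecting allocator state with the helpers above:

```python
import torch

x = torch.empty(1024, 1024, device="cuda")   # ~4 MiB of float32
print(torch.cuda.memory_allocated())         # bytes currently held by tensors
print(torch.cuda.memory_reserved())          # bytes held by the caching allocator
del x
torch.cuda.empty_cache()                     # return unoccupied cached blocks
torch.cuda.reset_peak_memory_stats()         # restart peak tracking
print(torch.cuda.max_memory_allocated())     # peak since the reset
```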

NVIDIA Tools Extension (NVTX)

nvtx.mark

Describe an instantaneous event that occurred at some point.

nvtx.range_push

Pushes a range onto a stack of nested range spans.

nvtx.range_pop

Pops a range off of a stack of nested range spans.
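
A minimal sketch of annotating a region for profilers such as Nsight Systems:

```python
import torch

torch.cuda.nvtx.mark("step start")       # instantaneous marker
torch.cuda.nvtx.range_push("forward")    # open a named, nestable range
y = torch.relu(torch.randn(8, device="cuda"))
torch.cuda.nvtx.range_pop()              # close the innermost open range
```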
