CUDA Memory Operators
Tensor new_managed_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)

Allocate an at::Tensor with unified managed memory (UVM). Then set its preferred storage location to CPU (host memory) and establish mappings on the CUDA device to the host memory.

- Parameters:
    self – The input tensor
    sizes – The target tensor dimensions
- Returns:
    A new tensor backed by UVM
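A minimal usage sketch (the include paths, helper name, and tensor shapes here are assumptions; only the call itself follows the signature above):

    #include <ATen/ATen.h>
    #include <vector>

    // Sketch only: assumes new_managed_tensor() is declared by the fbgemm_gpu
    // headers and that the library is linked into the build.
    at::Tensor make_uvm_buffer() {
      // `self` is the input tensor (presumably supplying dtype/device context);
      // the new tensor gets the dimensions given in `sizes`.
      const at::Tensor self =
          at::empty({0}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0));
      const std::vector<std::int64_t> sizes = {1024, 256};

      // UVM-backed tensor, preferred on the host and mapped on the CUDA device.
      return new_managed_tensor(self, sizes);
    }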
Tensor new_managed_tensor_meta(const Tensor &self, const std::vector<std::int64_t> &sizes)

Placeholder operator for the Meta dispatch key.

- Parameters:
    self – The input tensor
    sizes – The target tensor dimensions
- Returns:
    A new empty tensor
Tensor new_host_mapped_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)

Allocate an at::Tensor with host-mapped memory.

- Parameters:
    self – The input tensor
    sizes – The target tensor dimensions
- Returns:
    A new tensor backed by host-mapped memory
Tensor new_unified_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes, bool is_host_mapped)

Allocate an at::Tensor with either unified managed memory (UVM) or host-mapped memory.

- Parameters:
    self – The input tensor
    sizes – The target tensor dimensions
    is_host_mapped – Whether to allocate host-mapped memory (true) or UVM (false)
- Returns:
    A new tensor backed by UVM or host-mapped memory, depending on the value of is_host_mapped
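A sketch of selecting the backing memory at runtime (again assuming the fbgemm_gpu declarations are in scope; names and shapes are illustrative):

    #include <ATen/ATen.h>
    #include <vector>

    // Sketch only: is_host_mapped selects between host-mapped memory and UVM.
    at::Tensor make_unified_buffer(bool is_host_mapped) {
      const at::Tensor self =
          at::empty({0}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0));
      const std::vector<std::int64_t> sizes = {2048, 64};
      return new_unified_tensor(self, sizes, is_host_mapped);
    }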
Tensor new_unified_tensor_meta(const Tensor &self, const std::vector<std::int64_t> &sizes, bool is_host_mapped)

Placeholder operator for the Meta dispatch key for new_unified_tensor.

- Parameters:
    self – The input tensor
    sizes – The target tensor dimensions
    is_host_mapped – Whether to allocate host-mapped memory (true) or UVM (false)
- Returns:
    A new tensor backed by UVM or host-mapped memory, depending on the value of is_host_mapped
Tensor new_vanilla_managed_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)

Allocate an at::Tensor with unified managed memory (UVM), but allow for its preferred storage location to be automatically managed.

- Parameters:
    self – The input tensor
    sizes – The target tensor dimensions
- Returns:
    A new tensor backed by UVM
bool uvm_storage(const Tensor &self)

Check if a tensor is allocated with UVM (either a CPU or GPU tensor).

- Parameters:
    self – The input tensor
- Returns:
    true if the tensor is allocated with UVM, otherwise false
bool is_uvm_tensor(const Tensor &self)

Check if a tensor is allocated with UVM, but is not a CPU tensor.

- Parameters:
    self – The input tensor
- Returns:
    true if the tensor is a non-CPU tensor allocated with UVM, otherwise false
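A small sketch contrasting the two checks above (assuming the declarations are in scope; the helper name is illustrative):

    #include <ATen/ATen.h>
    #include <iostream>

    // uvm_storage() is true for any UVM-backed tensor (CPU or GPU), while
    // is_uvm_tensor() is true only for non-CPU tensors backed by UVM.
    void report_uvm_status(const at::Tensor& t) {
      std::cout << std::boolalpha
                << "uvm_storage:   " << uvm_storage(t) << "\n"
                << "is_uvm_tensor: " << is_uvm_tensor(t) << std::endl;
    }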
Tensor uvm_to_cpu(const Tensor &self)

Convert a UVM tensor to a CPU tensor.

- Parameters:
    self – The input tensor
- Returns:
    A new tensor that is effectively the input moved from UVM to CPU
Tensor uvm_to_device(const Tensor &self, const Tensor &prototype)

Create a new UVM tensor that shares the same device and UVM storage with prototype.

- Parameters:
    self – The input tensor
    prototype – The target tensor whose device and UVM storage will be shared with the new tensor
- Returns:
    A new tensor that shares the same device and UVM storage with prototype
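A sketch showing both conversions on an existing UVM tensor (assuming uvm_t was allocated with one of the UVM allocators above and that the declarations are visible; names are illustrative):

    #include <ATen/ATen.h>

    // Sketch only: uvm_t must be UVM-backed.
    void rehome_uvm_tensor(const at::Tensor& uvm_t) {
      // Effectively move the UVM tensor to the CPU.
      at::Tensor cpu_t = uvm_to_cpu(uvm_t);

      // Create a new tensor sharing device and UVM storage with `prototype`.
      at::Tensor prototype =
          at::empty({0}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0));
      at::Tensor device_t = uvm_to_device(uvm_t, prototype);

      (void)cpu_t;
      (void)device_t;
    }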
void uvm_cuda_mem_advise(const Tensor &self, int64_t cuda_memory_advise)

Call cudaMemAdvise() on a UVM tensor’s storage. The cudaMemoryAdvise enum is available on the Python side in the fbgemm_gpu.uvm namespace; see the documentation there for valid values.

See also: the CUDA documentation on the cudaMemoryAdvise enum.

- Parameters:
    self – The input tensor
    cuda_memory_advise – The cudaMemoryAdvise enum value, as an integer
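A sketch passing a CUDA runtime advice value through the integer parameter (the header and enum value come from the CUDA runtime; the fbgemm_gpu declaration is assumed to be in scope):

    #include <ATen/ATen.h>
    #include <cstdint>
    #include <cuda_runtime.h>

    // Sketch only: uvm_t must be UVM-backed (uvm_storage(uvm_t) == true).
    void advise_read_mostly(const at::Tensor& uvm_t) {
      // Forward the cudaMemoryAdvise enum value as an integer.
      uvm_cuda_mem_advise(uvm_t,
                          static_cast<std::int64_t>(cudaMemAdviseSetReadMostly));
    }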
void uvm_cuda_mem_prefetch_async(const Tensor &self, std::optional<Tensor> device_t)

Call cudaMemPrefetchAsync() on a UVM tensor’s storage to prefetch memory to a destination device.

See also: the CUDA documentation on cudaMemPrefetchAsync().

- Parameters:
    self – The input tensor
    device_t – [OPTIONAL] The tensor whose device will be the prefetch destination
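A sketch of prefetching a UVM tensor toward a GPU by naming the destination with a tensor on that device (assuming the declaration is in scope; names are illustrative):

    #include <ATen/ATen.h>

    // Sketch only: uvm_t must be UVM-backed.
    void prefetch_to_gpu(const at::Tensor& uvm_t) {
      // A tensor on the destination device identifies where to prefetch.
      at::Tensor target =
          at::empty({0}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0));
      uvm_cuda_mem_prefetch_async(uvm_t, target);
    }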
void uvm_mem_advice_dont_fork(const Tensor &self)

Call madvise(...MADV_DONTFORK) on a UVM tensor’s storage. This is a workaround for an issue where the UVM kernel driver un-maps UVM storage pages from the page table on fork, causing a slowdown on the next access from the CPU.

See also: the madvise() documentation for more information.

- Parameters:
    self – The input tensor
Tensor uvm_to_cpu_clone(const Tensor &self)

Copy a UVM tensor’s contiguous storage (uvm_storage(t) is true) into a new CPU tensor. The copy operation uses a single-threaded memcpy().

- Parameters:
    self – The input tensor
- Returns:
    A new CPU tensor containing the data copied from the UVM tensor
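A sketch taking a CPU snapshot of a UVM tensor (assuming the declaration is in scope; the contrast with uvm_to_cpu() is an interpretation of the descriptions above):

    #include <ATen/ATen.h>

    // Sketch only: uvm_t must have contiguous UVM storage
    // (uvm_storage(uvm_t) == true).
    at::Tensor snapshot_to_cpu(const at::Tensor& uvm_t) {
      // Copies the data into a freshly allocated CPU tensor via a
      // single-threaded memcpy(), rather than effectively moving the
      // existing UVM storage as uvm_to_cpu() is described to do.
      return uvm_to_cpu_clone(uvm_t);
    }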