
torchrec.sparse

Torchrec Jagged Tensors

It provides three classes: JaggedTensor, KeyedJaggedTensor, and KeyedTensor.

JaggedTensor

It represents an (optionally weighted) jagged tensor. A JaggedTensor is a tensor with a jagged dimension, which is a dimension whose slices may be of different lengths. See the KeyedJaggedTensor docstring for a full example and further information.

KeyedJaggedTensor

A KeyedJaggedTensor carries additional "key" information: it is keyed on the first dimension and jagged on the last dimension. Please refer to the KeyedJaggedTensor docstring for a full example and further information.

KeyedTensor

KeyedTensor holds a concatenated list of dense tensors, each of which can be accessed by a key. The keyed dimension can be of variable length (length_per_key). Common use cases include storage of pooled embeddings of different dimensions. Please refer to the KeyedTensor docstring for a full example and further information.

torchrec.sparse.jagged_tensor

class torchrec.sparse.jagged_tensor.JaggedTensor(*args, **kwargs)

Bases: torchrec.streamable.Pipelineable

Represents an (optionally weighted) jagged tensor.

A JaggedTensor is a tensor with a jagged dimension, which is a dimension whose slices may be of different lengths. See KeyedJaggedTensor for a full example.

Implementation is torch.jit.script-able

Note that input validation is NOT performed, because it is expensive; always pass in valid lengths, offsets, etc.

Parameters
  • values (torch.Tensor) – values tensor in dense representation

  • weights (Optional[torch.Tensor]) – optional weights for the values; a tensor with the same shape as values.

  • lengths (Optional[torch.Tensor]) – jagged slices, represented as lengths.

  • offsets (Optional[torch.Tensor]) – jagged slices, represented as cumulative offsets.
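A JaggedTensor can be described either by per-slice lengths or by cumulative offsets; the two are interchangeable. The following sketch illustrates that relationship in plain Python (no torchrec required), using the lengths/offsets from the KeyedJaggedTensor example below.

```python
def lengths_to_offsets(lengths):
    # offsets start at 0 and accumulate the slice lengths
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets

def offsets_to_lengths(offsets):
    # each slice length is the gap between consecutive offsets
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]

lengths = [2, 0, 1, 1, 1, 3]
offsets = lengths_to_offsets(lengths)
print(offsets)                      # [0, 2, 2, 3, 4, 5, 8]
print(offsets_to_lengths(offsets))  # [2, 0, 1, 1, 1, 3]
```

Because the conversion is lossless in both directions, a JaggedTensor only needs one of the two to be supplied.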

static empty(is_weighted: bool = False) torchrec.sparse.jagged_tensor.JaggedTensor
static from_dense(values: List[torch.Tensor], weights: Optional[List[torch.Tensor]] = None) torchrec.sparse.jagged_tensor.JaggedTensor

Constructs a JaggedTensor from a list of dense value (and optional weight) tensors.

Note that the resulting lengths/offsets are of shape (B,), where B is the length of the list.

Parameters
  • values (List[torch.Tensor]) – a list of tensors for dense representation

  • weights (Optional[List[torch.Tensor]]) – if values have weights, tensor with same shape as values.

Returns

JaggedTensor created from the list of dense tensors

Return type

JaggedTensor

Example:

values = [
    torch.Tensor([1.0]),
    torch.Tensor(),
    torch.Tensor([7.0, 8.0]),
    torch.Tensor([10.0, 11.0, 12.0]),
]
weights = [
    torch.Tensor([1.0]),
    torch.Tensor(),
    torch.Tensor([7.0, 8.0]),
    torch.Tensor([10.0, 11.0, 12.0]),
]
j1 = JaggedTensor.from_dense(
    values=values,
    weights=weights,
)

# j1 = [[1.0], [], [7.0, 8.0], [10.0, 11.0, 12.0]]

static from_dense_lengths(values: torch.Tensor, lengths: torch.Tensor, weights: Optional[torch.Tensor] = None) torchrec.sparse.jagged_tensor.JaggedTensor

Constructs a JaggedTensor from dense values/weights of shape (B, N).

Note that lengths is still of shape (B,).

lengths() torch.Tensor
offsets() torch.Tensor
record_stream(stream: torch.cuda.streams.Stream) None

See https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html

to(device: torch.device, non_blocking: bool = False) torchrec.sparse.jagged_tensor.JaggedTensor

Please be aware that, according to https://pytorch.org/docs/stable/generated/torch.Tensor.to.html, to might return self or a copy of self, so remember to use to with the assignment operator, e.g., tensor = tensor.to(new_device).

to_dense() List[torch.Tensor]

Constructs a dense-representation list of tensors from the JaggedTensor.

Returns

list of tensors

Return type

List[torch.Tensor]

Example

values = torch.Tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
offsets = torch.IntTensor([0, 2, 2, 3, 4, 5, 8])
jt = JaggedTensor(values=values, offsets=offsets)

torch_list = jt.to_dense()

# torch_list = [
#     torch.tensor([1.0, 2.0]),
#     torch.tensor([]),
#     torch.tensor([3.0]),
#     torch.tensor([4.0]),
#     torch.tensor([5.0]),
#     torch.tensor([6.0, 7.0, 8.0]),
# ]

to_padded_dense(desired_length: Optional[int] = None, padding_value: float = 0.0, pad_from_beginning: bool = True, chop_from_beginning: bool = True) torch.Tensor

Constructs a 2d dense tensor of shape (B, N) from the JaggedTensor, where B is the number of jagged slices and N is the longest slice length (or desired_length, if given). Slices shorter than desired_length are filled with padding_value; slices longer than desired_length keep only their last desired_length values when chop_from_beginning is True.

Parameters
  • desired_length (Optional[int]) – the target length of each slice.

  • padding_value (float) – the value used for padding.

  • pad_from_beginning (bool) – whether padding is added at the beginning (True) or the end (False) of a slice.

  • chop_from_beginning (bool) – whether values are dropped from the beginning (True) or the end (False) of an over-long slice.

Returns

2d dense tensor

Return type

torch.Tensor

Example

values = torch.Tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
offsets = torch.IntTensor([0, 2, 2, 3, 4, 5, 8])
jt = JaggedTensor(values=values, offsets=offsets)

t = jt.to_padded_dense(
    desired_length=2,
    padding_value=10.0,
    pad_from_beginning=False,
)

# t = [
#     [1.0, 2.0],
#     [10.0, 10.0],
#     [3.0, 10.0],
#     [4.0, 10.0],
#     [5.0, 10.0],
#     [7.0, 8.0],
# ]
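The pad/chop behavior above can be sketched in plain Python (no torchrec required); this is an illustrative reimplementation of the semantics described above, not torchrec's actual implementation.

```python
def padded_dense(values, offsets, desired_length, padding_value=0.0,
                 pad_from_beginning=True, chop_from_beginning=True):
    rows = []
    for i in range(len(offsets) - 1):
        row = values[offsets[i]:offsets[i + 1]]
        if len(row) > desired_length:
            # chop_from_beginning keeps the LAST desired_length values
            row = row[-desired_length:] if chop_from_beginning else row[:desired_length]
        else:
            pad = [padding_value] * (desired_length - len(row))
            row = pad + row if pad_from_beginning else row + pad
        rows.append(row)
    return rows

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
offsets = [0, 2, 2, 3, 4, 5, 8]
print(padded_dense(values, offsets, desired_length=2, padding_value=10.0,
                   pad_from_beginning=False))
# [[1.0, 2.0], [10.0, 10.0], [3.0, 10.0], [4.0, 10.0], [5.0, 10.0], [7.0, 8.0]]
```

The printed result matches the t produced by jt.to_padded_dense in the example above.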

values() torch.Tensor
weights() torch.Tensor
weights_or_none() Optional[torch.Tensor]
class torchrec.sparse.jagged_tensor.JaggedTensorMeta(name, bases, namespace, **kwargs)

Bases: abc.ABCMeta, torch.fx._symbolic_trace.ProxyableClassMeta

class torchrec.sparse.jagged_tensor.KeyedJaggedTensor(*args, **kwargs)

Bases: torchrec.streamable.Pipelineable

Represents an (optionally weighted) keyed jagged tensor.

A JaggedTensor is a tensor with a jagged dimension, which is a dimension whose slices may be of different lengths. A KeyedJaggedTensor is keyed on the first dimension and jagged on the last dimension.

For example:

#              0       1        2  <-- dim_1
# "Feature0"   [V0,V1] None    [V2]
# "Feature1"   [V3]    [V4]    [V5,V6,V7]
#   ^
#  dim_0

dim_0: keyed dimension (i.e. `Feature0`, `Feature1`)
dim_1: optional second dimension (i.e. batch size)
dim_2: the jagged dimension, with slice lengths between 0 and 3 in the above example

We represent this data with the following inputs:

values: torch.Tensor = [V0, V1, V2, V3, V4, V5, V6, V7], V == any tensor datatype
weights: torch.Tensor = [W0, W1, W2, W3, W4, W5, W6, W7], W == any tensor datatype
lengths: torch.Tensor = [2, 0, 1, 1, 1, 3], the length of each jagged slice
offsets: torch.Tensor = [0, 2, 2, 3, 4, 5, 8], offsets from 0 for each jagged slice
keys: List[str] = ["Feature0", "Feature1"], which corresponds to each value of dim_0
index_per_key: Dict[str, int] = {"Feature0": 0, "Feature1": 1}, index for each key
offset_per_key: List[int] = [0, 3, 8], start offset for each key and final offset
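The Feature0/Feature1 example above can be checked in plain Python (no torchrec required): offsets is the running sum of lengths, and offset_per_key marks where each key's values start in the flat values tensor.

```python
keys = ["Feature0", "Feature1"]
lengths = [2, 0, 1, 1, 1, 3]     # jagged slice lengths, key-major order
offsets = [0, 2, 2, 3, 4, 5, 8]  # cumulative sum of lengths

# offsets is the running sum of lengths, starting at 0
acc = [0]
for n in lengths:
    acc.append(acc[-1] + n)
assert acc == offsets

# with 2 keys and a stride (batch size) of 3, each key owns 3
# consecutive entries of lengths; offset_per_key accumulates the
# total number of values per key
stride = len(lengths) // len(keys)
offset_per_key = [0]
for k in range(len(keys)):
    offset_per_key.append(offset_per_key[-1] + sum(lengths[k * stride:(k + 1) * stride]))
print(offset_per_key)  # [0, 3, 8]
```

This recovers the offset_per_key = [0, 3, 8] listed above: Feature0 owns values[0:3] and Feature1 owns values[3:8].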

Implementation is torch.jit.script-able

Parameters
  • keys (List[str]) – keys to the jagged Tensor.

  • values (torch.Tensor) – values tensor in dense representation.

  • weights (Optional[torch.Tensor]) – optional weights for the values; a tensor with the same shape as values.

  • lengths (Optional[torch.Tensor]) – jagged slices, represented as lengths.

  • offsets (Optional[torch.Tensor]) – jagged slices, represented as cumulative offsets.

  • stride (Optional[int]) – number of examples per batch.

  • length_per_key (Optional[List[int]]) – total length (number of values) for each key.

  • offset_per_key (Optional[List[int]]) – start offset for each key and final offset.

  • index_per_key (Optional[Dict[str, int]]) – index for each key.

  • jt_dict (Optional[Dict[str, JaggedTensor]]) –

static concat(a: torchrec.sparse.jagged_tensor.KeyedJaggedTensor, b: torchrec.sparse.jagged_tensor.KeyedJaggedTensor) torchrec.sparse.jagged_tensor.KeyedJaggedTensor
device() torch.device
static empty(is_weighted: bool = False, device: Optional[torch.device] = None) torchrec.sparse.jagged_tensor.KeyedJaggedTensor
static empty_like(kjt: torchrec.sparse.jagged_tensor.KeyedJaggedTensor) torchrec.sparse.jagged_tensor.KeyedJaggedTensor
static from_lengths_sync(keys: List[str], values: torch.Tensor, lengths: torch.Tensor, weights: Optional[torch.Tensor] = None, stride: Optional[int] = None) torchrec.sparse.jagged_tensor.KeyedJaggedTensor
static from_offsets_sync(keys: List[str], values: torch.Tensor, offsets: torch.Tensor, weights: Optional[torch.Tensor] = None, stride: Optional[int] = None) torchrec.sparse.jagged_tensor.KeyedJaggedTensor
keys() List[str]
length_per_key() List[int]
lengths() torch.Tensor
offset_per_key() List[int]
offsets() torch.Tensor
permute(indices: List[int], indices_tensor: Optional[torch.Tensor] = None) torchrec.sparse.jagged_tensor.KeyedJaggedTensor
pin_memory() torchrec.sparse.jagged_tensor.KeyedJaggedTensor
record_stream(stream: torch.cuda.streams.Stream) None

See https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html

split(segments: List[int]) List[torchrec.sparse.jagged_tensor.KeyedJaggedTensor]
stride() int
sync() torchrec.sparse.jagged_tensor.KeyedJaggedTensor
to(device: torch.device, non_blocking: bool = False) torchrec.sparse.jagged_tensor.KeyedJaggedTensor

Please be aware that, according to https://pytorch.org/docs/stable/generated/torch.Tensor.to.html, to might return self or a copy of self, so remember to use to with the assignment operator, e.g., tensor = tensor.to(new_device).

to_dict() Dict[str, torchrec.sparse.jagged_tensor.JaggedTensor]
values() torch.Tensor
weights() torch.Tensor
weights_or_none() Optional[torch.Tensor]
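The slicing that to_dict performs can be sketched in plain Python (no torchrec required): each key's JaggedTensor covers values[offset_per_key[i] : offset_per_key[i + 1]] together with that key's run of lengths. The flat values below are hypothetical stand-ins for the V0..V7 of the example above.

```python
values = [10, 11, 12, 20, 21, 22, 23, 24]  # hypothetical flat values (V0..V7)
keys = ["Feature0", "Feature1"]
lengths = [2, 0, 1, 1, 1, 3]               # key-major, stride 3
offset_per_key = [0, 3, 8]
stride = 3

per_key = {}
for i, key in enumerate(keys):
    # each key's values are one contiguous span of the flat tensor,
    # and its lengths are one contiguous run of the lengths tensor
    key_values = values[offset_per_key[i]:offset_per_key[i + 1]]
    key_lengths = lengths[i * stride:(i + 1) * stride]
    per_key[key] = (key_values, key_lengths)

print(per_key["Feature1"])  # ([20, 21, 22, 23, 24], [1, 1, 3])
```

Because every key's data is contiguous, the per-key split is pure slicing with no data movement.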
class torchrec.sparse.jagged_tensor.KeyedTensor(*args, **kwargs)

Bases: torchrec.streamable.Pipelineable

KeyedTensor holds a concatenated list of dense tensors, each of which can be accessed by a key. The keyed dimension can be of variable length (length_per_key). Common use cases include storage of pooled embeddings of different dimensions.

Parameters
  • keys (List[str]) – list of keys

  • length_per_key (List[int]) – length of each key along key dimension

  • values (torch.Tensor) – dense tensor, concatenated typically along key dimension

  • key_dim (int) – key dimension, zero-indexed; defaults to 1 (typically B is the 0-dimension)

Implementation is torch.jit.script-able

Example:

# kt is KeyedTensor holding

#                         0           1           2
#     "Embedding A"    [1,1]       [1,1]        [1,1]
#     "Embedding B"    [2,1,2]     [2,1,2]      [2,1,2]
#     "Embedding C"    [3,1,2,3]   [3,1,2,3]    [3,1,2,3]
tensor_list = [
    torch.tensor([[1, 1]] * 3),
    torch.tensor([[2, 1, 2]] * 3),
    torch.tensor([[3, 1, 2, 3]] * 3),
]
keys = ["Embedding A", "Embedding B", "Embedding C"]
kt = KeyedTensor.from_tensor_list(keys, tensor_list)
kt.values()
    tensor([[1, 1, 2, 1, 2, 3, 1, 2, 3],
    [1, 1, 2, 1, 2, 3, 1, 2, 3],
    [1, 1, 2, 1, 2, 3, 1, 2, 3]])
kt["Embedding B"]
    tensor([[2, 1, 2],
    [2, 1, 2],
    [2, 1, 2]])
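The per-key lookup in the example above can be sketched in plain Python (no torchrec required): length_per_key gives each key's width along the key dimension, and slicing one concatenated row of kt.values() recovers that row of kt[key].

```python
keys = ["Embedding A", "Embedding B", "Embedding C"]
length_per_key = [2, 3, 4]          # widths of A, B, C along the key dim
row = [1, 1, 2, 1, 2, 3, 1, 2, 3]   # one row of kt.values() from the example

# offset_per_key is the running sum of length_per_key
offset_per_key = [0]
for n in length_per_key:
    offset_per_key.append(offset_per_key[-1] + n)

i = keys.index("Embedding B")
print(row[offset_per_key[i]:offset_per_key[i + 1]])  # [2, 1, 2]
```

This is the same slice kt["Embedding B"] returns for each row in the example output above.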
static from_tensor_list(keys: List[str], tensors: List[torch.Tensor], key_dim: int = 1, cat_dim: int = 1) torchrec.sparse.jagged_tensor.KeyedTensor
key_dim() int
keys() List[str]
length_per_key() List[int]
offset_per_key() List[int]
record_stream(stream: torch.cuda.streams.Stream) None

See https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html

static regroup(keyed_tensors: List[torchrec.sparse.jagged_tensor.KeyedTensor], groups: List[List[str]]) List[torch.Tensor]
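The semantics of regroup can be sketched in plain Python (no torchrec required): for each requested group of keys, the corresponding per-key tensors are concatenated along the key dimension. This illustrative version operates on one row per key rather than real tensors.

```python
def regroup(per_key_rows, groups):
    # per_key_rows: dict mapping key -> flat row of that key's values
    # groups: list of key groups; each group becomes one output row
    out = []
    for group in groups:
        merged = []
        for key in group:
            merged.extend(per_key_rows[key])
        out.append(merged)
    return out

rows = {
    "Embedding A": [1, 1],
    "Embedding B": [2, 1, 2],
    "Embedding C": [3, 1, 2, 3],
}
print(regroup(rows, [["Embedding A", "Embedding C"], ["Embedding B"]]))
# [[1, 1, 3, 1, 2, 3], [2, 1, 2]]
```

A typical use is regrouping pooled embeddings from several KeyedTensors into the per-tower inputs of a model.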
to(device: torch.device, non_blocking: bool = False) torchrec.sparse.jagged_tensor.KeyedTensor

Please be aware that, according to https://pytorch.org/docs/stable/generated/torch.Tensor.to.html, to might return self or a copy of self, so remember to use to with the assignment operator, e.g., tensor = tensor.to(new_device).

to_dict() Dict[str, torch.Tensor]
values() torch.Tensor

