torchrec.sparse
Torchrec Jagged Tensors
It has 3 classes: JaggedTensor, KeyedJaggedTensor, KeyedTensor.
JaggedTensor
It represents an (optionally weighted) jagged tensor. A JaggedTensor is a tensor with a jagged dimension, i.e. a dimension whose slices may be of different lengths. See the KeyedJaggedTensor docstring for a full example and further information.
KeyedJaggedTensor
KeyedJaggedTensor has additional “Key” information: it is keyed on the first dimension and jagged on the last dimension. Please refer to the KeyedJaggedTensor docstring for a full example and further information.
KeyedTensor
KeyedTensor holds a concatenated list of dense tensors, each of which can be accessed by a key. The keyed dimension can be of variable length (length_per_key). Common use cases include storage of pooled embeddings of different dimensions. Please refer to the KeyedTensor docstring for a full example and further information.
torchrec.sparse.jagged_tensor
- class torchrec.sparse.jagged_tensor.JaggedTensor(*args, **kwargs)
Bases: Pipelineable
Represents an (optionally weighted) jagged tensor.
A JaggedTensor is a tensor with a jagged dimension, i.e. a dimension whose slices may be of different lengths. See KeyedJaggedTensor for a full example.
Implementation is torch.jit.script-able.
Note
Input is NOT validated, because validation is expensive; always pass in valid lengths, offsets, etc.
- Parameters:
values (torch.Tensor) – values tensor in dense representation.
weights (Optional[torch.Tensor]) – weights for the values, if any. Tensor with the same shape as values.
lengths (Optional[torch.Tensor]) – jagged slices, represented as lengths.
offsets (Optional[torch.Tensor]) – jagged slices, represented as cumulative offsets.
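Since lengths and offsets describe the same slicing, either one determines the other: offsets is the cumulative sum of lengths with a leading 0, so it has one more entry than lengths. A minimal pure-Python sketch of the conversion, using plain lists instead of tensors and the lengths/offsets from the KeyedJaggedTensor example (lengths_to_offsets and offsets_to_lengths are hypothetical helper names, not torchrec functions):

```python
from itertools import accumulate
from typing import List


def lengths_to_offsets(lengths: List[int]) -> List[int]:
    """Hypothetical helper: cumulative sum of lengths with a leading 0."""
    # accumulate(..., initial=0) yields len(lengths) + 1 values (Python 3.8+)
    return list(accumulate(lengths, initial=0))


def offsets_to_lengths(offsets: List[int]) -> List[int]:
    """Hypothetical helper: differences between consecutive offsets."""
    return [end - start for start, end in zip(offsets, offsets[1:])]


lengths = [2, 0, 1, 1, 1, 3]
offsets = lengths_to_offsets(lengths)
print(offsets)  # [0, 2, 2, 3, 4, 5, 8]
assert offsets_to_lengths(offsets) == lengths  # round trip
```

Because the example data has 6 slices, offsets has 7 entries; the final entry equals the total number of values.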
- static empty(is_weighted: bool = False) → JaggedTensor
- static from_dense(values: List[Tensor], weights: Optional[List[Tensor]] = None) → JaggedTensor
Constructs a JaggedTensor from dense values/weights given as a list of B tensors, one per example (each may have a different length).
Note that lengths is of shape (B,) and offsets of shape (B + 1,).
- Parameters:
values (List[torch.Tensor]) – a list of tensors for dense representation
weights (Optional[List[torch.Tensor]]) – weights for the values, if any; each tensor has the same shape as the corresponding values tensor.
- Returns:
JaggedTensor created from the list of dense tensors.
- Return type:
JaggedTensor
Example:
values = [
    torch.Tensor([1.0]),
    torch.Tensor(),
    torch.Tensor([7.0, 8.0]),
    torch.Tensor([10.0, 11.0, 12.0]),
]
weights = [
    torch.Tensor([1.0]),
    torch.Tensor(),
    torch.Tensor([7.0, 8.0]),
    torch.Tensor([10.0, 11.0, 12.0]),
]
j1 = JaggedTensor.from_dense(
    values=values,
    weights=weights,
)

# j1 = [[1.0], [], [7.0, 8.0], [10.0, 11.0, 12.0]]
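Conceptually, from_dense concatenates the per-example tensors into one values tensor and records each tensor's size as a length. The sketch below reproduces only that bookkeeping with plain lists; from_dense_sketch is a hypothetical name, and weights are omitted for brevity.

```python
from typing import List, Tuple


def from_dense_sketch(values: List[List[float]]) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: flatten per-example lists into (values, lengths)."""
    flat = [v for example in values for v in example]  # concatenate in order
    lengths = [len(example) for example in values]     # one length per example
    return flat, lengths


flat, lengths = from_dense_sketch([[1.0], [], [7.0, 8.0], [10.0, 11.0, 12.0]])
print(flat)     # [1.0, 7.0, 8.0, 10.0, 11.0, 12.0]
print(lengths)  # [1, 0, 2, 3]
```

The four input tensors yield four lengths, which is why the example above describes four slices.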
- static from_dense_lengths(values: Tensor, lengths: Tensor, weights: Optional[Tensor] = None) → JaggedTensor
Constructs JaggedTensor from dense values/weights of shape (B, N,).
Note that lengths is still of shape (B,).
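Unlike from_dense, from_dense_lengths takes a padded (B, N) tensor. A pure-Python sketch, assuming that row i contributes its first lengths[i] entries and that the remainder of the row is padding to be discarded (from_dense_lengths_sketch is a hypothetical helper; lists stand in for tensors):

```python
from typing import List, Tuple


def from_dense_lengths_sketch(
    values: List[List[float]], lengths: List[int]
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: keep the first lengths[i] entries of each padded row."""
    if len(values) != len(lengths):
        raise ValueError("values and lengths must both have B rows")
    flat = [v for row, n in zip(values, lengths) for v in row[:n]]
    return flat, list(lengths)


padded = [
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
]
flat, lengths = from_dense_lengths_sketch(padded, [2, 0, 1])
print(flat)  # [1.0, 2.0, 3.0]
```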
- lengths() → Tensor
- lengths_or_none() → Optional[Tensor]
- offsets() → Tensor
- offsets_or_none() → Optional[Tensor]
- record_stream(stream: Stream) → None
See https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html
- to(device: device, non_blocking: bool = False) → JaggedTensor
Please be aware that, according to https://pytorch.org/docs/stable/generated/torch.Tensor.to.html, to might return self or a copy of self. Always assign the result, for example, jt = jt.to(new_device).
- to_dense() → List[Tensor]
Constructs a dense representation (a list of per-example tensors) from the JaggedTensor.
- Returns:
list of tensors.
- Return type:
List[torch.Tensor]
Example:
values = torch.Tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
offsets = torch.IntTensor([0, 2, 2, 3, 4, 5, 8])
jt = JaggedTensor(values=values, offsets=offsets)

torch_list = jt.to_dense()

# torch_list = [
#     torch.tensor([1.0, 2.0]),
#     torch.tensor([]),
#     torch.tensor([3.0]),
#     torch.tensor([4.0]),
#     torch.tensor([5.0]),
#     torch.tensor([6.0, 7.0, 8.0]),
# ]
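The output above follows directly from the offsets layout: example i is values[offsets[i]:offsets[i + 1]]. A pure-Python sketch of that slicing on the same data (to_dense_sketch is a hypothetical name; lists stand in for tensors):

```python
from typing import List


def to_dense_sketch(values: List[float], offsets: List[int]) -> List[List[float]]:
    """Hypothetical helper: slice values[offsets[i]:offsets[i + 1]] for each example."""
    return [values[start:end] for start, end in zip(offsets, offsets[1:])]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
offsets = [0, 2, 2, 3, 4, 5, 8]
print(to_dense_sketch(values, offsets))
# [[1.0, 2.0], [], [3.0], [4.0], [5.0], [6.0, 7.0, 8.0]]
```

The repeated offset 2 produces the empty slice for the second example.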
- to_padded_dense(desired_length: Optional[int] = None, padding_value: float = 0.0) → Tensor
Constructs a 2D dense Tensor of shape (B, N,) from the JaggedTensor.
Note that B is the length of self.lengths() and N is the longest feature length or desired_length.
If desired_length is greater than a slice's length, the slice is padded with padding_value; otherwise it is truncated to its first desired_length values.
- Parameters:
desired_length (Optional[int]) – the length N of the padded dimension; if None, the longest slice length is used.
padding_value (float) – padding value if we need to pad.
- Returns:
2d dense tensor.
- Return type:
torch.Tensor
Example:
values = torch.Tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
offsets = torch.IntTensor([0, 2, 2, 3, 4, 5, 8])
jt = JaggedTensor(values=values, offsets=offsets)
dt = jt.to_padded_dense(
    desired_length=2,
    padding_value=10.0,
)

# dt = [
#     [1.0, 2.0],
#     [10.0, 10.0],
#     [3.0, 10.0],
#     [4.0, 10.0],
#     [5.0, 10.0],
#     [6.0, 7.0],
# ]
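A pure-Python sketch of the padding rule, using lists in place of tensors. It assumes that N defaults to the longest slice, that shorter slices are right-padded with padding_value, and that longer slices keep their first N entries; to_padded_dense_sketch is a hypothetical helper, not the torchrec implementation.

```python
from typing import List, Optional


def to_padded_dense_sketch(
    values: List[float],
    offsets: List[int],
    desired_length: Optional[int] = None,
    padding_value: float = 0.0,
) -> List[List[float]]:
    """Hypothetical helper: pad or truncate every slice to a common length N."""
    slices = [values[start:end] for start, end in zip(offsets, offsets[1:])]
    # N defaults to the longest slice (0 if there are no slices)
    n = desired_length if desired_length is not None else max(map(len, slices), default=0)
    padded = []
    for s in slices:
        row = s[:n]  # keep at most the first n entries
        padded.append(row + [padding_value] * (n - len(row)))  # right-pad to n
    return padded


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
offsets = [0, 2, 2, 3, 4, 5, 8]
print(to_padded_dense_sketch(values, offsets, desired_length=2, padding_value=10.0))
# [[1.0, 2.0], [10.0, 10.0], [3.0, 10.0], [4.0, 10.0], [5.0, 10.0], [6.0, 7.0]]
```

Leaving desired_length as None pads every row to the longest slice (3 for this data), so no values are dropped.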
- values() → Tensor
- weights() → Tensor
- weights_or_none() → Optional[Tensor]
- class torchrec.sparse.jagged_tensor.JaggedTensorMeta(name, bases, namespace, **kwargs)
Bases: ABCMeta, ProxyableClassMeta
- class torchrec.sparse.jagged_tensor.KeyedJaggedTensor(*args, **kwargs)
Bases: Pipelineable
Represents an (optionally weighted) keyed jagged tensor.
A KeyedJaggedTensor is a tensor with a jagged dimension, i.e. a dimension whose slices may be of different lengths. It is keyed on the first dimension and jagged on the last dimension.
Implementation is torch.jit.script-able.
- Parameters:
keys (List[str]) – keys to the jagged Tensor.
values (torch.Tensor) – values tensor in dense representation.
weights (Optional[torch.Tensor]) – weights for the values, if any. Tensor with the same shape as values.
lengths (Optional[torch.Tensor]) – jagged slices, represented as lengths.
offsets (Optional[torch.Tensor]) – jagged slices, represented as cumulative offsets.
stride (Optional[int]) – number of examples per batch.
length_per_key (Optional[List[int]]) – total number of values for each key.
offset_per_key (Optional[List[int]]) – start offset for each key and final offset.
index_per_key (Optional[Dict[str, int]]) – index for each key.
jt_dict (Optional[Dict[str, JaggedTensor]]) –
Example:
#              0       1        2  <-- dim_1
# "Feature0"   [V0,V1] None    [V2]
# "Feature1"   [V3]    [V4]    [V5,V6,V7]
#   ^
#  dim_0

dim_0: keyed dimension (i.e. `Feature0`, `Feature1`)
dim_1: optional second dimension (i.e. batch size)
dim_2: The jagged dimension which has slice lengths between 0-3 in the above example

# We represent this data with following inputs:

values: torch.Tensor = [V0, V1, V2, V3, V4, V5, V6, V7]  # V == any tensor datatype
weights: torch.Tensor = [W0, W1, W2, W3, W4, W5, W6, W7]  # W == any tensor datatype
lengths: torch.Tensor = [2, 0, 1, 1, 1, 3]  # representing the jagged slice
offsets: torch.Tensor = [0, 2, 2, 3, 4, 5, 8]  # offsets from 0 for each jagged slice
keys: List[str] = ["Feature0", "Feature1"]  # correspond to each value of dim_0
index_per_key: Dict[str, int] = {"Feature0": 0, "Feature1": 1}  # index for each key
offset_per_key: List[int] = [0, 3, 8]  # start offset for each key and final offset
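The layout above can be checked with a small pure-Python sketch. It assumes, as the example suggests, that lengths is ordered key-major (the first stride entries belong to the first key, and so on), derives length_per_key, offset_per_key and index_per_key from it, and then splits the flat values per key in the spirit of to_dict(). The function names are hypothetical and plain lists stand in for tensors; this is not the torchrec implementation.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def kjt_layout_sketch(
    keys: List[str], lengths: List[int], stride: int
) -> Tuple[List[int], List[int], Dict[str, int]]:
    """Hypothetical helper: derive per-key metadata from key-major lengths."""
    if len(lengths) != len(keys) * stride:
        raise ValueError("expected one length per (key, example) pair")
    # lengths[k * stride:(k + 1) * stride] are the slice lengths of keys[k]
    length_per_key = [sum(lengths[k * stride:(k + 1) * stride]) for k in range(len(keys))]
    # start offset of each key's values, plus the final offset
    offset_per_key = list(accumulate(length_per_key, initial=0))
    index_per_key = {key: k for k, key in enumerate(keys)}
    return length_per_key, offset_per_key, index_per_key


def split_by_key_sketch(
    keys: List[str], values: list, lengths: List[int], stride: int
) -> Dict[str, List[list]]:
    """Hypothetical helper: map each key to its per-example slices of values."""
    out: Dict[str, List[list]] = {}
    pos = 0  # running position in the flat values list
    for k, key in enumerate(keys):
        slices = []
        for n in lengths[k * stride:(k + 1) * stride]:
            slices.append(values[pos:pos + n])
            pos += n
        out[key] = slices
    return out


keys = ["Feature0", "Feature1"]
values = ["V0", "V1", "V2", "V3", "V4", "V5", "V6", "V7"]
lengths = [2, 0, 1, 1, 1, 3]

print(kjt_layout_sketch(keys, lengths, stride=3))
# ([3, 5], [0, 3, 8], {'Feature0': 0, 'Feature1': 1})
print(split_by_key_sketch(keys, values, lengths, stride=3))
# {'Feature0': [['V0', 'V1'], [], ['V2']], 'Feature1': [['V3'], ['V4'], ['V5', 'V6', 'V7']]}
```

The derived offset_per_key [0, 3, 8] and index_per_key match the values listed in the example.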
- static concat(kjt_list: List[KeyedJaggedTensor]) → KeyedJaggedTensor
- device() → device
- static empty(is_weighted: bool = False, device: Optional[device] = None) → KeyedJaggedTensor
- static empty_like(kjt: KeyedJaggedTensor) → KeyedJaggedTensor
- static from_lengths_sync(keys: List[str], values: Tensor, lengths: Tensor, weights: Optional[Tensor] = None, stride: Optional[int] = None) → KeyedJaggedTensor
- static from_offsets_sync(keys: List[str], values: Tensor, offsets: Tensor, weights: Optional[Tensor] = None, stride: Optional[int] = None) → KeyedJaggedTensor
- keys() → List[str]
- length_per_key() → List[int]
- length_per_key_or_none() → Optional[List[int]]
- lengths() → Tensor
- lengths_or_none() → Optional[Tensor]
- offset_per_key() → List[int]
- offset_per_key_or_none() → Optional[List[int]]
- offsets() → Tensor
- offsets_or_none() → Optional[Tensor]
- permute(indices: List[int], indices_tensor: Optional[Tensor] = None) → KeyedJaggedTensor
- pin_memory() → KeyedJaggedTensor
- record_stream(stream: Stream) → None
See https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html
- split(segments: List[int]) → List[KeyedJaggedTensor]
- stride() → int
- sync() → KeyedJaggedTensor
- to(device: device, non_blocking: bool = False) → KeyedJaggedTensor
Please be aware that, according to https://pytorch.org/docs/stable/generated/torch.Tensor.to.html, to might return self or a copy of self. Always assign the result, for example, kjt = kjt.to(new_device).
- to_dict() → Dict[str, JaggedTensor]
- values() → Tensor
- weights() → Tensor
- weights_or_none() → Optional[Tensor]
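permute and split are listed without descriptions. Assuming permute(indices) returns the keys in the order given by indices (new key i is old key indices[i], each carrying its own slices) and split(segments) cuts the keys into consecutive groups of segments[j] keys, their effect can be sketched with an insertion-ordered dict of per-key slices standing in for a KeyedJaggedTensor. The names permute_sketch, split_sketch and KJTLike are hypothetical, and the validation checks are choices made by this sketch, not documented torchrec behavior.

```python
from typing import Dict, List

# Hypothetical stand-in for a KJT: ordered mapping key -> per-example slices
KJTLike = Dict[str, List[list]]


def permute_sketch(kjt: KJTLike, indices: List[int]) -> KJTLike:
    """Hypothetical: new key i is old key indices[i], with its slices."""
    keys = list(kjt)
    if sorted(indices) != list(range(len(keys))):
        raise ValueError("this sketch only handles true permutations")
    return {keys[i]: kjt[keys[i]] for i in indices}


def split_sketch(kjt: KJTLike, segments: List[int]) -> List[KJTLike]:
    """Hypothetical: consecutive groups of segments[j] keys each."""
    keys = list(kjt)
    if sum(segments) != len(keys):
        raise ValueError("segments must sum to the number of keys")
    out: List[KJTLike] = []
    start = 0
    for seg in segments:
        out.append({k: kjt[k] for k in keys[start:start + seg]})
        start += seg
    return out


kjt = {"f0": [[1], []], "f1": [[2, 3], [4]], "f2": [[], [5]]}
print(list(permute_sketch(kjt, [2, 0, 1])))  # ['f2', 'f0', 'f1']
print([list(part) for part in split_sketch(kjt, [1, 2])])  # [['f0'], ['f1', 'f2']]
```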
- class torchrec.sparse.jagged_tensor.KeyedTensor(*args, **kwargs)
Bases: Pipelineable
KeyedTensor holds a concatenated list of dense tensors, each of which can be accessed by a key.
The keyed dimension can be of variable length (length_per_key). Common use cases include storage of pooled embeddings of different dimensions.
Implementation is torch.jit.script-able.
- Parameters:
keys (List[str]) – list of keys.
length_per_key (List[int]) – length of each key along key dimension.
values (torch.Tensor) – dense tensor, concatenated typically along key dimension.
key_dim (int) – key dimension, zero-indexed; defaults to 1 (B is typically dimension 0).
Example:
# kt is a KeyedTensor holding
#                         0           1           2
#     "Embedding A"    [1,1]       [1,1]        [1,1]
#     "Embedding B"    [2,1,2]     [2,1,2]      [2,1,2]
#     "Embedding C"    [3,1,2,3]   [3,1,2,3]    [3,1,2,3]

tensor_list = [
    torch.tensor([[1,1]] * 3),
    torch.tensor([[2,1,2]] * 3),
    torch.tensor([[3,1,2,3]] * 3),
]

keys = ["Embedding A", "Embedding B", "Embedding C"]

kt = KeyedTensor.from_tensor_list(keys, tensor_list)

kt.values()
# tensor(
#     [
#         [1, 1, 2, 1, 2, 3, 1, 2, 3],
#         [1, 1, 2, 1, 2, 3, 1, 2, 3],
#         [1, 1, 2, 1, 2, 3, 1, 2, 3],
#     ]
# )

kt["Embedding B"]
# tensor([[2, 1, 2], [2, 1, 2], [2, 1, 2]])
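A pure-Python sketch of how the concatenated values and the per-key metadata in this example relate, assuming key_dim = 1 and that each input is a B x D_k list of rows of equal width. keyed_tensor_sketch and lookup_sketch are hypothetical helpers, not the torchrec API; the lookup only mimics what kt[key] returns above by slicing columns offset_per_key[i] to offset_per_key[i + 1].

```python
from itertools import accumulate
from typing import List, Tuple

Rows = List[List[int]]  # a B x D "tensor" as a list of rows


def keyed_tensor_sketch(tensors: List[Rows]) -> Tuple[Rows, List[int], List[int]]:
    """Hypothetical: concatenate B x D_k row-lists along dim 1 and record offsets."""
    length_per_key = [len(t[0]) for t in tensors]  # D_k per key (assumes B >= 1)
    offset_per_key = list(accumulate(length_per_key, initial=0))
    batch_size = len(tensors[0])
    values = [[x for t in tensors for x in t[b]] for b in range(batch_size)]
    return values, length_per_key, offset_per_key


def lookup_sketch(keys: List[str], values: Rows, offset_per_key: List[int], key: str) -> Rows:
    """Hypothetical: slice out the columns that belong to `key`."""
    i = keys.index(key)
    start, end = offset_per_key[i], offset_per_key[i + 1]
    return [row[start:end] for row in values]


keys = ["Embedding A", "Embedding B", "Embedding C"]
tensors = [[[1, 1]] * 3, [[2, 1, 2]] * 3, [[3, 1, 2, 3]] * 3]
values, length_per_key, offset_per_key = keyed_tensor_sketch(tensors)
print(length_per_key, offset_per_key)  # [2, 3, 4] [0, 2, 5, 9]
print(values[0])  # [1, 1, 2, 1, 2, 3, 1, 2, 3]
print(lookup_sketch(keys, values, offset_per_key, "Embedding B"))
# [[2, 1, 2], [2, 1, 2], [2, 1, 2]]
```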
- static from_tensor_list(keys: List[str], tensors: List[Tensor], key_dim: int = 1, cat_dim: int = 1) → KeyedTensor
- key_dim() → int
- keys() → List[str]
- length_per_key() → List[int]
- offset_per_key() → List[int]
- record_stream(stream: Stream) → None
See https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html
- static regroup(keyed_tensors: List[KeyedTensor], groups: List[List[str]]) → List[Tensor]
- static regroup_as_dict(keyed_tensors: List[KeyedTensor], groups: List[List[str]], keys: List[str]) → Dict[str, Tensor]
- to(device: device, non_blocking: bool = False) → KeyedTensor
Please be aware that, according to https://pytorch.org/docs/stable/generated/torch.Tensor.to.html, to might return self or a copy of self. Always assign the result, for example, kt = kt.to(new_device).
- to_dict() → Dict[str, Tensor]
- values() → Tensor
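regroup and regroup_as_dict are listed without descriptions. Assuming each group's result is the concatenation, along the key dimension, of the named keys' values (looked up across all input KeyedTensors), and that regroup_as_dict stores the i-th group's result under keys[i], the behavior can be sketched in pure Python with dicts of row lists standing in for KeyedTensors. Both functions are hypothetical, as is the uniqueness of keys across inputs.

```python
from typing import Dict, List

Rows = List[List[int]]  # a B x D "tensor" as a list of rows


def regroup_sketch(keyed: List[Dict[str, Rows]], groups: List[List[str]]) -> List[Rows]:
    """Hypothetical: concatenate the named keys' columns, row by row, for each group."""
    lookup: Dict[str, Rows] = {}
    for kt in keyed:
        lookup.update(kt)  # assumes keys are unique across all inputs
    out: List[Rows] = []
    for group in groups:
        batch_size = len(lookup[group[0]])
        out.append(
            [[x for key in group for x in lookup[key][b]] for b in range(batch_size)]
        )
    return out


def regroup_as_dict_sketch(
    keyed: List[Dict[str, Rows]], groups: List[List[str]], keys: List[str]
) -> Dict[str, Rows]:
    """Hypothetical: same as regroup_sketch, with the i-th result stored under keys[i]."""
    if len(keys) != len(groups):
        raise ValueError("need one output key per group")
    return dict(zip(keys, regroup_sketch(keyed, groups)))


kt1 = {"a": [[1, 1], [2, 2]], "b": [[3], [4]]}
kt2 = {"c": [[5, 5, 5], [6, 6, 6]]}
print(regroup_sketch([kt1, kt2], [["a", "c"], ["b"]]))
# [[[1, 1, 5, 5, 5], [2, 2, 6, 6, 6]], [[3], [4]]]
```

Grouping keys that live in different KeyedTensors (here "a" and "c") is the case this kind of regrouping is useful for, since the result no longer depends on how the inputs were originally split.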