
(Prototype) MaskedTensor Sparsity

Created On: Oct 28, 2022 | Last Updated: Dec 12, 2023 | Last Verified: Not Verified

Before working on this tutorial, please make sure to review our MaskedTensor Overview tutorial <https://pytorch.org/tutorials/prototype/maskedtensor_overview.html>.

Introduction

Sparsity has been an area of rapid growth and importance within PyTorch; if any sparsity terms are confusing below, please refer to the sparsity tutorial for additional details.

Sparse storage formats are useful in several situations. The use case most practitioners think of first is a tensor in which the majority of elements are equal to zero (a high degree of sparsity), but even at lower sparsity, certain formats (e.g. BSR) can take advantage of substructures within a matrix.

Note

At the moment, MaskedTensor supports COO and CSR tensors, with plans to support additional formats (such as BSR and CSC) in the future. If you would like an additional format to be supported, please file a feature request.

Principles

When creating a MaskedTensor with sparse tensors, there are a few principles that must be observed:

  1. data and mask must have the same storage format, whether that’s torch.strided, torch.sparse_coo, or torch.sparse_csr

  2. data and mask must have the same size, indicated by size()
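The two principles can be checked up front. The following is a minimal sketch, not part of the MaskedTensor API: check_compatible is a hypothetical helper introduced here for illustration, and it relies only on Tensor.layout and Tensor.size().

```python
import torch


def check_compatible(data: torch.Tensor, mask: torch.Tensor) -> None:
    """Hypothetical helper: raise if data and mask break either principle."""
    # Principle 1: same storage format (layout).
    if data.layout != mask.layout:
        raise TypeError(f"layout mismatch: {data.layout} vs {mask.layout}")
    # Principle 2: same size.
    if data.size() != mask.size():
        raise ValueError(f"size mismatch: {data.size()} vs {mask.size()}")


dense_data = torch.tensor([[0, 0, 3], [4, 0, 5]])
dense_mask = torch.tensor([[False, False, True], [False, False, True]])

# Both dense, then both sparse COO: each pair satisfies the principles.
check_compatible(dense_data, dense_mask)
check_compatible(dense_data.to_sparse(), dense_mask.to_sparse())

# Mixing a sparse COO data tensor with a dense mask breaks Principle 1.
try:
    check_compatible(dense_data.to_sparse(), dense_mask)
    mixed_layout_ok = True
except TypeError:
    mixed_layout_ok = False

print("mixed layouts accepted:", mixed_layout_ok)
```

Running a check like this before calling masked_tensor makes the failing principle explicit instead of relying on the error raised during construction.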

Sparse COO tensors

In accordance with Principle #1, a sparse COO MaskedTensor is created by passing in two sparse COO tensors. These can be built with any of the sparse COO constructors, for example torch.sparse_coo_tensor().

As a recap of sparse COO tensors, the COO format stands for “coordinate format”, where the specified elements are stored as tuples of their indices and the corresponding values. That is, the following are provided:

  • indices: array of size (ndim, nse) and dtype torch.int64

  • values: array of size (nse,) with any integer or floating point dtype

where ndim is the dimensionality of the tensor and nse is the number of specified elements.
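As a small illustration of the indices and values arrays described above, the sketch below builds a 2 x 3 sparse COO tensor with torch.sparse_coo_tensor() and checks the stated shapes. The variable names are chosen for this example only.

```python
import torch

# Dense matrix being represented:
# [[0, 0, 3],
#  [4, 0, 5]]
indices = torch.tensor([[0, 1, 1],   # row index of each specified element
                        [2, 0, 2]])  # column index of each specified element
values = torch.tensor([3, 4, 5])
coo = torch.sparse_coo_tensor(indices, values, size=(2, 3))

ndim = coo.dim()        # dimensionality of the tensor
nse = values.shape[0]   # number of specified elements

print("indices shape is (ndim, nse):", tuple(indices.shape) == (ndim, nse))
print("indices dtype:", indices.dtype)
print("dense view:\n", coo.to_dense())
```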

For both sparse COO and CSR tensors, you can construct a MaskedTensor by doing either:

  1. masked_tensor(sparse_tensor_data, sparse_tensor_mask)

  2. dense_masked_tensor.to_sparse_coo() or dense_masked_tensor.to_sparse_csr()

The second method is easier to illustrate so we’ve shown that below, but for more on the first and the nuances behind the approach, please read the Sparse COO Appendix.

import torch
from torch.masked import masked_tensor
import warnings

# Disable prototype warnings and such
warnings.filterwarnings(action='ignore', category=UserWarning)

values = torch.tensor([[0, 0, 3], [4, 0, 5]])
mask = torch.tensor([[False, False, True], [False, False, True]])
mt = masked_tensor(values, mask)
sparse_coo_mt = mt.to_sparse_coo()

print("mt:\n", mt)
print("mt (sparse coo):\n", sparse_coo_mt)
print("mt data (sparse coo):\n", sparse_coo_mt.get_data())
mt:
 MaskedTensor(
  [
    [      --,       --, 3],
    [      --,       --, 5]
  ]
)
mt (sparse coo):
 MaskedTensor(
  [
    [      --,       --, 3],
    [      --,       --, 5]
  ]
)
mt data (sparse coo):
 tensor(indices=tensor([[0, 1],
                       [2, 2]]),
       values=tensor([3, 5]),
       size=(2, 3), nnz=2, layout=torch.sparse_coo)

Sparse CSR tensors

Similarly, MaskedTensor also supports the CSR (Compressed Sparse Row) sparse tensor format. Instead of storing the tuples of the indices like sparse COO tensors, sparse CSR tensors aim to decrease the memory requirements by storing compressed row indices. In particular, a CSR sparse tensor consists of three 1-D tensors:

  • crow_indices: array of compressed row indices with size (size[0] + 1,). This array indicates which row a given entry in values lives in. The last element is the number of specified elements, while crow_indices[i+1] - crow_indices[i] indicates the number of specified elements in row i.

  • col_indices: array of size (nnz,). Indicates the column indices for each value.

  • values: array of size (nnz,). Contains the values of the CSR tensor.
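To make the three arrays concrete, here is a short sketch that converts a small dense tensor to CSR and reads the arrays back. It uses only Tensor.to_sparse_csr() and the crow_indices(), col_indices() and values() accessors; the variable names are illustrative.

```python
import torch

dense = torch.tensor([[0, 0, 3],
                      [4, 0, 5]])
csr = dense.to_sparse_csr()

crow = csr.crow_indices()
col = csr.col_indices()
vals = csr.values()

# crow_indices[i + 1] - crow_indices[i] is the number of specified
# elements in row i; the last entry is the total number of them.
per_row = (crow[1:] - crow[:-1]).tolist()

print("crow_indices:", crow.tolist())   # [0, 1, 3]
print("col_indices: ", col.tolist())    # [2, 0, 2]
print("values:      ", vals.tolist())   # [3, 4, 5]
print("per-row counts:", per_row)       # [1, 2]
```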

Of note, both sparse COO and CSR tensors are in a beta state.

By way of example:

mt_sparse_csr = mt.to_sparse_csr()

print("mt (sparse csr):\n", mt_sparse_csr)
print("mt data (sparse csr):\n", mt_sparse_csr.get_data())
mt (sparse csr):
 MaskedTensor(
  [
    [      --,       --, 3],
    [      --,       --, 5]
  ]
)
mt data (sparse csr):
 tensor(crow_indices=tensor([0, 1, 2]),
       col_indices=tensor([2, 2]),
       values=tensor([3, 5]), size=(2, 3), nnz=2, layout=torch.sparse_csr)

Supported Operations

Unary

All unary operators are supported, e.g.:

mt.sin()
MaskedTensor(
  [
    [      --,       --,   0.1411],
    [      --,       --,  -0.9589]
  ]
)

Binary

Binary operators are also supported, but the input masks of the two masked tensors must match. For more information on why this decision was made, please see our MaskedTensor: Advanced Semantics tutorial.

Please find an example below:

i = [[0, 1, 1],
     [2, 0, 2]]
v1 = [3, 4, 5]
v2 = [20, 30, 40]
m = torch.tensor([True, False, True])

s1 = torch.sparse_coo_tensor(i, v1, (2, 3))
s2 = torch.sparse_coo_tensor(i, v2, (2, 3))
mask = torch.sparse_coo_tensor(i, m, (2, 3))

mt1 = masked_tensor(s1, mask)
mt2 = masked_tensor(s2, mask)

print("mt1:\n", mt1)
print("mt2:\n", mt2)
mt1:
 MaskedTensor(
  [
    [      --,       --, 3],
    [      --,       --, 5]
  ]
)
mt2:
 MaskedTensor(
  [
    [      --,       --, 20],
    [      --,       --, 40]
  ]
)
print("torch.div(mt2, mt1):\n", torch.div(mt2, mt1))
print("torch.mul(mt1, mt2):\n", torch.mul(mt1, mt2))
torch.div(mt2, mt1):
 MaskedTensor(
  [
    [      --,       --,   6.6667],
    [      --,       --,   8.0000]
  ]
)
torch.mul(mt1, mt2):
 MaskedTensor(
  [
    [      --,       --, 60],
    [      --,       --, 200]
  ]
)
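Because the masks must match, it can help to compare them before calling a binary operator. The sketch below is one way to do that with dense MaskedTensors; masks_match is a hypothetical helper written for this example (it is not a MaskedTensor method), and it assumes the masks are small enough to compare densely.

```python
import warnings

import torch
from torch.masked import masked_tensor

# Disable prototype warnings and such
warnings.filterwarnings(action="ignore", category=UserWarning)


def masks_match(a, b) -> bool:
    """Hypothetical helper: compare the masks of two MaskedTensors densely."""
    return torch.equal(a.get_mask().to_dense(), b.get_mask().to_dense())


data = torch.tensor([[0.0, 0.0, 3.0], [4.0, 0.0, 5.0]])
mask_a = torch.tensor([[False, False, True], [False, False, True]])
mask_b = torch.tensor([[False, False, True], [True, False, True]])

mt_a = masked_tensor(data, mask_a)
mt_b = masked_tensor(data, mask_a.clone())
mt_c = masked_tensor(data, mask_b)

same = masks_match(mt_a, mt_b)
different = masks_match(mt_a, mt_c)
print("mt_a / mt_b masks match:", same)
print("mt_a / mt_c masks match:", different)

# Only apply the binary operator when the masks agree.
if same:
    print(torch.mul(mt_a, mt_b))
```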

Reductions

Finally, reductions are supported:

mt
MaskedTensor(
  [
    [      --,       --, 3],
    [      --,       --, 5]
  ]
)
print("mt.sum():\n", mt.sum())
print("mt.sum(dim=1):\n", mt.sum(dim=1))
print("mt.amin():\n", mt.amin())
mt.sum():
 MaskedTensor(8, True)
mt.sum(dim=1):
 MaskedTensor(
  [3, 5]
)
mt.amin():
 MaskedTensor(3, True)

MaskedTensor Helper Methods

For convenience, MaskedTensor has a number of methods to help convert between the different layouts and identify the current layout:

Setup:

v = [[3, 0, 0],
     [0, 4, 5]]
m = [[True, False, False],
     [False, True, True]]

mt = masked_tensor(torch.tensor(v), torch.tensor(m))
mt
MaskedTensor(
  [
    [3,       --,       --],
    [      --, 4, 5]
  ]
)

MaskedTensor.to_sparse_coo(), MaskedTensor.to_sparse_csr(), and MaskedTensor.to_dense() convert between the different layouts.

mt_sparse_coo = mt.to_sparse_coo()
mt_sparse_csr = mt.to_sparse_csr()
mt_dense = mt_sparse_coo.to_dense()

MaskedTensor.is_sparse: this attribute (accessed without parentheses, as in the code below) indicates whether the MaskedTensor’s layout matches any of the supported sparse layouts (currently COO and CSR).

print("mt_dense.is_sparse: ", mt_dense.is_sparse)
print("mt_sparse_coo.is_sparse: ", mt_sparse_coo.is_sparse)
print("mt_sparse_csr.is_sparse: ", mt_sparse_csr.is_sparse)
mt_dense.is_sparse:  False
mt_sparse_coo.is_sparse:  True
mt_sparse_csr.is_sparse:  True

MaskedTensor.is_sparse_coo()

print("mt_dense.is_sparse_coo(): ", mt_dense.is_sparse_coo())
print("mt_sparse_coo.is_sparse_coo(): ", mt_sparse_coo.is_sparse_coo())
print("mt_sparse_csr.is_sparse_coo(): ", mt_sparse_csr.is_sparse_coo())
mt_dense.is_sparse_coo():  False
mt_sparse_coo.is_sparse_coo():  True
mt_sparse_csr.is_sparse_coo():  False

MaskedTensor.is_sparse_csr()

print("mt_dense.is_sparse_csr(): ", mt_dense.is_sparse_csr())
print("mt_sparse_coo.is_sparse_csr(): ", mt_sparse_coo.is_sparse_csr())
print("mt_sparse_csr.is_sparse_csr(): ", mt_sparse_csr.is_sparse_csr())
mt_dense.is_sparse_csr():  False
mt_sparse_coo.is_sparse_csr():  False
mt_sparse_csr.is_sparse_csr():  True

Appendix

Sparse COO Construction

Recall that in our original example, we created a dense MaskedTensor and then converted it to a sparse COO MaskedTensor with MaskedTensor.to_sparse_coo().

Alternatively, we can also construct a sparse COO MaskedTensor directly by passing in two sparse COO tensors:

values = torch.tensor([[0, 0, 3], [4, 0, 5]]).to_sparse()
mask = torch.tensor([[False, False, True], [False, False, True]]).to_sparse()
mt = masked_tensor(values, mask)

print("values:\n", values)
print("mask:\n", mask)
print("mt:\n", mt)
values:
 tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([3, 4, 5]),
       size=(2, 3), nnz=3, layout=torch.sparse_coo)
mask:
 tensor(indices=tensor([[0, 1],
                       [2, 2]]),
       values=tensor([True, True]),
       size=(2, 3), nnz=2, layout=torch.sparse_coo)
mt:
 MaskedTensor(
  [
    [      --,       --, 3],
    [      --,       --, 5]
  ]
)

Instead of using torch.Tensor.to_sparse(), we can also create the sparse COO tensors directly, which brings us to a warning:

Warning

When using a function like MaskedTensor.to_sparse_coo() (analogous to Tensor.to_sparse()), the user does not specify the indices, as in the example above, so the 0 values will be “unspecified” by default.

Below, we explicitly specify the 0’s:

i = [[0, 1, 1],
     [2, 0, 2]]
v = [3, 4, 5]
m = torch.tensor([True, False, True])
values = torch.sparse_coo_tensor(i, v, (2, 3))
mask = torch.sparse_coo_tensor(i, m, (2, 3))
mt2 = masked_tensor(values, mask)

print("values:\n", values)
print("mask:\n", mask)
print("mt2:\n", mt2)
values:
 tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([3, 4, 5]),
       size=(2, 3), nnz=3, layout=torch.sparse_coo)
mask:
 tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([ True, False,  True]),
       size=(2, 3), nnz=3, layout=torch.sparse_coo)
mt2:
 MaskedTensor(
  [
    [      --,       --, 3],
    [      --,       --, 5]
  ]
)

Note that mt and mt2 look identical on the surface, and in the vast majority of operations, will yield the same result. But this brings us to a detail on the implementation:

For sparse MaskedTensors only, data and mask can have a different number of specified elements (nnz()) at creation, but the indices of mask must then be a subset of the indices of data. In this case, data will assume the shape of mask by data = data.sparse_mask(mask); in other words, any elements of data that are not True in mask (that is, not specified) will be thrown away.

Therefore, under the hood, the data looks slightly different: mt2 keeps the “4” value in its data but masks it out, whereas mt does not store it at all. Their underlying data has different shapes, which would make operations like mt + mt2 invalid.

print("mt data:\n", mt.get_data())
print("mt2 data:\n", mt2.get_data())
mt data:
 tensor(indices=tensor([[0, 1],
                       [2, 2]]),
       values=tensor([3, 5]),
       size=(2, 3), nnz=2, layout=torch.sparse_coo)
mt2 data:
 tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([3, 4, 5]),
       size=(2, 3), nnz=3, layout=torch.sparse_coo)
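The subset rule described above can be checked by comparing the specified coordinates of the two sparse COO tensors. This is a rough sketch for illustration only: specified_coords is a hypothetical helper, and it does not reproduce how MaskedTensor performs the check internally.

```python
import torch


def specified_coords(t: torch.Tensor) -> set:
    """Hypothetical helper: coordinates of the specified elements of a sparse COO tensor."""
    t = t.coalesce()  # indices() requires a coalesced tensor
    return set(map(tuple, t.indices().t().tolist()))


data = torch.tensor([[0, 0, 3], [4, 0, 5]]).to_sparse()
mask = torch.tensor([[False, False, True], [False, False, True]]).to_sparse()

data_coords = specified_coords(data)
mask_coords = specified_coords(mask)

# The indices of mask must be a subset of the indices of data.
is_subset = mask_coords <= data_coords
# Coordinates present in data but absent from mask are the ones discarded.
dropped = data_coords - mask_coords

print("mask indices are a subset of data indices:", is_subset)
print("coordinates dropped from data:", dropped)
```

Here the only dropped coordinate is (1, 0), the position of the “4” value, which matches the difference between mt and mt2 shown above.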

Sparse CSR Construction

We can also construct a sparse CSR MaskedTensor using sparse CSR tensors, and like the example above, this results in a similar treatment under the hood.

crow_indices = torch.tensor([0, 2, 4])
col_indices = torch.tensor([0, 1, 0, 1])
values = torch.tensor([1, 2, 3, 4])
mask_values = torch.tensor([True, False, False, True])

csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, dtype=torch.double)
mask = torch.sparse_csr_tensor(crow_indices, col_indices, mask_values, dtype=torch.bool)
mt = masked_tensor(csr, mask)

print("mt:\n", mt)
print("mt data:\n", mt.get_data())
mt:
 MaskedTensor(
  [
    [  1.0000,       --],
    [      --,   4.0000]
  ]
)
mt data:
 tensor(crow_indices=tensor([0, 2, 4]),
       col_indices=tensor([0, 1, 0, 1]),
       values=tensor([1., 2., 3., 4.]), size=(2, 2), nnz=4,
       dtype=torch.float64, layout=torch.sparse_csr)

Conclusion

In this tutorial, we have introduced how to use MaskedTensor with sparse COO and CSR formats and discussed some of the subtleties under the hood in case users decide to access the underlying data structures directly. Sparse storage formats and masked semantics have strong synergies, so much so that they are sometimes used as proxies for each other (as we will see in the next tutorial). We plan to continue investing in and developing this direction.

Further Reading

To learn more, see our Efficiently writing “sparse” semantics for Adagrad with MaskedTensor tutorial for an example of how MaskedTensor can simplify existing workflows with native masking semantics.
