TBE CPU Autovectorization

FP8/16/32 Autovec Implementation Methods

template<typename InType, typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDM_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const InType *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const bool no_bag, const bool is_bf16_out, const bool is_bf16_in)

Autovectorized version of method EmbeddingSpMDM_ref for FP32 weight type.

Template Parameters:

InType – input data type (uint8_t is used)
IndexType – index data type (int64_t is used)
OffsetType – offset data type (int32_t is used)
OutType – output data type (float is used)

Parameters:

block_size – Number of elements in a block (int64_t)
output_size – Number of elements in output (int64_t)
index_size – Number of elements in index (int64_t)
data_size – Number of elements in data (int64_t)
input – Address of input (InType*)
indices – Address of index (IndexType*)
offsets_or_lengths – Address of the offsets array, or of the lengths array when use_offsets is false (OffsetType*)
weights – Weights of sum; optional, can be null for non-weighted sum (float*)
normalize_by_lengths – Whether or not to normalize by lengths (bool)
out – Address of output (OutType*)
is_weight_positional – If true, weight is positional; set to false for FP32 autovec implementation (bool)
use_offsets – If true, will use offsets instead of lengths; set to true for FP32 autovec implementation (bool)
output_stride – If -1, output_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)
input_stride – If -1, input_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)
scale_bias_last – If true, scale and bias appear at end of each row; set to true for FP32 autovec implementation (bool). Not present in the signature listed above.
no_bag – If true, no embedding bag; set to false for FP32 autovec implementation (bool)
is_bf16_out – If true, output is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
is_bf16_in – If true, input is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
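
The parameters above describe a pooled embedding lookup. As a rough illustration of what they mean together, here is a minimal scalar sketch, assuming FP32 input and output, use_offsets set to true, non-positional weights, and strides equal to block_size. The function name embedding_bag_sum_sketch is hypothetical and is not part of the library. The bounds checks and return value are also assumptions, not a statement of what EmbeddingSpMDM_autovec does internally.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar sketch, NOT the library implementation.
// Assumes: FP32 rows, use_offsets == true (offsets has output_size + 1 entries),
// is_weight_positional == false, output_stride == input_stride == block_size.
bool embedding_bag_sum_sketch(
    int64_t block_size,
    int64_t output_size,
    int64_t index_size,
    int64_t data_size,
    const float* input,
    const int64_t* indices,
    const int32_t* offsets,
    const float* weights, // may be null for a non-weighted sum
    bool normalize_by_lengths,
    float* out) {
  assert(block_size > 0);
  int64_t current = 0; // position in indices (and in weights, if given)
  for (int64_t m = 0; m < output_size; ++m) {
    float* row_out = out + m * block_size;
    for (int64_t j = 0; j < block_size; ++j) {
      row_out[j] = 0.0f;
    }
    const int64_t len = offsets[m + 1] - offsets[m];
    if (len < 0 || current + len > index_size) {
      return false; // offsets are inconsistent with index_size
    }
    for (int64_t i = 0; i < len; ++i, ++current) {
      const int64_t idx = indices[current];
      if (idx < 0 || idx >= data_size) {
        return false; // index points outside the table
      }
      const float w = weights ? weights[current] : 1.0f;
      const float* row_in = input + idx * block_size;
      for (int64_t j = 0; j < block_size; ++j) {
        row_out[j] += w * row_in[j];
      }
    }
    if (normalize_by_lengths && len > 0) {
      const float scale = 1.0f / static_cast<float>(len);
      for (int64_t j = 0; j < block_size; ++j) {
        row_out[j] *= scale;
      }
    }
  }
  return current == index_size;
}
```

With a 3-row table of block_size 2, indices {0, 2, 1} and offsets {0, 2, 3}, the sketch sums rows 0 and 2 into the first output row and copies row 1 into the second. The innermost loops over j are simple, dependency-free loops of the kind a compiler can usually autovectorize.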

template<typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDMFP8_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const uint8_t *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const int exponent_bits, const int exponent_bias, const bool is_bf16_out)

Autovectorized version of method EmbeddingSpMDM_ref for FP8 weight type.

Template Parameters:

IndexType – index data type (int64_t is used)
OffsetType – offset data type (int32_t is used)
OutType – output data type (float is used)

Parameters:

block_size – Number of elements in a block (int64_t)
output_size – Number of elements in output (int64_t)
index_size – Number of elements in index (int64_t)
data_size – Number of elements in data (int64_t)
input – Address of input (uint8_t*)
indices – Address of index (IndexType*)
offsets_or_lengths – Address of the offsets array, or of the lengths array when use_offsets is false (OffsetType*)
weights – Weights of sum; optional, can be null for non-weighted sum (float*)
normalize_by_lengths – Whether or not to normalize by lengths (bool)
out – Address of output (OutType*)
is_weight_positional – If true, weight is positional; set to false for FP8 autovec implementation (bool)
use_offsets – If true, will use offsets instead of lengths; set to true for FP8 autovec implementation (bool)
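output_stride – If -1, output_stride is same as block_size; set to -1 for FP8 autovec implementation (int64_t)
input_stride – If -1, input_stride is same as block_size; set to -1 for FP8 autovec implementation (int64_t)
exponent_bits – Number of bits used for the exponent (int)
exponent_bias – Bias applied to the exponent (int)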
is_bf16_out – If true, output is BFLOAT16 type; set to false for FP8 autovec implementation (bool)
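
To illustrate how exponent_bits and exponent_bias could determine the value of one FP8 byte, here is a minimal sketch. It assumes a layout of 1 sign bit, then exponent_bits exponent bits, then the remaining 7 - exponent_bits mantissa bits, with IEEE-style subnormals and no special handling of infinity or NaN. Both the layout and the helper name decode_fp8_sketch are assumptions made for illustration; the library's actual conversion routine may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, NOT part of the library.
// Assumed layout (most to least significant bit):
//   1 sign bit | exponent_bits exponent bits | (7 - exponent_bits) mantissa bits
float decode_fp8_sketch(uint8_t byte, int exponent_bits, int exponent_bias) {
  assert(exponent_bits > 0 && exponent_bits < 7);
  const int mantissa_bits = 7 - exponent_bits;
  const int sign = (byte >> 7) & 1;
  const int exp_field = (byte >> mantissa_bits) & ((1 << exponent_bits) - 1);
  const int man_field = byte & ((1 << mantissa_bits) - 1);

  float magnitude;
  if (exp_field == 0) {
    // Subnormal: value = frac * 2^(1 - bias), with frac = man_field / 2^mantissa_bits.
    magnitude = std::ldexp(
        static_cast<float>(man_field), 1 - exponent_bias - mantissa_bits);
  } else {
    // Normal: value = (1 + frac) * 2^(exp_field - bias).
    magnitude = std::ldexp(
        static_cast<float>((1 << mantissa_bits) + man_field),
        exp_field - exponent_bias - mantissa_bits);
  }
  return sign ? -magnitude : magnitude;
}
```

Under these assumptions, with exponent_bits = 4 and exponent_bias = 7, the byte 0x38 (sign 0, exponent field 7, mantissa 0) decodes to 1.0 and 0x3C decodes to 1.5.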
