TBE CPU Autovectorization
FP8/16/32 Autovec Implementation Methods
-
template<typename InType, typename IndexType, typename OffsetType, typename OutType>
static bool EmbeddingSpMDM_autovec(const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const InType *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const bool no_bag, const bool is_bf16_out, const bool is_bf16_in)
Autovectorized version of method EmbeddingSpMDM_ref for FP32 weight type.
- Template Parameters:
  InType – input data type (uint8_t is used)
  IndexType – index data type (int64_t is used)
  OffsetType – offset data type (int32_t is used)
  OutType – output data type (float is used)
- Parameters:
  block_size – Number of elements in a block (int64_t)
  output_size – Number of elements in output (int64_t)
  index_size – Number of elements in index (int64_t)
  data_size – Number of elements in data (int64_t)
  input – Address of input (InType*)
  indices – Address of index (IndexType*)
  offsets_or_lengths – Address of offset (OffsetType*)
  weights – Weights of sum; optional, can be null for non-weighted sum (float*)
  normalize_by_lengths – Whether or not to normalize by lengths (bool)
  out – Address of output (OutType*)
  is_weight_positional – If true, weight is positional; set to false for FP32 autovec implementation (bool)
  use_offsets – If true, will use offsets instead of lengths; set to true for FP32 autovec implementation (bool)
  output_stride – If -1, output_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)
  input_stride – If -1, input_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)
  scale_bias_last – If true, scale and bias appear at end of each row; set to true for FP32 autovec implementation (bool)
  no_bag – If true, no embedding bag; set to false for FP32 autovec implementation (bool)
  is_bf16_out – If true, output is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
  is_bf16_in – If true, input is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
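To make the roles of these parameters concrete, the following is a minimal scalar sketch of a pooled embedding lookup, written as a plain loop nest of the kind a compiler can autovectorize. It is not the FBGEMM implementation: the name embedding_spmdm_sketch is hypothetical, the rows are assumed to be plain float values, the offsets array is assumed to hold output_size + 1 entries, and returning false on an out-of-range index or offset is an assumption about how a bool result might be used. Positional weights, no_bag, and BFLOAT16 handling are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference sketch (not FBGEMM code): for each output row m,
// sum the input rows selected by indices[offsets[m] .. offsets[m + 1]).
// Assumes float rows and an offsets array with output_size + 1 entries.
bool embedding_spmdm_sketch(
    int64_t block_size, int64_t output_size, int64_t index_size,
    int64_t data_size, const float* input, const int64_t* indices,
    const int32_t* offsets,
    const float* weights,  // may be nullptr for a non-weighted sum
    bool normalize_by_lengths, float* out,
    int64_t output_stride = -1, int64_t input_stride = -1) {
  assert(block_size > 0);
  // A stride of -1 means "same as block_size", as in the parameter list.
  if (output_stride == -1) output_stride = block_size;
  if (input_stride == -1) input_stride = block_size;

  for (int64_t m = 0; m < output_size; ++m) {
    float* out_row = out + m * output_stride;
    for (int64_t j = 0; j < block_size; ++j) out_row[j] = 0.0f;

    const int64_t begin = offsets[m];
    const int64_t end = offsets[m + 1];
    if (begin < 0 || end < begin || end > index_size) return false;

    for (int64_t i = begin; i < end; ++i) {
      const int64_t idx = indices[i];
      if (idx < 0 || idx >= data_size) return false;
      // Non-positional weights: one weight per entry of indices.
      const float w = weights ? weights[i] : 1.0f;
      const float* in_row = input + idx * input_stride;
      // Unit-stride inner loop with no cross-iteration dependency,
      // which makes it a candidate for compiler autovectorization.
      for (int64_t j = 0; j < block_size; ++j) out_row[j] += w * in_row[j];
    }

    const int64_t len = end - begin;
    if (normalize_by_lengths && len > 0) {
      const float scale = 1.0f / static_cast<float>(len);
      for (int64_t j = 0; j < block_size; ++j) out_row[j] *= scale;
    }
  }
  return true;
}
```

With a table of three rows of width 2, indices {0, 2, 1}, and offsets {0, 2, 3}, the first output row is the sum of table rows 0 and 2 and the second output row is a copy of table row 1.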
-
template<typename IndexType, typename OffsetType, typename OutType>
static bool EmbeddingSpMDMFP8_autovec(const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const uint8_t *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const int exponent_bits, const int exponent_bias, const bool is_bf16_out)
Autovectorized version of method EmbeddingSpMDM_ref for FP8 weight type.
- Template Parameters:
  InType – input data type (uint8_t is used)
  IndexType – index data type (int64_t is used)
  OffsetType – offset data type (int32_t is used)
  OutType – output data type (float is used)
- Parameters:
  block_size – Number of elements in a block (int64_t)
  output_size – Number of elements in output (int64_t)
  index_size – Number of elements in index (int64_t)
  data_size – Number of elements in data (int64_t)
  input – Address of input (InType*)
  indices – Address of index (IndexType*)
  offsets_or_lengths – Address of offset (OffsetType*)
  weights – Weights of sum; optional, can be null for non-weighted sum (float*)
  normalize_by_lengths – Whether or not to normalize by lengths (bool)
  out – Address of output (OutType*)
  is_weight_positional – If true, weight is positional; set to false for FP8 autovec implementation (bool)
  use_offsets – If true, will use offsets instead of lengths; set to true for FP8 autovec implementation (bool)
  output_stride – If -1, output_stride is same as block_size; set to -1 for FP8 autovec implementation (int64_t)
  exponent_bits – Number of bits used for the exponent (int)
  exponent_bias – Bias applied to the exponent (int)
  is_bf16_out – If true, output is BFLOAT16 type; set to false for FP8 autovec implementation (bool)
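The exponent_bits and exponent_bias parameters determine how each input byte is interpreted as a floating-point value. The sketch below shows one plausible reading, offered as an illustration rather than as FBGEMM's conversion routine: it assumes 1 sign bit, exponent_bits exponent bits, the remaining low bits as mantissa, IEEE-style subnormals, and no Inf/NaN encodings. The helper name fp8_to_float_sketch is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not FBGEMM code): decode one 8-bit float laid out as
// [sign | exponent_bits exponent bits | remaining mantissa bits].
// Assumes IEEE-like subnormals and treats every encoding as a finite value.
float fp8_to_float_sketch(uint8_t byte, int exponent_bits, int exponent_bias) {
  assert(exponent_bits >= 1 && exponent_bits <= 7);
  const int mantissa_bits = 7 - exponent_bits;
  const int sign = byte >> 7;
  const int exponent = (byte >> mantissa_bits) & ((1 << exponent_bits) - 1);
  const int mantissa = byte & ((1 << mantissa_bits) - 1);
  const float fraction =
      static_cast<float>(mantissa) / static_cast<float>(1 << mantissa_bits);

  float magnitude;
  if (exponent == 0) {
    // Subnormal: no implicit leading 1, fixed exponent of (1 - bias).
    magnitude = std::ldexp(fraction, 1 - exponent_bias);
  } else {
    // Normal: implicit leading 1, exponent shifted by the bias.
    magnitude = std::ldexp(1.0f + fraction, exponent - exponent_bias);
  }
  return sign ? -magnitude : magnitude;
}
```

Under these assumptions, with exponent_bits = 4 and exponent_bias = 7, the byte 0x38 (sign 0, exponent 0111, mantissa 000) decodes to 1.0 and 0x3C (mantissa 100) decodes to 1.5.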