This API is in beta and may change in the near future.
Torch mobile supports the torch.utils.mobile_optimizer.optimize_for_mobile utility, which runs a list of optimization passes on modules in eval mode.
The method takes the following parameters: a torch.jit.ScriptModule object, an optimization blocklist set, a list of preserved methods, and a backend.
By default, if the optimization blocklist is None or empty, optimize_for_mobile will run the following optimizations:
Conv2D + BatchNorm fusion (blocklisting option MobileOptimizerType::CONV_BN_FUSION): This optimization pass folds Conv2d-BatchNorm2d into Conv2d in the forward method of this module and all its submodules. The weight and bias of the Conv2d are updated correspondingly.
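The folding arithmetic can be sketched by hand. The snippet below illustrates the standard eval-mode Conv2d-BatchNorm2d fold and is not the pass's actual implementation; the layer sizes and the randomized batch-norm statistics are arbitrary choices made for the demonstration.

```python
import torch

torch.manual_seed(0)

# Arbitrary layer sizes, chosen only for illustration.
conv = torch.nn.Conv2d(3, 4, kernel_size=3).eval()
bn = torch.nn.BatchNorm2d(4).eval()

with torch.no_grad():
    # Give the batch norm non-trivial statistics so the fold is not a no-op.
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-0.5, 0.5)

    # Per-output-channel scale that batch norm applies in eval mode.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)

    # Fold the scale and shift into the weight and bias of one convolution.
    folded = torch.nn.Conv2d(3, 4, kernel_size=3).eval()
    folded.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    folded.bias.copy_((conv.bias - bn.running_mean) * scale + bn.bias)

    x = torch.randn(2, 3, 8, 8)
    max_abs_diff = (bn(conv(x)) - folded(x)).abs().max().item()

print(f"max abs difference: {max_abs_diff:.2e}")
```

The folded convolution should reproduce the Conv2d + BatchNorm2d output up to floating-point rounding, which is why the fusion can drop the batch-norm op entirely.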
Insert and Fold prepacked ops (blocklisting option MobileOptimizerType::INSERT_FOLD_PREPACK_OPS): This optimization pass rewrites the graph to replace 2D convolutions and linear ops with their prepacked counterparts. Prepacked ops are stateful: they require some state to be created up front, such as prepacked weights, and then use that state during op execution. XNNPACK is one backend that provides prepacked ops, with kernels optimized for mobile platforms (such as ARM CPUs). Prepacking the weights enables efficient memory access and thus faster kernel execution. At the moment, the optimize_for_mobile pass rewrites the graph to replace Conv2D/Linear with 1) an op that prepacks the weight for XNNPACK conv2d/linear ops and 2) an op that takes the prepacked weight and the activation as input and generates output activations. Since 1 needs to be done only once, the weight prepacking is folded so that it happens once, at model load time. This pass of optimize_for_mobile does 1 and 2 and then folds, i.e. removes, the weight prepacking ops.
ReLU/Hardtanh fusion: XNNPACK ops support fusion of clamping. That is, clamping of the output activation is done as part of the kernel, including for 2D convolution and linear op kernels, so clamping effectively comes for free. Any op that can be expressed as a clamping op, such as ReLU or hardtanh, can therefore be fused with the previous Conv2D or linear op in XNNPACK. This pass rewrites the graph by finding ReLU/hardtanh ops that follow the XNNPACK Conv2D/linear ops written by the previous pass, and fuses them together.
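One way to observe the effect of the two XNNPACK passes described above is to inspect the optimized graph. The sketch below assumes a PyTorch build with XNNPACK enabled; ConvRelu is a hypothetical module, and the op names searched for (prepacked::conv2d_clamp_run, aten::conv2d, aten::relu) are the names expected from recent PyTorch releases and may differ in other versions.

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile


class ConvRelu(torch.nn.Module):
    """Hypothetical module: one convolution followed by a ReLU."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))


scripted = torch.jit.script(ConvRelu().eval())
optimized = optimize_for_mobile(scripted)

# Search the textual form of the optimized forward graph for op names.
graph_text = str(optimized.graph)
uses_prepacked_conv = "prepacked::conv2d_clamp_run" in graph_text
has_plain_conv = "aten::conv2d" in graph_text
has_standalone_relu = "aten::relu" in graph_text

print("prepacked conv op present:", uses_prepacked_conv)
print("plain aten::conv2d present:", has_plain_conv)
print("standalone aten::relu present:", has_standalone_relu)
```

If the passes ran as described, the plain convolution is replaced by the prepacked run op and no separate ReLU node remains, because the clamp is carried by the prepacked op itself.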
Dropout removal (blocklisting option MobileOptimizerType::REMOVE_DROPOUT): This optimization pass removes dropout and dropout_ nodes from this module when training is false.
Conv packed params hoisting (blocklisting option MobileOptimizerType::HOIST_CONV_PACKED_PARAMS): This optimization pass moves convolution packed params to the root module, so that the convolution structs can be deleted. This decreases model size without impacting numerics.
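The blocklisting options listed above are passed as a Python set of MobileOptimizerType values. A minimal sketch, assuming a PyTorch build with XNNPACK enabled and that MobileOptimizerType can be imported from torch.utils.mobile_optimizer; TinyConv is a hypothetical module, and the "prepacked::" substring check relies on the op naming used by recent PyTorch releases.

```python
import torch
from torch.utils.mobile_optimizer import MobileOptimizerType, optimize_for_mobile


class TinyConv(torch.nn.Module):
    """Hypothetical single-convolution module."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return self.conv(x)


scripted = torch.jit.script(TinyConv().eval())

# Default: every optimization pass runs.
default_opt = optimize_for_mobile(scripted)

# Skip only the "insert and fold prepacked ops" pass.
blocklist = {MobileOptimizerType.INSERT_FOLD_PREPACK_OPS}
no_prepack = optimize_for_mobile(scripted, optimization_blocklist=blocklist)

default_has_prepacked = "prepacked::" in str(default_opt.graph)
blocked_has_prepacked = "prepacked::" in str(no_prepack.graph)

print("default run uses prepacked ops:", default_has_prepacked)
print("blocklisted run uses prepacked ops:", blocked_has_prepacked)
```

Blocklisting a pass can be useful when isolating a numerical or performance difference to one specific optimization.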
optimize_for_mobile will also invoke the freeze_module pass, which only preserves the forward method. If you have other methods that need to be preserved, add them to the preserved method list and pass it into the method.
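The effect of the preserved method list can be sketched as follows. WithHelper and its num_outputs method are hypothetical names introduced for illustration, and the snippet assumes a PyTorch build with XNNPACK enabled.

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile


class WithHelper(torch.nn.Module):
    """Hypothetical module exposing one extra exported method."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

    @torch.jit.export
    def num_outputs(self) -> int:
        return 2


scripted = torch.jit.script(WithHelper().eval())

# Without preserved_methods, freezing keeps only forward.
default_opt = optimize_for_mobile(scripted)

# Naming the method explicitly keeps it on the optimized module.
kept_opt = optimize_for_mobile(scripted, preserved_methods=["num_outputs"])

default_has_method = hasattr(default_opt, "num_outputs")
kept_has_method = hasattr(kept_opt, "num_outputs")

print("kept by default:", default_has_method)
print("kept when listed in preserved_methods:", kept_has_method)
```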
optimize_for_mobile(script_module, optimization_blocklist=None, preserved_methods=None, backend='CPU', methods_to_optimize=None)
script_module – A TorchScript module instance of type ScriptModule.
optimization_blocklist – A set of MobileOptimizerType values. When the set is not passed, the method runs all optimization passes; otherwise, it runs only the passes that are not included in optimization_blocklist.
preserved_methods – A list of methods that need to be preserved when the freeze_module pass is invoked.
backend – Device type to use for running the resulting model (‘CPU’ (default), ‘Vulkan’ or ‘Metal’).
methods_to_optimize – List of functions to optimize (CPU only). forward is optimized if it exists.
Returns a new optimized TorchScript module.
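An end-to-end call might look like the sketch below. SmallNet, its layer sizes, and the comparison tolerance are assumptions made for illustration, and the snippet assumes a PyTorch build with XNNPACK enabled.

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile


class SmallNet(torch.nn.Module):
    """Hypothetical model: convolution, batch norm, ReLU."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = torch.nn.BatchNorm2d(8)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


torch.manual_seed(0)
model = SmallNet().eval()

# optimize_for_mobile expects a ScriptModule, so script the model first.
scripted = torch.jit.script(model)
optimized = optimize_for_mobile(scripted, backend="CPU")

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    expected = model(x)
    actual = optimized(x)

# The optimized module should agree with the original up to rounding.
outputs_match = torch.allclose(expected, actual, atol=1e-4)
is_script_module = isinstance(optimized, torch.jit.ScriptModule)

print("outputs match:", outputs_match)
print("result is a ScriptModule:", is_script_module)
```

Comparing the optimized module against the original on a sample input is a cheap sanity check before shipping the model to a device.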