Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate
TL;DR: Introducing the ExecuTorch MLX Delegate. The new MLX delegate enables optimized, GPU-accelerated inference for…
ExecuTorch Team, May 18, 2026

PyTorch 2.12 Release Blog
We are excited to announce the release of PyTorch® 2.12 (release notes)! The PyTorch 2.12…
PyTorch Foundation, May 13, 2026

Efficient Edge AI on Arm CPUs and NPUs: Understanding ExecuTorch through Practical Labs
TL;DR: ExecuTorch extends the PyTorch ecosystem to deliver local AI inference on constrained edge devices.…
Matt Cossins, May 12, 2026

In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference
TL;DR: Traditional RecSys inference explicitly replicates shared user embeddings/sequences for every candidate. In-Kernel Broadcast Optimization…
Jian Jiao, Boda Li, Hongtao Yu, Yuanwei (Kevin) Fang, Zhengkai Zhang, Zhuoran Zhao, Yuxin Chen, Sijia Chen†, Yang Chen†, Zijian Shen, Shuyao Bi, Ao Cai, Junhan Hu†, Shuqi Yang†, Wei Wei, Lu Fang, Rengan Xu, Manman Ren, Alex Zhong, Xiaohan Wei, Zeliang Chen, Ellie Wen, Wenlin Chen
May 5, 2026