PyTorch Design Philosophy
=========================

This document is designed to help contributors and module maintainers
understand the high-level design principles that have developed over
time in PyTorch. These are not meant to be hard-and-fast rules, but to
serve as a guide to help trade off different concerns and to resolve
disagreements that may come up while developing PyTorch. For more
information on contributing, module maintainership, and how to escalate a
disagreement to the Core Maintainers, please see `PyTorch
Governance <https://pytorch.org/docs/main/community/governance.html>`__.

Design Principles
-----------------

Principle 1: Usability over Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle may be surprising! As one Hacker News poster wrote:
*PyTorch is amazing! [...] Although I’m confused. How can a ML framework be
not obsessed with speed/performance?* See `Hacker News discussion on
PyTorch <https://news.ycombinator.com/item?id=28066093>`__.

Soumith’s blog post on `Growing the PyTorch
Community <https://soumith.ch/posts/2021/02/growing-opensource/>`__
goes into this in some depth, but at a high-level:

-  PyTorch’s primary goal is usability
-  A secondary goal is to have *reasonable* performance

We believe that maintaining the flexibility to support researchers who
are building on top of our abstractions remains critical. We can’t
predict what future workloads will look like, but we know we want them
to be built first on PyTorch, and that requires flexibility.

In more concrete terms, we operate in a *usability-first* manner and try
to avoid jumping to *restriction-first* regimes (for example, static shapes,
graph-mode only) without a clear-eyed view of the tradeoffs. Often there
is a temptation to impose strict user restrictions upfront because it
can simplify implementation, but this comes with risks:

-  The performance may not be worth the user friction, either because
   the performance benefit is not compelling enough or it only applies to
   a relatively narrow set of subproblems.
-  Even if the performance benefit is compelling, the restrictions can
   fragment the ecosystem into different sets of limitations that can
   quickly become incomprehensible to users.
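
To make the contrast concrete, here is a minimal, illustrative
eager-mode sketch (the function and variable names are purely
hypothetical) of the kind of flexibility a usability-first approach
preserves: data-dependent control flow and varying input shapes work as
ordinary Python, whereas a restriction-first regime such as static
shapes or graph-mode only would typically require this code to be
rewritten or specially handled.

.. code-block:: python

   import torch

   def activate(scores: torch.Tensor) -> torch.Tensor:
       # Data-dependent control flow: the branch taken depends on runtime
       # values, which a graph-mode-only regime would have to restrict.
       if scores.mean() > 0:
           return scores.relu()
       return scores.sigmoid()

   # The same function accepts inputs of different shapes without any
   # static-shape declaration.
   print(activate(torch.randn(3)))
   print(activate(torch.randn(2, 5)))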

We want users to be able to seamlessly move their PyTorch code to
different hardware and software platforms, to interoperate with
different libraries and frameworks, and to experience the full richness
of the PyTorch user experience, not a least common denominator subset.

Principle 2: Simple Over Easy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, we borrow from `The Zen of
Python <https://peps.python.org/pep-0020/>`__:

-  *Explicit is better than implicit*
-  *Simple is better than complex*

A more concise way of describing these two goals is `Simple Over
Easy <https://www.infoq.com/presentations/Simple-Made-Easy/>`_. Let’s start with an example because *simple* and *easy* are
often used interchangeably in everyday English. Consider how one may
model `devices <https://pytorch.org/docs/main/tensor_attributes.html#torch.device>`__
in PyTorch:

-  **Simple / Explicit (to understand, debug):** every tensor is associated
   with a device. The user explicitly specifies tensor device movement.
   Operations that require cross-device movement result in an error.
-  **Easy / Implicit (to use):** the user does not have to worry about
   devices; the system figures out the globally optimal device
   placement.

In this specific case, and as a general design philosophy, PyTorch
favors exposing simple and explicit building blocks over APIs that are
easy for practitioners to use. The simple version is immediately
understandable and debuggable by a new PyTorch user: you get a clear
error if you call an operator requiring cross-device movement at the
point in the program where the operator is actually invoked. The easy
solution may let a new user move faster initially, but debugging such a
system can be complex: How did the system make its determination? What
is the API for plugging into such a system and how are objects
represented in its IR?
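
As a minimal sketch of the simple/explicit behavior described above
(the variable names here are purely illustrative):

.. code-block:: python

   import torch

   cpu_tensor = torch.randn(3)  # every tensor carries a device; this one lives on the CPU

   if torch.cuda.is_available():
       gpu_tensor = torch.randn(3, device="cuda")

       # Mixing devices is not silently resolved by the framework; the
       # expression below raises a RuntimeError at the point where the
       # operator is actually invoked:
       #
       #     cpu_tensor + gpu_tensor
       #
       # Instead, the user states the movement explicitly:
       result = cpu_tensor.to("cuda") + gpu_tensor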

Some classic arguments in favor of this sort of design come from `A
Note on Distributed
Computing <https://dl.acm.org/doi/book/10.5555/974938>`__ (TLDR: do not
model resources with very different performance characteristics
uniformly; the details will leak) and the `End-to-End
Principle <http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf>`__
(TLDR: building smarts into the lower layers of the stack can prevent
building performant features at higher layers in the stack, and often
doesn’t work anyway). For example, we could build operator-level or
global device-movement rules, but the precise choices aren’t obvious,
and building an extensible mechanism has unavoidable complexity and
latency costs.

A caveat here is that this does not mean that higher-level “easy” APIs
are not valuable; certainly there is value in, for example, higher
levels of the stack supporting efficient tensor computation across
heterogeneous compute in a large cluster. Instead, what we mean is that
focusing on simple lower-level building blocks helps inform the easy
API while still maintaining a good experience when users need to leave
the beaten path. It also leaves space for innovation and for the growth
of more opinionated tools at a rate we cannot support in the PyTorch
core library, but that we ultimately benefit from, as evidenced by
our `rich ecosystem <https://pytorch.org/ecosystem/>`__. In other
words, not automating at the start allows us to potentially reach good
levels of automation faster.

Principle 3: Python First with Best In Class Language Interoperability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This principle began as **Python First**:

  PyTorch is not a Python binding into a monolithic C++ framework.
  It is built to be deeply integrated into Python. You can use it
  naturally like you would use `NumPy <https://www.numpy.org/>`__,
  `SciPy <https://www.scipy.org/>`__, `scikit-learn <https://scikit-learn.org/>`__,
  or other Python libraries. You can write your new neural network
  layers in Python itself, using your favorite libraries and
  packages such as `Cython <https://cython.org/>`__ and
  `Numba <http://numba.pydata.org/>`__. Our goal is to not reinvent
  the wheel where appropriate.

One thing PyTorch has needed to deal with over the years is Python
overhead: we first rewrote the `autograd` engine in C++, then the majority
of operator definitions, then developed TorchScript and the C++
frontend.

Still, working in Python easily provides the best experience for our
users: it is flexible, familiar, and perhaps most importantly, has a
huge ecosystem of scientific computing libraries and extensions
available for use. This fact motivates a few of our most recent
contributions, which attempt to hit a Pareto optimal point close to the
Python usability end of the curve:

-  `TorchDynamo <https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361>`__,
   a Python frame evaluation tool capable of speeding up existing
   eager-mode PyTorch programs with minimal user intervention.
-  `__torch_function__ <https://pytorch.org/docs/main/notes/extending.html#extending-torch>`__
   and `__torch_dispatch__ <https://dev-discuss.pytorch.org/t/what-and-why-is-torch-dispatch/557>`__
   extension points, which have enabled Python-first functionality to be
   built on top of C++ internals, such as the `torch.fx
   tracer <https://pytorch.org/docs/stable/fx.html>`__
   and `functorch <https://github.com/pytorch/functorch>`__
   respectively (see the sketch after this list).
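
A minimal sketch of these two extension directions, assuming a recent
PyTorch release in which TorchDynamo is exposed through the
``torch.compile`` entry point (the function and class names below are
purely illustrative):

.. code-block:: python

   import torch

   # TorchDynamo captures the Python frames of an ordinary eager function
   # and hands them to a compiler backend, with minimal user intervention.
   @torch.compile
   def fused(x):
       return torch.sin(x) + torch.cos(x)

   # ``__torch_function__`` lets pure-Python code intercept torch.* calls
   # before they reach the C++ internals; tools such as the torch.fx
   # tracer are layered on top of the core this way.
   class LoggingTensor(torch.Tensor):
       @classmethod
       def __torch_function__(cls, func, types, args=(), kwargs=None):
           kwargs = kwargs or {}
           print(f"intercepted: {func.__name__}")
           return super().__torch_function__(func, types, args, kwargs)

   print(fused(torch.randn(4)))
   y = torch.add(LoggingTensor([1.0, 2.0]), 3.0)  # prints "intercepted: add"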

These design principles are not hard-and-fast rules, but hard-won
choices that anchor how we built PyTorch into the debuggable, hackable,
and flexible framework it is today. As we gain more contributors and
maintainers, we look forward to applying these core principles with you
across our libraries and ecosystem. We are also open to evolving them as
we learn new things and the AI space evolves, as we know it will.