
⚠️ Notice: Limited Maintenance

This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.

Apple Silicon Support

What is supported

Experimental Support

  • For GPU jobs on Apple Silicon, MPS is now auto-detected and enabled. To prevent TorchServe from using MPS, set deviceType: "cpu" in model-config.yaml.

    • This is an experimental feature and NOT ALL models are guaranteed to work.

  • The "Number of GPUs" value in the startup output now reports GPUs on Apple Silicon.
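
The opt-out described above can be sketched as follows. This is a minimal example and not the project's official template. It assumes the configuration file is created in the current directory; only the deviceType setting itself comes from the text above.

```shell
# Write a minimal model-config.yaml that keeps the model off MPS.
# The deviceType value is the one named in the text above; writing the file
# into the current directory is an assumption made for illustration.
cat > model-config.yaml <<'EOF'
deviceType: "cpu"
EOF

# Print the file so it can be checked before the model is packaged.
cat model-config.yaml
```

The setting takes effect once the file is packaged with the model; torch-model-archiver accepts it through its --config-file option.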

Testing

  • Pytests that check for MPS on macOS M1 devices

  • Models that have been tested and work: ResNet-18, DenseNet161, AlexNet

  • Models that have been tested and DO NOT work: MNIST
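
Before running any of the models above, it can help to confirm that the host is Apple Silicon and that PyTorch reports MPS as usable. The sketch below is not taken from the TorchServe test suite. The helper names is_apple_silicon and mps_status are hypothetical, and it assumes python3 is on the PATH; when python3 or torch is missing it prints "unknown" instead of failing.

```shell
# Hypothetical helper: succeeds when the host is macOS running on arm64.
is_apple_silicon() {
  [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]
}

# Hypothetical helper: asks PyTorch whether the MPS backend is usable.
# Prints "True" or "False", or "unknown" if python3 or torch is unavailable.
mps_status() {
  python3 -c 'import torch; print(torch.backends.mps.is_available())' 2>/dev/null \
    || echo "unknown"
}

if is_apple_silicon; then
  echo "Apple Silicon host, MPS available: $(mps_status)"
else
  echo "Not an Apple Silicon host, MPS available: $(mps_status)"
fi
```

A result of "False" or "unknown" on an Apple Silicon machine suggests checking the installed torch build before looking at TorchServe itself.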

Example: ResNet-18 Using MPS on a Mac M1 Pro

serve % torchserve --start --model-store model_store_gen --models resnet-18=resnet-18.mar --ncs

Torchserve version: 0.10.0
Number of GPUs: 16
Number of CPUs: 10
Max heap size: 8192 M
Python executable: /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store:
Initial Models: resnet-18=resnet-18.mar
Log dir:
Metrics dir:
Netty threads: 0
Netty client threads: 0
Default workers per model: 16
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store:
CPP log config: N/A
Model config: N/A
2024-04-08T14:18:02,380 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-04-08T14:18:02,391 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: resnet-18.mar
2024-04-08T14:18:02,699 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model resnet-18
2024-04-08T14:18:02,699 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model resnet-18 loaded.
2024-04-08T14:18:02,699 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: resnet-18, count: 16
...
...
serve % curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
...
{
  "tabby": 0.40966302156448364,
  "tiger_cat": 0.3467046618461609,
  "Egyptian_cat": 0.1300288736820221,
  "lynx": 0.02391958422958851,
  "bucket": 0.011532187461853027
}
...

Conda Example

(myenv3) serve % pip list | grep torch
torch                     2.2.1
torchaudio                2.2.1
torchdata                 0.7.1
torchtext                 0.17.1
torchvision               0.17.1
(myenv3) serve % conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
(myenv3) serve % pip list | grep torch
torch                     2.2.1
torch-model-archiver      0.10.0b20240312
torch-workflow-archiver   0.2.12b20240312
torchaudio                2.2.1
torchdata                 0.7.1
torchserve                0.10.0b20240312
torchtext                 0.17.1
torchvision               0.17.1
(myenv3) serve % torchserve --start --ncs  --models densenet161.mar --model-store ./model_store_gen/
Torchserve version: 0.10.0
Number of GPUs: 0
Number of CPUs: 10
Max heap size: 8192 M
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Initial Models: densenet161.mar
Netty threads: 0
Netty client threads: 0
Default workers per model: 10
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
CPP log config: N/A
Model config: N/A
System metrics command: default
...
2024-03-12T15:58:54,702 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model densenet161 loaded.
2024-03-12T15:58:54,702 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: densenet161, count: 10
Model server started.
...
(myenv3) serve % curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
{
  "tabby": 0.46661922335624695,
  "tiger_cat": 0.46449029445648193,
  "Egyptian_cat": 0.0661405548453331,
  "lynx": 0.001292439759708941,
  "plastic_bag": 0.00022909720428287983
}
