Hyperparameter tuning with Ray Tune
Hyperparameter tuning can make the difference between an average model and a highly accurate one. Often, simple changes such as choosing a different learning rate or changing the size of a network layer can have a dramatic impact on your model’s performance.
Fortunately, there are tools that help with finding the best combination of parameters. Ray Tune is an industry-standard tool for distributed hyperparameter tuning. It includes the latest hyperparameter search algorithms, integrates with TensorBoard and other analysis libraries, and natively supports distributed training through Ray’s distributed machine learning engine.
In this tutorial, we will show you how to integrate Ray Tune into your PyTorch training workflow. We will extend the tutorial from the PyTorch documentation on training a CIFAR10 image classifier.
As you will see, we only need to make a few small modifications. In particular, we need to
- wrap data loading and training in functions,
- make some network parameters configurable,
- add checkpointing (optional),
- and define the search space for the model tuning
To run this tutorial, please make sure the following packages are installed:
- ray[tune]: Distributed hyperparameter tuning library
- torchvision: For the data transformers
Setup / Imports
Let’s start with the imports:
from functools import partial
import numpy as np
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler
Most of the imports are needed for building the PyTorch model. Only the last three imports are for Ray Tune.
Data loaders
We wrap the data loaders in their own function and pass a global data directory. This way we can share a data directory between different trials.
def load_data(data_dir="./data"):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)
    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)
    return trainset, testset
Configurable neural network
We can only tune those parameters that are configurable. In this example, we can specify the layer sizes of the fully connected layers:
class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
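The 16 * 5 * 5 input size of fc1 follows from the shapes produced by the convolution and pooling layers. As a quick check, the sketch below redoes that arithmetic in plain Python. It assumes 32x32 CIFAR10 input images and convolutions with stride 1 and no padding (the nn.Conv2d defaults); the helper names conv_out and pool_out are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 32                                 # CIFAR10 images are 32x32 pixels
size = pool_out(conv_out(size, 5), 2, 2)  # conv1: 32 -> 28, pool: 28 -> 14
size = pool_out(conv_out(size, 5), 2, 2)  # conv2: 14 -> 10, pool: 10 -> 5
flat_features = 16 * size * size          # conv2 has 16 output channels
print(size, flat_features)  # 5 400
```

Only l1 and l2 are tunable here; the 400 input features of fc1 are fixed by the convolutional part of the network.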
The train function
Now it gets interesting, because we introduce some changes to the example from the PyTorch documentation.
We wrap the training script in a function train_cifar(config, checkpoint_dir=None, data_dir=None). As you can guess, the config parameter will receive the hyperparameters we would like to train with. The checkpoint_dir parameter is used to restore checkpoints. The data_dir parameter specifies the directory where we load and store the data, so multiple runs can share the same data source.
net = Net(config["l1"], config["l2"])
# In the full function, the optimizer is created before this point.
if checkpoint_dir:
    model_state, optimizer_state = torch.load(
        os.path.join(checkpoint_dir, "checkpoint"))
    net.load_state_dict(model_state)
    optimizer.load_state_dict(optimizer_state)
The learning rate of the optimizer is made configurable, too:
optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
We also split the training data into a training and validation subset. We thus train on 80% of the data and calculate the validation loss on the remaining 20%. The batch sizes with which we iterate through the training and validation sets are configurable as well.
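To make the split concrete, here is a small sketch of the arithmetic in plain Python. It assumes the standard CIFAR10 training set of 50,000 images; split_lengths and steps_per_epoch are illustrative helpers, not part of the tutorial code, and the batch sizes are the ones used in the search space later in this tutorial.

```python
import math


def split_lengths(n_samples, train_fraction=0.8):
    """Return (train, validation) sizes, mirroring int(len(trainset) * 0.8)."""
    n_train = int(n_samples * train_fraction)
    return n_train, n_samples - n_train


def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


n_train, n_val = split_lengths(50_000)
print(n_train, n_val)  # 40000 10000

for batch_size in (2, 4, 8, 16):
    print(batch_size, steps_per_epoch(n_train, batch_size))
```

With a batch size of 2, one epoch takes 20,000 optimizer steps, compared with 2,500 for a batch size of 16, so trials with small batch sizes need noticeably more steps per epoch.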
Adding (multi) GPU support with DataParallel
Image classification benefits greatly from GPUs. Luckily, we can continue to use PyTorch’s abstractions in Ray Tune. Thus, we can wrap our model in nn.DataParallel to support data parallel training on multiple GPUs:
device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
net.to(device)
By using a device variable, we make sure that training also works when no GPUs are available. PyTorch requires us to send our data to the GPU memory explicitly, like this:
for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray also supports fractional GPUs, so we can share GPUs among trials as long as the model still fits in GPU memory. We’ll come back to that later.
Communicating with Ray Tune
The most interesting part is the communication with Ray Tune:
with tune.checkpoint_dir(epoch) as checkpoint_dir:
    path = os.path.join(checkpoint_dir, "checkpoint")
    torch.save((net.state_dict(), optimizer.state_dict()), path)
tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which hyperparameter configuration leads to the best results. These metrics can also be used to stop poorly performing trials early in order to avoid wasting resources on those trials.
Saving the checkpoint is optional. However, it is necessary if we want to use advanced schedulers like Population Based Training. Also, by saving the checkpoint we can later load the trained models and validate them on a test set.
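The two reported values are simple aggregates over the validation batches: the loss is the mean of the per-batch losses, and the accuracy is the number of correct predictions divided by the number of samples. A minimal sketch of that bookkeeping in plain Python follows; summarize_validation and the per-batch numbers are hypothetical and only illustrate the arithmetic.

```python
def summarize_validation(batches):
    """Aggregate (batch_loss, n_correct, n_samples) tuples into the two metrics."""
    val_loss = 0.0
    val_steps = 0
    correct = 0
    total = 0
    for batch_loss, n_correct, n_samples in batches:
        val_loss += batch_loss
        val_steps += 1
        correct += n_correct
        total += n_samples
    return {"loss": val_loss / val_steps, "accuracy": correct / total}


# Two made-up validation batches of four samples each.
metrics = summarize_validation([(2.0, 1, 4), (1.0, 3, 4)])
print(metrics)  # {'loss': 1.5, 'accuracy': 0.5}
```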
Full training function
The full code example looks like this:
def train_cifar(config, checkpoint_dir=None, data_dir=None):
    net = Net(config["l1"], config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)
    trainset, testset = load_data(data_dir)
    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs])
    trainloader = torch.utils.data.DataLoader(
        train_subset,
        batch_size=int(config["batch_size"]),
        shuffle=True,
        num_workers=8)
    valloader = torch.utils.data.DataLoader(
        val_subset,
        batch_size=int(config["batch_size"]),
        shuffle=True,
        num_workers=8)
    for epoch in range(10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                # note: running_loss is reset below but epoch_steps is not,
                # so the printed value shrinks over the course of an epoch
                print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
                                                running_loss / epoch_steps))
                running_loss = 0.0
        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
                loss = criterion(outputs, labels)
                val_loss += loss.item()
                val_steps += 1
        # save a checkpoint, then report the validation metrics to Ray Tune
        with tune.checkpoint_dir(epoch) as checkpoint_dir:
            path = os.path.join(checkpoint_dir, "checkpoint")
            torch.save((net.state_dict(), optimizer.state_dict()), path)
        tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
    print("Finished Training")
As you can see, most of the code is adapted directly from the original example.
Test set accuracy
Commonly, the performance of a machine learning model is tested on a hold-out test set with data that has not been used for training the model. We also wrap this in a function:
def test_accuracy(net, device="cpu"):
    trainset, testset = load_data()
    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2)
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return correct / total
The function also expects a device parameter, so we can do the test set validation on a GPU.
Configuring the search space
Lastly, we need to define Ray Tune’s search space. Here is an example:
config = {
    "l1": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "l2": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}
The tune.sample_from() function makes it possible to define your own sample methods to obtain hyperparameters. In this example, the l1 and l2 parameters should be powers of 2 between 4 and 256, so either 4, 8, 16, 32, 64, 128, or 256. The lr (learning rate) should be sampled log-uniformly between 0.0001 and 0.1, so that each order of magnitude is equally likely. Lastly, the batch size is a choice between 2, 4, 8, and 16.
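For intuition about what these distributions produce, the following sketch draws comparable samples using only Python’s standard library. It is a stand-in for illustration, not Ray Tune’s sampling code: sample_config is a hypothetical helper, and random.randint includes its upper bound whereas np.random.randint excludes it, hence randint(2, 8) instead of randint(2, 9).

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from distributions equivalent to the search space."""
    return {
        # powers of 2 from 2**2 = 4 up to 2**8 = 256
        "l1": 2 ** rng.randint(2, 8),
        "l2": 2 ** rng.randint(2, 8),
        # log-uniform: sample the exponent uniformly, then exponentiate
        "lr": math.exp(rng.uniform(math.log(1e-4), math.log(1e-1))),
        "batch_size": rng.choice([2, 4, 8, 16]),
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for _ in range(3):
    print(sample_config(rng))
```

Sampling the exponent uniformly means that a learning rate between 0.0001 and 0.001 is as likely as one between 0.01 and 0.1, which a plain uniform distribution over the same range would not achieve.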
At each trial, Ray Tune will now randomly sample a combination of parameters from these search spaces. It will then train a number of models in parallel and find the best performing one among these. We also use the ASHAScheduler, which will terminate poorly performing trials early.
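To give a rough idea of how such early stopping works, the sketch below computes the iterations at which trials are compared, assuming grace_period=1, reduction_factor=2 and max_t=10 (the values passed to ASHAScheduler in the main function of this tutorial). It then applies a simplified stopping rule: a trial stops if its loss is worse than the median of the losses reported earlier at the same iteration. This rule and the helper names are assumptions made for illustration and leave out details of Ray Tune’s actual implementation. The losses are first-epoch validation losses taken from this tutorial’s output, rounded to four decimals.

```python
import statistics


def asha_rungs(grace_period, reduction_factor, max_t):
    """Iterations at which trials are compared: grace_period * reduction_factor**k."""
    rungs = []
    t = grace_period
    while t <= max_t:
        rungs.append(t)
        t *= reduction_factor
    return rungs


def should_stop(loss, earlier_losses):
    """Simplified rule for reduction_factor=2: stop if worse than the median so far."""
    if not earlier_losses:
        return False  # nothing to compare against yet
    return loss > statistics.median(earlier_losses)


print(asha_rungs(grace_period=1, reduction_factor=2, max_t=10))  # [1, 2, 4, 8]

# (trial id, validation loss after the first epoch), in the order reported
reported = [
    ("00001", 2.2887),
    ("00005", 1.7053),
    ("00003", 2.0723),
    ("00008", 1.6864),
    ("00000", 1.5255),
    ("00006", 2.3291),
]
seen = []
decisions = {}
for trial_id, loss in reported:
    decisions[trial_id] = "stop" if should_stop(loss, seen) else "continue"
    seen.append(loss)
print(decisions)
```

Under these assumptions, the rungs match the Iter 1, 2, 4 and 8 entries of the bracket line in the status output, and trials 00003 and 00006 are stopped, which agrees with the done: true entries reported for them after one iteration.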
We wrap the train_cifar function with functools.partial to set the constant data_dir parameter. We can also tell Ray Tune what resources should be available for each trial:
gpus_per_trial = 2
# ...
result = tune.run(
    partial(train_cifar, data_dir=data_dir),
    resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
    config=config,
    num_samples=num_samples,
    scheduler=scheduler,
    progress_reporter=reporter,
    checkpoint_at_end=True)
You can specify the number of CPUs, which are then available, for example, to increase the num_workers of the PyTorch DataLoader instances. The selected number of GPUs is made visible to PyTorch in each trial. Trials do not have access to GPUs that haven’t been requested for them, so you don’t have to worry about two trials using the same set of resources.
Here we can also specify fractional GPUs, so something like gpus_per_trial=0.5 is completely valid. The trials will then share GPUs among each other. You just have to make sure that the models still fit in the GPU memory.
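As a back-of-the-envelope check on these settings, the sketch below estimates how many trials can run at once. It is a simplification that only divides the available resources by the per-trial request and assumes that a fractional GPU request is served from a single GPU. The function name is hypothetical, and the 32 CPUs and 2 GPUs are taken from the node shown in this tutorial’s output.

```python
def max_concurrent_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    """Rough upper bound on the number of trials that fit on one node."""
    by_cpu = total_cpus // cpus_per_trial
    if gpus_per_trial == 0:
        return by_cpu  # CPU-only trials are limited by CPUs alone
    if gpus_per_trial < 1:
        # Assumption: a fractional request must fit on one physical GPU.
        by_gpu = total_gpus * int(1 / gpus_per_trial)
    else:
        by_gpu = int(total_gpus // gpus_per_trial)
    return min(by_cpu, by_gpu)


print(max_concurrent_trials(32, 2, cpus_per_trial=2, gpus_per_trial=0))    # 16
print(max_concurrent_trials(32, 2, cpus_per_trial=2, gpus_per_trial=0.5))  # 4
print(max_concurrent_trials(32, 2, cpus_per_trial=8, gpus_per_trial=2))    # 1
```

By this estimate, requesting half a GPU per trial lets four trials train at the same time on the two GPUs, while requesting both GPUs per trial forces the trials to run one after another.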
After training the models, we will find the best performing one and load the trained network from the checkpoint file. We then obtain the test set accuracy and report everything by printing.
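Conceptually, picking the best trial by its last reported loss is a minimum over the trials’ most recent results. The sketch below shows that idea in plain Python. It is not how result.get_best_trial is implemented, the best_trial helper is hypothetical, and the values are intermediate numbers copied from a status table in this tutorial’s output rather than final results.

```python
# Most recent metrics per trial (intermediate values, for illustration only).
last_results = {
    "da74d_00000": {"loss": 1.48849, "accuracy": 0.4625},
    "da74d_00001": {"loss": 1.67037, "accuracy": 0.3797},
    "da74d_00005": {"loss": 1.74197, "accuracy": 0.3543},
    "da74d_00008": {"loss": 1.45949, "accuracy": 0.4698},
}


def best_trial(results, metric="loss", mode="min"):
    """Return the id of the trial whose last result is best for the given metric."""
    pick = min if mode == "min" else max
    return pick(results, key=lambda trial_id: results[trial_id][metric])


best = best_trial(last_results)
print(best, last_results[best])
```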
The full main function looks like this:
def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)
    config = {
        "l1": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "l2": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16])
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2)
    reporter = CLIReporter(
        # parameter_columns=["l1", "l2", "lr", "batch_size"],
        metric_columns=["loss", "accuracy", "training_iteration"])
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter)
    best_trial = result.get_best_trial("loss", "min", "last")
    print("Best trial config: {}".format(best_trial.config))
    print("Best trial final validation loss: {}".format(
        best_trial.last_result["loss"]))
    print("Best trial final validation accuracy: {}".format(
        best_trial.last_result["accuracy"]))
    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)
    best_checkpoint_dir = best_trial.checkpoint.value
    model_state, optimizer_state = torch.load(os.path.join(
        best_checkpoint_dir, "checkpoint"))
    best_trained_model.load_state_dict(model_state)
    test_acc = test_accuracy(best_trained_model, device)
    print("Best trial test set accuracy: {}".format(test_acc))
if __name__ == "__main__":
    # You can change the number of GPUs per trial here:
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)
Out:
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz
Extracting /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data
Files already downloaded and verified
== Status ==
Memory usage on this node: 4.1/240.1 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: None
Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 1/10 (1 RUNNING)
+---------------------+----------+-------+--------------+------+------+------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr |
|---------------------+----------+-------+--------------+------+------+------------|
| DEFAULT_da74d_00000 | RUNNING | | 8 | 64 | 256 | 0.00646798 |
+---------------------+----------+-------+--------------+------+------+------------+
(pid=1419) Files already downloaded and verified
(pid=1446) Files already downloaded and verified
(pid=1422) Files already downloaded and verified
(pid=1421) Files already downloaded and verified
(pid=1454) Files already downloaded and verified
(pid=1420) Files already downloaded and verified
(pid=1447) Files already downloaded and verified
(pid=1457) Files already downloaded and verified
(pid=1430) Files already downloaded and verified
(pid=1415) Files already downloaded and verified
(pid=1419) Files already downloaded and verified
(pid=1446) Files already downloaded and verified
(pid=1422) Files already downloaded and verified
(pid=1421) Files already downloaded and verified
(pid=1454) Files already downloaded and verified
(pid=1420) Files already downloaded and verified
(pid=1447) Files already downloaded and verified
(pid=1457) Files already downloaded and verified
(pid=1430) Files already downloaded and verified
(pid=1415) Files already downloaded and verified
(pid=1454) [1, 2000] loss: 2.262
(pid=1421) [1, 2000] loss: 2.196
(pid=1430) [1, 2000] loss: 2.322
(pid=1447) [1, 2000] loss: 2.348
(pid=1457) [1, 2000] loss: 2.150
(pid=1419) [1, 2000] loss: 1.875
(pid=1420) [1, 2000] loss: 1.951
(pid=1422) [1, 2000] loss: 2.102
(pid=1415) [1, 2000] loss: 2.255
(pid=1446) [1, 2000] loss: 2.311
(pid=1454) [1, 4000] loss: 1.039
(pid=1421) [1, 4000] loss: 1.006
(pid=1430) [1, 4000] loss: 1.154
Result for DEFAULT_da74d_00001:
accuracy: 0.1503
date: 2021-02-26_20-26-35
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 2.2887174449920655
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 27.85807466506958
time_this_iter_s: 27.85807466506958
time_total_s: 27.85807466506958
timestamp: 1614371195
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00001
== Status ==
Memory usage on this node: 9.2/240.1 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -2.2887174449920655
Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (10 RUNNING)
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | | 8 | 64 | 256 | 0.00646798 | | | |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 2.28872 | 0.1503 | 1 |
| DEFAULT_da74d_00002 | RUNNING | | 4 | 64 | 64 | 0.0142423 | | | |
| DEFAULT_da74d_00003 | RUNNING | | 8 | 8 | 256 | 0.0150371 | | | |
| DEFAULT_da74d_00004 | RUNNING | | 2 | 4 | 4 | 0.000303051 | | | |
| DEFAULT_da74d_00005 | RUNNING | | 8 | 8 | 128 | 0.0108518 | | | |
| DEFAULT_da74d_00006 | RUNNING | | 4 | 4 | 4 | 0.079216 | | | |
| DEFAULT_da74d_00007 | RUNNING | | 2 | 4 | 32 | 0.00104518 | | | |
| DEFAULT_da74d_00008 | RUNNING | | 8 | 32 | 8 | 0.000820594 | | | |
| DEFAULT_da74d_00009 | RUNNING | | 2 | 8 | 32 | 0.00276505 | | | |
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1447) [1, 4000] loss: 1.174
(pid=1457) [1, 4000] loss: 1.063
(pid=1420) [1, 4000] loss: 0.898
(pid=1422) [1, 4000] loss: 1.011
(pid=1415) [1, 4000] loss: 0.948
(pid=1419) [1, 4000] loss: 0.802
(pid=1454) [1, 6000] loss: 0.661
(pid=1430) [1, 6000] loss: 0.768
(pid=1421) [1, 6000] loss: 0.634
(pid=1447) [1, 6000] loss: 0.782
(pid=1457) [1, 6000] loss: 0.717
Result for DEFAULT_da74d_00005:
accuracy: 0.3615
date: 2021-02-26_20-26-49
done: false
experiment_id: dd72d2f0d9784a71972ef69ab60db084
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 1.7052976821899415
node_ip: 172.17.0.2
pid: 1420
should_checkpoint: true
time_since_restore: 42.42658591270447
time_this_iter_s: 42.42658591270447
time_total_s: 42.42658591270447
timestamp: 1614371209
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00005
== Status ==
Memory usage on this node: 9.3/240.1 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -1.9970075635910036
Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (10 RUNNING)
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | | 8 | 64 | 256 | 0.00646798 | | | |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 2.28872 | 0.1503 | 1 |
| DEFAULT_da74d_00002 | RUNNING | | 4 | 64 | 64 | 0.0142423 | | | |
| DEFAULT_da74d_00003 | RUNNING | | 8 | 8 | 256 | 0.0150371 | | | |
| DEFAULT_da74d_00004 | RUNNING | | 2 | 4 | 4 | 0.000303051 | | | |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.7053 | 0.3615 | 1 |
| DEFAULT_da74d_00006 | RUNNING | | 4 | 4 | 4 | 0.079216 | | | |
| DEFAULT_da74d_00007 | RUNNING | | 2 | 4 | 32 | 0.00104518 | | | |
| DEFAULT_da74d_00008 | RUNNING | | 8 | 32 | 8 | 0.000820594 | | | |
| DEFAULT_da74d_00009 | RUNNING | | 2 | 8 | 32 | 0.00276505 | | | |
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_da74d_00003:
accuracy: 0.1835
date: 2021-02-26_20-26-50
done: true
experiment_id: d32ef5791fc148498c5005dd2f09277a
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 2.072287780284882
node_ip: 172.17.0.2
pid: 1422
should_checkpoint: true
time_since_restore: 42.821146965026855
time_this_iter_s: 42.821146965026855
time_total_s: 42.821146965026855
timestamp: 1614371210
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00003
Result for DEFAULT_da74d_00008:
accuracy: 0.3868
date: 2021-02-26_20-26-50
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 1.6864227166652679
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 43.11535668373108
time_this_iter_s: 43.11535668373108
time_total_s: 43.11535668373108
timestamp: 1614371210
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00008
Result for DEFAULT_da74d_00000:
accuracy: 0.4418
date: 2021-02-26_20-26-50
done: false
experiment_id: ea85a355d5fb414dbc129bd43bcddaca
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 1.525510474061966
node_ip: 172.17.0.2
pid: 1419
should_checkpoint: true
time_since_restore: 43.45280838012695
time_this_iter_s: 43.45280838012695
time_total_s: 43.45280838012695
timestamp: 1614371210
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00000
(pid=1446) [2, 2000] loss: 2.236
(pid=1454) [1, 8000] loss: 0.482
(pid=1421) [1, 8000] loss: 0.445
(pid=1430) [1, 8000] loss: 0.576
Result for DEFAULT_da74d_00001:
accuracy: 0.2261
date: 2021-02-26_20-26-59
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 2
loss: 2.092621375656128
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 52.07927751541138
time_this_iter_s: 24.221202850341797
time_total_s: 52.07927751541138
timestamp: 1614371219
timesteps_since_restore: 0
training_iteration: 2
trial_id: da74d_00001
== Status ==
Memory usage on this node: 8.8/240.1 GiB
Using AsyncHyperBand: num_stopped=1
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -2.092621375656128 | Iter 1.000: -1.7052976821899415
Resources requested: 18/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (9 RUNNING, 1 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.52551 | 0.4418 | 1 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 2.09262 | 0.2261 | 2 |
| DEFAULT_da74d_00002 | RUNNING | | 4 | 64 | 64 | 0.0142423 | | | |
| DEFAULT_da74d_00004 | RUNNING | | 2 | 4 | 4 | 0.000303051 | | | |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.7053 | 0.3615 | 1 |
| DEFAULT_da74d_00006 | RUNNING | | 4 | 4 | 4 | 0.079216 | | | |
| DEFAULT_da74d_00007 | RUNNING | | 2 | 4 | 32 | 0.00104518 | | | |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.68642 | 0.3868 | 1 |
| DEFAULT_da74d_00009 | RUNNING | | 2 | 8 | 32 | 0.00276505 | | | |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1447) [1, 8000] loss: 0.588
(pid=1457) [1, 8000] loss: 0.529
(pid=1420) [2, 2000] loss: 1.770
(pid=1415) [2, 2000] loss: 1.658
(pid=1419) [2, 2000] loss: 1.489
(pid=1454) [1, 10000] loss: 0.375
(pid=1421) [1, 10000] loss: 0.346
(pid=1430) [1, 10000] loss: 0.460
(pid=1447) [1, 10000] loss: 0.470
(pid=1457) [1, 10000] loss: 0.428
(pid=1446) [3, 2000] loss: 1.989
(pid=1420) [2, 4000] loss: 0.887
(pid=1454) [1, 12000] loss: 0.314
(pid=1415) [2, 4000] loss: 0.781
(pid=1421) [1, 12000] loss: 0.280
(pid=1430) [1, 12000] loss: 0.384
(pid=1419) [2, 4000] loss: 0.743
Result for DEFAULT_da74d_00006:
accuracy: 0.1005
date: 2021-02-26_20-27-19
done: true
experiment_id: e40403f57743415d86dda9607f2449c6
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 2.3291412625312806
node_ip: 172.17.0.2
pid: 1447
should_checkpoint: true
time_since_restore: 71.89647650718689
time_this_iter_s: 71.89647650718689
time_total_s: 71.89647650718689
timestamp: 1614371239
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00006
== Status ==
Memory usage on this node: 8.9/240.1 GiB
Using AsyncHyperBand: num_stopped=2
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -2.092621375656128 | Iter 1.000: -1.8887927312374115
Resources requested: 18/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (9 RUNNING, 1 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.52551 | 0.4418 | 1 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 2.09262 | 0.2261 | 2 |
| DEFAULT_da74d_00002 | RUNNING | | 4 | 64 | 64 | 0.0142423 | | | |
| DEFAULT_da74d_00004 | RUNNING | | 2 | 4 | 4 | 0.000303051 | | | |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.7053 | 0.3615 | 1 |
| DEFAULT_da74d_00006 | RUNNING | 172.17.0.2:1447 | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
| DEFAULT_da74d_00007 | RUNNING | | 2 | 4 | 32 | 0.00104518 | | | |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.68642 | 0.3868 | 1 |
| DEFAULT_da74d_00009 | RUNNING | | 2 | 8 | 32 | 0.00276505 | | | |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_da74d_00002:
accuracy: 0.1267
date: 2021-02-26_20-27-21
done: true
experiment_id: a5b519cc84fb46ba9f4299788360832b
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 2.1816043051719665
node_ip: 172.17.0.2
pid: 1457
should_checkpoint: true
time_since_restore: 74.21312308311462
time_this_iter_s: 74.21312308311462
time_total_s: 74.21312308311462
timestamp: 1614371241
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00002
Result for DEFAULT_da74d_00001:
accuracy: 0.3239
date: 2021-02-26_20-27-22
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 3
loss: 1.8586337614059447
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 75.3482575416565
time_this_iter_s: 23.268980026245117
time_total_s: 75.3482575416565
timestamp: 1614371242
timesteps_since_restore: 0
training_iteration: 3
trial_id: da74d_00001
(pid=1454) [1, 14000] loss: 0.264
Result for DEFAULT_da74d_00005:
accuracy: 0.3543
date: 2021-02-26_20-27-26
done: false
experiment_id: dd72d2f0d9784a71972ef69ab60db084
hostname: 0b92e671318e
iterations_since_restore: 2
loss: 1.7419664831638335
node_ip: 172.17.0.2
pid: 1420
should_checkpoint: true
time_since_restore: 79.20708870887756
time_this_iter_s: 36.780502796173096
time_total_s: 79.20708870887756
timestamp: 1614371246
timesteps_since_restore: 0
training_iteration: 2
trial_id: da74d_00005
== Status ==
Memory usage on this node: 7.8/240.1 GiB
Using AsyncHyperBand: num_stopped=3
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9172939294099807 | Iter 1.000: -2.072287780284882
Resources requested: 14/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (7 RUNNING, 3 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.52551 | 0.4418 | 1 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.85863 | 0.3239 | 3 |
| DEFAULT_da74d_00004 | RUNNING | | 2 | 4 | 4 | 0.000303051 | | | |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.74197 | 0.3543 | 2 |
| DEFAULT_da74d_00007 | RUNNING | | 2 | 4 | 32 | 0.00104518 | | | |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.68642 | 0.3868 | 1 |
| DEFAULT_da74d_00009 | RUNNING | | 2 | 8 | 32 | 0.00276505 | | | |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1421) [1, 14000] loss: 0.238
Result for DEFAULT_da74d_00008:
accuracy: 0.4698
date: 2021-02-26_20-27-27
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 2
loss: 1.459486995410919
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 80.27314257621765
time_this_iter_s: 37.15778589248657
time_total_s: 80.27314257621765
timestamp: 1614371247
timesteps_since_restore: 0
training_iteration: 2
trial_id: da74d_00008
(pid=1430) [1, 14000] loss: 0.328
Result for DEFAULT_da74d_00000:
accuracy: 0.4625
date: 2021-02-26_20-27-28
done: false
experiment_id: ea85a355d5fb414dbc129bd43bcddaca
hostname: 0b92e671318e
iterations_since_restore: 2
loss: 1.4884922024250031
node_ip: 172.17.0.2
pid: 1419
should_checkpoint: true
time_since_restore: 80.99731063842773
time_this_iter_s: 37.54450225830078
time_total_s: 80.99731063842773
timestamp: 1614371248
timesteps_since_restore: 0
training_iteration: 2
trial_id: da74d_00000
(pid=1454) [1, 16000] loss: 0.230
(pid=1421) [1, 16000] loss: 0.204
(pid=1430) [1, 16000] loss: 0.287
(pid=1446) [4, 2000] loss: 1.780
(pid=1420) [3, 2000] loss: 1.779
(pid=1415) [3, 2000] loss: 1.464
(pid=1419) [3, 2000] loss: 1.416
Result for DEFAULT_da74d_00001:
accuracy: 0.3797
date: 2021-02-26_20-27-44
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 4
loss: 1.670368455696106
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 97.65018343925476
time_this_iter_s: 22.301925897598267
time_total_s: 97.65018343925476
timestamp: 1614371264
timesteps_since_restore: 0
training_iteration: 4
trial_id: da74d_00001
== Status ==
Memory usage on this node: 7.8/240.1 GiB
Using AsyncHyperBand: num_stopped=3
Bracket: Iter 8.000: None | Iter 4.000: -1.670368455696106 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -2.072287780284882
Resources requested: 14/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (7 RUNNING, 3 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.48849 | 0.4625 | 2 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.67037 | 0.3797 | 4 |
| DEFAULT_da74d_00004 | RUNNING | | 2 | 4 | 4 | 0.000303051 | | | |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.74197 | 0.3543 | 2 |
| DEFAULT_da74d_00007 | RUNNING | | 2 | 4 | 32 | 0.00104518 | | | |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.45949 | 0.4698 | 2 |
| DEFAULT_da74d_00009 | RUNNING | | 2 | 8 | 32 | 0.00276505 | | | |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1454) [1, 18000] loss: 0.207
(pid=1421) [1, 18000] loss: 0.182
(pid=1430) [1, 18000] loss: 0.253
(pid=1420) [3, 4000] loss: 0.898
(pid=1415) [3, 4000] loss: 0.707
(pid=1419) [3, 4000] loss: 0.716
(pid=1454) [1, 20000] loss: 0.187
(pid=1421) [1, 20000] loss: 0.163
(pid=1430) [1, 20000] loss: 0.226
(pid=1446) [5, 2000] loss: 1.639
Result for DEFAULT_da74d_00005:
accuracy: 0.3168
date: 2021-02-26_20-28-01
done: false
experiment_id: dd72d2f0d9784a71972ef69ab60db084
hostname: 0b92e671318e
iterations_since_restore: 3
loss: 1.8536528338909148
node_ip: 172.17.0.2
pid: 1420
should_checkpoint: true
time_since_restore: 114.06767654418945
time_this_iter_s: 34.86058783531189
time_total_s: 114.06767654418945
timestamp: 1614371281
timesteps_since_restore: 0
training_iteration: 3
trial_id: da74d_00005
== Status ==
Memory usage on this node: 7.9/240.1 GiB
Using AsyncHyperBand: num_stopped=3
Bracket: Iter 8.000: None | Iter 4.000: -1.670368455696106 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -2.072287780284882
Resources requested: 14/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (7 RUNNING, 3 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.48849 | 0.4625 | 2 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.67037 | 0.3797 | 4 |
| DEFAULT_da74d_00004 | RUNNING | | 2 | 4 | 4 | 0.000303051 | | | |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.85365 | 0.3168 | 3 |
| DEFAULT_da74d_00007 | RUNNING | | 2 | 4 | 32 | 0.00104518 | | | |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.45949 | 0.4698 | 2 |
| DEFAULT_da74d_00009 | RUNNING | | 2 | 8 | 32 | 0.00276505 | | | |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_da74d_00008:
accuracy: 0.474
date: 2021-02-26_20-28-03
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 3
loss: 1.4891104881763457
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 115.78469705581665
time_this_iter_s: 35.511554479599
time_total_s: 115.78469705581665
timestamp: 1614371283
timesteps_since_restore: 0
training_iteration: 3
trial_id: da74d_00008
Result for DEFAULT_da74d_00000:
accuracy: 0.4856
date: 2021-02-26_20-28-04
done: false
experiment_id: ea85a355d5fb414dbc129bd43bcddaca
hostname: 0b92e671318e
iterations_since_restore: 3
loss: 1.4677546185016632
node_ip: 172.17.0.2
pid: 1419
should_checkpoint: true
time_since_restore: 117.28643274307251
time_this_iter_s: 36.289122104644775
time_total_s: 117.28643274307251
timestamp: 1614371284
timesteps_since_restore: 0
training_iteration: 3
trial_id: da74d_00000
Result for DEFAULT_da74d_00001:
accuracy: 0.4147
date: 2021-02-26_20-28-07
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 5
loss: 1.5820930083274842
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 120.34646391868591
time_this_iter_s: 22.696280479431152
time_total_s: 120.34646391868591
timestamp: 1614371287
timesteps_since_restore: 0
training_iteration: 5
trial_id: da74d_00001
== Status ==
Memory usage on this node: 7.9/240.1 GiB
Using AsyncHyperBand: num_stopped=3
Bracket: Iter 8.000: None | Iter 4.000: -1.670368455696106 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -2.072287780284882
Resources requested: 14/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (7 RUNNING, 3 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.46775 | 0.4856 | 3 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.58209 | 0.4147 | 5 |
| DEFAULT_da74d_00004 | RUNNING | | 2 | 4 | 4 | 0.000303051 | | | |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.85365 | 0.3168 | 3 |
| DEFAULT_da74d_00007 | RUNNING | | 2 | 4 | 32 | 0.00104518 | | | |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.48911 | 0.474 | 3 |
| DEFAULT_da74d_00009 | RUNNING | | 2 | 8 | 32 | 0.00276505 | | | |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_da74d_00009:
accuracy: 0.2768
date: 2021-02-26_20-28-09
done: false
experiment_id: 7a996ae85ee84cd5befcf6252e7ef6d4
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 1.9227685967683792
node_ip: 172.17.0.2
pid: 1454
should_checkpoint: true
time_since_restore: 122.37326312065125
time_this_iter_s: 122.37326312065125
time_total_s: 122.37326312065125
timestamp: 1614371289
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00009
Result for DEFAULT_da74d_00007:
accuracy: 0.3979
date: 2021-02-26_20-28-10
done: false
experiment_id: f38b32d4499a4f59836f13bb7b7095f7
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 1.6090776248946785
node_ip: 172.17.0.2
pid: 1421
should_checkpoint: true
time_since_restore: 123.21496987342834
time_this_iter_s: 123.21496987342834
time_total_s: 123.21496987342834
timestamp: 1614371290
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00007
Result for DEFAULT_da74d_00004:
accuracy: 0.1734
date: 2021-02-26_20-28-11
done: true
experiment_id: c3beafe7c9434d4490237fa65cfe1f11
hostname: 0b92e671318e
iterations_since_restore: 1
loss: 2.2460573588371275
node_ip: 172.17.0.2
pid: 1430
should_checkpoint: true
time_since_restore: 123.91765284538269
time_this_iter_s: 123.91765284538269
time_total_s: 123.91765284538269
timestamp: 1614371291
timesteps_since_restore: 0
training_iteration: 1
trial_id: da74d_00004
(pid=1420) [4, 2000] loss: 1.790
(pid=1415) [4, 2000] loss: 1.344
(pid=1419) [4, 2000] loss: 1.366
(pid=1454) [2, 2000] loss: 1.819
(pid=1421) [2, 2000] loss: 1.600
(pid=1446) [6, 2000] loss: 1.548
(pid=1420) [4, 4000] loss: 0.927
(pid=1415) [4, 4000] loss: 0.668
(pid=1454) [2, 4000] loss: 0.930
Result for DEFAULT_da74d_00001:
accuracy: 0.4353
date: 2021-02-26_20-28-29
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 6
loss: 1.5407315127372743
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 142.0170876979828
time_this_iter_s: 21.670623779296875
time_total_s: 142.0170876979828
timestamp: 1614371309
timesteps_since_restore: 0
training_iteration: 6
trial_id: da74d_00001
== Status ==
Memory usage on this node: 7.4/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: -1.670368455696106 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -1.9975281885266305
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.46775 | 0.4856 | 3 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.54073 | 0.4353 | 6 |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.85365 | 0.3168 | 3 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.60908 | 0.3979 | 1 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.48911 | 0.474 | 3 |
| DEFAULT_da74d_00009 | RUNNING | 172.17.0.2:1454 | 2 | 8 | 32 | 0.00276505 | 1.92277 | 0.2768 | 1 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1419) [4, 4000] loss: 0.700
(pid=1421) [2, 4000] loss: 0.781
Result for DEFAULT_da74d_00005:
accuracy: 0.2839
date: 2021-02-26_20-28-35
done: true
experiment_id: dd72d2f0d9784a71972ef69ab60db084
hostname: 0b92e671318e
iterations_since_restore: 4
loss: 1.9311893384933472
node_ip: 172.17.0.2
pid: 1420
should_checkpoint: true
time_since_restore: 148.27433562278748
time_this_iter_s: 34.20665907859802
time_total_s: 148.27433562278748
timestamp: 1614371315
timesteps_since_restore: 0
training_iteration: 4
trial_id: da74d_00005
== Status ==
Memory usage on this node: 7.4/240.1 GiB
Using AsyncHyperBand: num_stopped=5
Bracket: Iter 8.000: None | Iter 4.000: -1.8007788970947267 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -1.9975281885266305
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.46775 | 0.4856 | 3 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.54073 | 0.4353 | 6 |
| DEFAULT_da74d_00005 | RUNNING | 172.17.0.2:1420 | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.60908 | 0.3979 | 1 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.48911 | 0.474 | 3 |
| DEFAULT_da74d_00009 | RUNNING | 172.17.0.2:1454 | 2 | 8 | 32 | 0.00276505 | 1.92277 | 0.2768 | 1 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_da74d_00008:
accuracy: 0.5218
date: 2021-02-26_20-28-38
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 4
loss: 1.327530821633339
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 150.68803215026855
time_this_iter_s: 34.903335094451904
time_total_s: 150.68803215026855
timestamp: 1614371318
timesteps_since_restore: 0
training_iteration: 4
trial_id: da74d_00008
(pid=1454) [2, 6000] loss: 0.603
(pid=1421) [2, 6000] loss: 0.518
Result for DEFAULT_da74d_00000:
accuracy: 0.5012
date: 2021-02-26_20-28-39
done: false
experiment_id: ea85a355d5fb414dbc129bd43bcddaca
hostname: 0b92e671318e
iterations_since_restore: 4
loss: 1.4394630181789398
node_ip: 172.17.0.2
pid: 1419
should_checkpoint: true
time_since_restore: 152.61839723587036
time_this_iter_s: 35.33196449279785
time_total_s: 152.61839723587036
timestamp: 1614371319
timesteps_since_restore: 0
training_iteration: 4
trial_id: da74d_00000
(pid=1446) [7, 2000] loss: 1.488
(pid=1454) [2, 8000] loss: 0.459
(pid=1421) [2, 8000] loss: 0.394
(pid=1415) [5, 2000] loss: 1.277
Result for DEFAULT_da74d_00001:
accuracy: 0.4645
date: 2021-02-26_20-28-50
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 7
loss: 1.4647596702575683
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 163.39900302886963
time_this_iter_s: 21.38191533088684
time_total_s: 163.39900302886963
timestamp: 1614371330
timesteps_since_restore: 0
training_iteration: 7
trial_id: da74d_00001
== Status ==
Memory usage on this node: 6.8/240.1 GiB
Using AsyncHyperBand: num_stopped=5
Bracket: Iter 8.000: None | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -1.9975281885266305
Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (5 RUNNING, 5 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.43946 | 0.5012 | 4 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.46476 | 0.4645 | 7 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.60908 | 0.3979 | 1 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.32753 | 0.5218 | 4 |
| DEFAULT_da74d_00009 | RUNNING | 172.17.0.2:1454 | 2 | 8 | 32 | 0.00276505 | 1.92277 | 0.2768 | 1 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1419) [5, 2000] loss: 1.325
(pid=1454) [2, 10000] loss: 0.363
(pid=1421) [2, 10000] loss: 0.313
(pid=1415) [5, 4000] loss: 0.625
(pid=1419) [5, 4000] loss: 0.689
(pid=1446) [8, 2000] loss: 1.437
(pid=1454) [2, 12000] loss: 0.297
(pid=1421) [2, 12000] loss: 0.256
Result for DEFAULT_da74d_00008:
accuracy: 0.5202
date: 2021-02-26_20-29-11
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 5
loss: 1.3261634290218354
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 183.65447759628296
time_this_iter_s: 32.966445446014404
time_total_s: 183.65447759628296
timestamp: 1614371351
timesteps_since_restore: 0
training_iteration: 5
trial_id: da74d_00008
== Status ==
Memory usage on this node: 6.9/240.1 GiB
Using AsyncHyperBand: num_stopped=5
Bracket: Iter 8.000: None | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -1.9975281885266305
Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (5 RUNNING, 5 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.43946 | 0.5012 | 4 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.46476 | 0.4645 | 7 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.60908 | 0.3979 | 1 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.32616 | 0.5202 | 5 |
| DEFAULT_da74d_00009 | RUNNING | 172.17.0.2:1454 | 2 | 8 | 32 | 0.00276505 | 1.92277 | 0.2768 | 1 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_da74d_00001:
accuracy: 0.4764
date: 2021-02-26_20-29-11
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 8
loss: 1.446412706375122
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 184.39706802368164
time_this_iter_s: 20.99806499481201
time_total_s: 184.39706802368164
timestamp: 1614371351
timesteps_since_restore: 0
training_iteration: 8
trial_id: da74d_00001
Result for DEFAULT_da74d_00000:
accuracy: 0.4975
date: 2021-02-26_20-29-13
done: false
experiment_id: ea85a355d5fb414dbc129bd43bcddaca
hostname: 0b92e671318e
iterations_since_restore: 5
loss: 1.439678263449669
node_ip: 172.17.0.2
pid: 1419
should_checkpoint: true
time_since_restore: 186.38049840927124
time_this_iter_s: 33.76210117340088
time_total_s: 186.38049840927124
timestamp: 1614371353
timesteps_since_restore: 0
training_iteration: 5
trial_id: da74d_00000
(pid=1454) [2, 14000] loss: 0.258
(pid=1421) [2, 14000] loss: 0.216
(pid=1415) [6, 2000] loss: 1.212
(pid=1454) [2, 16000] loss: 0.220
(pid=1421) [2, 16000] loss: 0.192
(pid=1419) [6, 2000] loss: 1.329
(pid=1446) [9, 2000] loss: 1.391
Result for DEFAULT_da74d_00001:
accuracy: 0.5038
date: 2021-02-26_20-29-32
done: false
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 9
loss: 1.3644779193878174
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 205.46153283119202
time_this_iter_s: 21.064464807510376
time_total_s: 205.46153283119202
timestamp: 1614371372
timesteps_since_restore: 0
training_iteration: 9
trial_id: da74d_00001
== Status ==
Memory usage on this node: 7.0/240.1 GiB
Using AsyncHyperBand: num_stopped=5
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -1.9975281885266305
Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (5 RUNNING, 5 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.43968 | 0.4975 | 5 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.36448 | 0.5038 | 9 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.60908 | 0.3979 | 1 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.32616 | 0.5202 | 5 |
| DEFAULT_da74d_00009 | RUNNING | 172.17.0.2:1454 | 2 | 8 | 32 | 0.00276505 | 1.92277 | 0.2768 | 1 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1454) [2, 18000] loss: 0.201
(pid=1415) [6, 4000] loss: 0.603
(pid=1421) [2, 18000] loss: 0.169
(pid=1419) [6, 4000] loss: 0.680
(pid=1454) [2, 20000] loss: 0.182
(pid=1421) [2, 20000] loss: 0.154
Result for DEFAULT_da74d_00008:
accuracy: 0.5508
date: 2021-02-26_20-29-44
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 6
loss: 1.2812463031530381
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 216.8842477798462
time_this_iter_s: 33.22977018356323
time_total_s: 216.8842477798462
timestamp: 1614371384
timesteps_since_restore: 0
training_iteration: 6
trial_id: da74d_00008
== Status ==
Memory usage on this node: 7.0/240.1 GiB
Using AsyncHyperBand: num_stopped=5
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -1.9975281885266305
Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (5 RUNNING, 5 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.43968 | 0.4975 | 5 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.36448 | 0.5038 | 9 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.60908 | 0.3979 | 1 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.28125 | 0.5508 | 6 |
| DEFAULT_da74d_00009 | RUNNING | 172.17.0.2:1454 | 2 | 8 | 32 | 0.00276505 | 1.92277 | 0.2768 | 1 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1446) [10, 2000] loss: 1.346
Result for DEFAULT_da74d_00000:
accuracy: 0.5164
date: 2021-02-26_20-29-47
done: false
experiment_id: ea85a355d5fb414dbc129bd43bcddaca
hostname: 0b92e671318e
iterations_since_restore: 6
loss: 1.4101861970424652
node_ip: 172.17.0.2
pid: 1419
should_checkpoint: true
time_since_restore: 220.41678524017334
time_this_iter_s: 34.0362868309021
time_total_s: 220.41678524017334
timestamp: 1614371387
timesteps_since_restore: 0
training_iteration: 6
trial_id: da74d_00000
Result for DEFAULT_da74d_00001:
accuracy: 0.5062
date: 2021-02-26_20-29-54
done: true
experiment_id: a7b8b29cfdab4bb6a59bdaf6229ce657
hostname: 0b92e671318e
iterations_since_restore: 10
loss: 1.3854370698928833
node_ip: 172.17.0.2
pid: 1446
should_checkpoint: true
time_since_restore: 226.80386519432068
time_this_iter_s: 21.342332363128662
time_total_s: 226.80386519432068
timestamp: 1614371394
timesteps_since_restore: 0
training_iteration: 10
trial_id: da74d_00001
== Status ==
Memory usage on this node: 7.0/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.6152293427944184 | Iter 1.000: -1.9975281885266305
Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (5 RUNNING, 5 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.41019 | 0.5164 | 6 |
| DEFAULT_da74d_00001 | RUNNING | 172.17.0.2:1446 | 16 | 64 | 8 | 0.000360487 | 1.38544 | 0.5062 | 10 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.60908 | 0.3979 | 1 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.28125 | 0.5508 | 6 |
| DEFAULT_da74d_00009 | RUNNING | 172.17.0.2:1454 | 2 | 8 | 32 | 0.00276505 | 1.92277 | 0.2768 | 1 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_da74d_00009:
accuracy: 0.334
date: 2021-02-26_20-29-54
done: true
experiment_id: 7a996ae85ee84cd5befcf6252e7ef6d4
hostname: 0b92e671318e
iterations_since_restore: 2
loss: 1.8365726371228694
node_ip: 172.17.0.2
pid: 1454
should_checkpoint: true
time_since_restore: 227.46239614486694
time_this_iter_s: 105.0891330242157
time_total_s: 227.46239614486694
timestamp: 1614371394
timesteps_since_restore: 0
training_iteration: 2
trial_id: da74d_00009
(pid=1415) [7, 2000] loss: 1.155
Result for DEFAULT_da74d_00007:
accuracy: 0.4327
date: 2021-02-26_20-29-56
done: false
experiment_id: f38b32d4499a4f59836f13bb7b7095f7
hostname: 0b92e671318e
iterations_since_restore: 2
loss: 1.5651366052120923
node_ip: 172.17.0.2
pid: 1421
should_checkpoint: true
time_since_restore: 229.14502954483032
time_this_iter_s: 105.93005967140198
time_total_s: 229.14502954483032
timestamp: 1614371396
timesteps_since_restore: 0
training_iteration: 2
trial_id: da74d_00007
(pid=1419) [7, 2000] loss: 1.312
(pid=1421) [3, 2000] loss: 1.504
(pid=1415) [7, 4000] loss: 0.582
(pid=1419) [7, 4000] loss: 0.673
(pid=1421) [3, 4000] loss: 0.756
Result for DEFAULT_da74d_00008:
accuracy: 0.573
date: 2021-02-26_20-30-16
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 7
loss: 1.1890086390733718
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 249.36491537094116
time_this_iter_s: 32.48066759109497
time_total_s: 249.36491537094116
timestamp: 1614371416
timesteps_since_restore: 0
training_iteration: 7
trial_id: da74d_00008
== Status ==
Memory usage on this node: 5.6/240.1 GiB
Using AsyncHyperBand: num_stopped=7
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.653551544187963 | Iter 1.000: -1.9975281885266305
Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (3 RUNNING, 7 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.41019 | 0.5164 | 6 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.56514 | 0.4327 | 2 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.18901 | 0.573 | 7 |
| DEFAULT_da74d_00001 | TERMINATED | | 16 | 64 | 8 | 0.000360487 | 1.38544 | 0.5062 | 10 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
| DEFAULT_da74d_00009 | TERMINATED | | 2 | 8 | 32 | 0.00276505 | 1.83657 | 0.334 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_da74d_00000:
accuracy: 0.4835
date: 2021-02-26_20-30-20
done: false
experiment_id: ea85a355d5fb414dbc129bd43bcddaca
hostname: 0b92e671318e
iterations_since_restore: 7
loss: 1.4933614113330842
node_ip: 172.17.0.2
pid: 1419
should_checkpoint: true
time_since_restore: 253.44921016693115
time_this_iter_s: 33.03242492675781
time_total_s: 253.44921016693115
timestamp: 1614371420
timesteps_since_restore: 0
training_iteration: 7
trial_id: da74d_00000
(pid=1421) [3, 6000] loss: 0.500
(pid=1415) [8, 2000] loss: 1.120
(pid=1421) [3, 8000] loss: 0.377
(pid=1419) [8, 2000] loss: 1.301
(pid=1415) [8, 4000] loss: 0.561
(pid=1421) [3, 10000] loss: 0.300
(pid=1419) [8, 4000] loss: 0.683
Result for DEFAULT_da74d_00008:
accuracy: 0.5901
date: 2021-02-26_20-30-48
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 8
loss: 1.1499544531583785
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 281.1171724796295
time_this_iter_s: 31.752257108688354
time_total_s: 281.1171724796295
timestamp: 1614371448
timesteps_since_restore: 0
training_iteration: 8
trial_id: da74d_00008
== Status ==
Memory usage on this node: 5.8/240.1 GiB
Using AsyncHyperBand: num_stopped=7
Bracket: Iter 8.000: -1.2981835797667503 | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.653551544187963 | Iter 1.000: -1.9975281885266305
Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (3 RUNNING, 7 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | RUNNING | 172.17.0.2:1419 | 8 | 64 | 256 | 0.00646798 | 1.49336 | 0.4835 | 7 |
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.56514 | 0.4327 | 2 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.14995 | 0.5901 | 8 |
| DEFAULT_da74d_00001 | TERMINATED | | 16 | 64 | 8 | 0.000360487 | 1.38544 | 0.5062 | 10 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
| DEFAULT_da74d_00009 | TERMINATED | | 2 | 8 | 32 | 0.00276505 | 1.83657 | 0.334 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1421) [3, 12000] loss: 0.248
Result for DEFAULT_da74d_00000:
accuracy: 0.492
date: 2021-02-26_20-30-53
done: true
experiment_id: ea85a355d5fb414dbc129bd43bcddaca
hostname: 0b92e671318e
iterations_since_restore: 8
loss: 1.4674230321645736
node_ip: 172.17.0.2
pid: 1419
should_checkpoint: true
time_since_restore: 286.12872838974
time_this_iter_s: 32.67951822280884
time_total_s: 286.12872838974
timestamp: 1614371453
timesteps_since_restore: 0
training_iteration: 8
trial_id: da74d_00000
(pid=1421) [3, 14000] loss: 0.212
(pid=1415) [9, 2000] loss: 1.077
(pid=1421) [3, 16000] loss: 0.189
(pid=1415) [9, 4000] loss: 0.554
(pid=1421) [3, 18000] loss: 0.168
Result for DEFAULT_da74d_00008:
accuracy: 0.5745
date: 2021-02-26_20-31-19
done: false
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 9
loss: 1.2056537937998772
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 312.5512430667877
time_this_iter_s: 31.434070587158203
time_total_s: 312.5512430667877
timestamp: 1614371479
timesteps_since_restore: 0
training_iteration: 9
trial_id: da74d_00008
== Status ==
Memory usage on this node: 5.0/240.1 GiB
Using AsyncHyperBand: num_stopped=8
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.653551544187963 | Iter 1.000: -1.9975281885266305
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.56514 | 0.4327 | 2 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.20565 | 0.5745 | 9 |
| DEFAULT_da74d_00000 | TERMINATED | | 8 | 64 | 256 | 0.00646798 | 1.46742 | 0.492 | 8 |
| DEFAULT_da74d_00001 | TERMINATED | | 16 | 64 | 8 | 0.000360487 | 1.38544 | 0.5062 | 10 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
| DEFAULT_da74d_00009 | TERMINATED | | 2 | 8 | 32 | 0.00276505 | 1.83657 | 0.334 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1421) [3, 20000] loss: 0.150
(pid=1415) [10, 2000] loss: 1.066
Result for DEFAULT_da74d_00007:
accuracy: 0.4558
date: 2021-02-26_20-31-35
done: false
experiment_id: f38b32d4499a4f59836f13bb7b7095f7
hostname: 0b92e671318e
iterations_since_restore: 3
loss: 1.4805301667168735
node_ip: 172.17.0.2
pid: 1421
should_checkpoint: true
time_since_restore: 327.98270630836487
time_this_iter_s: 98.83767676353455
time_total_s: 327.98270630836487
timestamp: 1614371495
timesteps_since_restore: 0
training_iteration: 3
trial_id: da74d_00007
== Status ==
Memory usage on this node: 5.1/240.1 GiB
Using AsyncHyperBand: num_stopped=8
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.653551544187963 | Iter 1.000: -1.9975281885266305
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.48053 | 0.4558 | 3 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.20565 | 0.5745 | 9 |
| DEFAULT_da74d_00000 | TERMINATED | | 8 | 64 | 256 | 0.00646798 | 1.46742 | 0.492 | 8 |
| DEFAULT_da74d_00001 | TERMINATED | | 16 | 64 | 8 | 0.000360487 | 1.38544 | 0.5062 | 10 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
| DEFAULT_da74d_00009 | TERMINATED | | 2 | 8 | 32 | 0.00276505 | 1.83657 | 0.334 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1415) [10, 4000] loss: 0.539
(pid=1421) [4, 2000] loss: 1.462
Result for DEFAULT_da74d_00008:
accuracy: 0.5935
date: 2021-02-26_20-31-51
done: true
experiment_id: 647989d51d014d30b9e72cb9c54a1dc2
hostname: 0b92e671318e
iterations_since_restore: 10
loss: 1.1467542746901511
node_ip: 172.17.0.2
pid: 1415
should_checkpoint: true
time_since_restore: 343.9505739212036
time_this_iter_s: 31.399330854415894
time_total_s: 343.9505739212036
timestamp: 1614371511
timesteps_since_restore: 0
training_iteration: 10
trial_id: da74d_00008
== Status ==
Memory usage on this node: 5.1/240.1 GiB
Using AsyncHyperBand: num_stopped=9
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5549157369375228 | Iter 2.000: -1.653551544187963 | Iter 1.000: -1.9975281885266305
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.48053 | 0.4558 | 3 |
| DEFAULT_da74d_00008 | RUNNING | 172.17.0.2:1415 | 8 | 32 | 8 | 0.000820594 | 1.14675 | 0.5935 | 10 |
| DEFAULT_da74d_00000 | TERMINATED | | 8 | 64 | 256 | 0.00646798 | 1.46742 | 0.492 | 8 |
| DEFAULT_da74d_00001 | TERMINATED | | 16 | 64 | 8 | 0.000360487 | 1.38544 | 0.5062 | 10 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
| DEFAULT_da74d_00009 | TERMINATED | | 2 | 8 | 32 | 0.00276505 | 1.83657 | 0.334 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1421) [4, 4000] loss: 0.728
(pid=1421) [4, 6000] loss: 0.493
(pid=1421) [4, 8000] loss: 0.375
(pid=1421) [4, 10000] loss: 0.298
(pid=1421) [4, 12000] loss: 0.250
(pid=1421) [4, 14000] loss: 0.206
(pid=1421) [4, 16000] loss: 0.186
(pid=1421) [4, 18000] loss: 0.166
(pid=1421) [4, 20000] loss: 0.148
Result for DEFAULT_da74d_00007:
accuracy: 0.4495
date: 2021-02-26_20-33-11
done: true
experiment_id: f38b32d4499a4f59836f13bb7b7095f7
hostname: 0b92e671318e
iterations_since_restore: 4
loss: 1.5724148665212094
node_ip: 172.17.0.2
pid: 1421
should_checkpoint: true
time_since_restore: 424.4040172100067
time_this_iter_s: 96.42131090164185
time_total_s: 424.4040172100067
timestamp: 1614371591
timesteps_since_restore: 0
training_iteration: 4
trial_id: da74d_00007
== Status ==
Memory usage on this node: 4.3/240.1 GiB
Using AsyncHyperBand: num_stopped=10
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5724148665212094 | Iter 2.000: -1.653551544187963 | Iter 1.000: -1.9975281885266305
Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00007 | RUNNING | 172.17.0.2:1421 | 2 | 4 | 32 | 0.00104518 | 1.57241 | 0.4495 | 4 |
| DEFAULT_da74d_00000 | TERMINATED | | 8 | 64 | 256 | 0.00646798 | 1.46742 | 0.492 | 8 |
| DEFAULT_da74d_00001 | TERMINATED | | 16 | 64 | 8 | 0.000360487 | 1.38544 | 0.5062 | 10 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
| DEFAULT_da74d_00008 | TERMINATED | | 8 | 32 | 8 | 0.000820594 | 1.14675 | 0.5935 | 10 |
| DEFAULT_da74d_00009 | TERMINATED | | 2 | 8 | 32 | 0.00276505 | 1.83657 | 0.334 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
== Status ==
Memory usage on this node: 4.1/240.1 GiB
Using AsyncHyperBand: num_stopped=10
Bracket: Iter 8.000: -1.446412706375122 | Iter 4.000: -1.5724148665212094 | Iter 2.000: -1.653551544187963 | Iter 1.000: -1.9975281885266305
Resources requested: 0/32 CPUs, 0/2 GPUs, 0.0/157.81 GiB heap, 0.0/49.41 GiB objects (0/1.0 accelerator_type:M60)
Result logdir: /var/lib/jenkins/ray_results/DEFAULT_2021-02-26_20-26-06
Number of trials: 10/10 (10 TERMINATED)
+---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_da74d_00000 | TERMINATED | | 8 | 64 | 256 | 0.00646798 | 1.46742 | 0.492 | 8 |
| DEFAULT_da74d_00001 | TERMINATED | | 16 | 64 | 8 | 0.000360487 | 1.38544 | 0.5062 | 10 |
| DEFAULT_da74d_00002 | TERMINATED | | 4 | 64 | 64 | 0.0142423 | 2.1816 | 0.1267 | 1 |
| DEFAULT_da74d_00003 | TERMINATED | | 8 | 8 | 256 | 0.0150371 | 2.07229 | 0.1835 | 1 |
| DEFAULT_da74d_00004 | TERMINATED | | 2 | 4 | 4 | 0.000303051 | 2.24606 | 0.1734 | 1 |
| DEFAULT_da74d_00005 | TERMINATED | | 8 | 8 | 128 | 0.0108518 | 1.93119 | 0.2839 | 4 |
| DEFAULT_da74d_00006 | TERMINATED | | 4 | 4 | 4 | 0.079216 | 2.32914 | 0.1005 | 1 |
| DEFAULT_da74d_00007 | TERMINATED | | 2 | 4 | 32 | 0.00104518 | 1.57241 | 0.4495 | 4 |
| DEFAULT_da74d_00008 | TERMINATED | | 8 | 32 | 8 | 0.000820594 | 1.14675 | 0.5935 | 10 |
| DEFAULT_da74d_00009 | TERMINATED | | 2 | 8 | 32 | 0.00276505 | 1.83657 | 0.334 | 2 |
+---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+
Best trial config: {'l1': 32, 'l2': 8, 'lr': 0.000820593994019208, 'batch_size': 8}
Best trial final validation loss: 1.1467542746901511
Best trial final validation accuracy: 0.5935
Files already downloaded and verified
Files already downloaded and verified
Best trial test set accuracy: 0.5993
If you run the code, the output could look like the example above.
Most trials were stopped early to avoid wasting resources. The best-performing trial achieved a validation accuracy of about 59%, which was confirmed on the test set (about 60%).
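To make these two observations concrete, here is a minimal sketch in plain Python that reads them off the final status table. The rows are copied from the output above. The names `FINAL_RESULTS`, `MAX_ITERATIONS`, `best_trial`, and `stopped_early` are hypothetical helpers for illustration and are not part of Ray Tune. The budget of 10 iterations is an assumption inferred from the longest-running trials in the table.

```python
# Rows copied from the final status table above:
# (trial name, batch_size, l1, l2, lr, loss, accuracy, training_iteration)
FINAL_RESULTS = [
    ("DEFAULT_da74d_00000", 8, 64, 256, 0.00646798, 1.46742, 0.492, 8),
    ("DEFAULT_da74d_00001", 16, 64, 8, 0.000360487, 1.38544, 0.5062, 10),
    ("DEFAULT_da74d_00002", 4, 64, 64, 0.0142423, 2.1816, 0.1267, 1),
    ("DEFAULT_da74d_00003", 8, 8, 256, 0.0150371, 2.07229, 0.1835, 1),
    ("DEFAULT_da74d_00004", 2, 4, 4, 0.000303051, 2.24606, 0.1734, 1),
    ("DEFAULT_da74d_00005", 8, 8, 128, 0.0108518, 1.93119, 0.2839, 4),
    ("DEFAULT_da74d_00006", 4, 4, 4, 0.079216, 2.32914, 0.1005, 1),
    ("DEFAULT_da74d_00007", 2, 4, 32, 0.00104518, 1.57241, 0.4495, 4),
    ("DEFAULT_da74d_00008", 8, 32, 8, 0.000820594, 1.14675, 0.5935, 10),
    ("DEFAULT_da74d_00009", 2, 8, 32, 0.00276505, 1.83657, 0.334, 2),
]

# Assumption: the per-trial budget is 10 iterations, inferred from the
# longest-running trials in the table.
MAX_ITERATIONS = 10

LOSS, ACCURACY, ITERATIONS = 5, 6, 7  # column positions in each row


def best_trial(results):
    """Return the row with the lowest final validation loss."""
    return min(results, key=lambda row: row[LOSS])


def stopped_early(results, budget=MAX_ITERATIONS):
    """Return the rows that ended before using the full iteration budget."""
    return [row for row in results if row[ITERATIONS] < budget]


best = best_trial(FINAL_RESULTS)
early = stopped_early(FINAL_RESULTS)

print(f"Best trial: {best[0]} (loss={best[LOSS]}, accuracy={best[ACCURACY]})")
print(f"Trials stopped early: {len(early)} of {len(FINAL_RESULTS)}")
```

Selecting by lowest loss picks the same trial that the run reports as the best trial config, and counting rows with fewer than 10 iterations shows how many trials the scheduler cut short.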
So that’s it! You can now tune the parameters of your PyTorch models.
Total running time of the script: (7 minutes 29.065 seconds)