Let's say you want to test different hyperparameters on a given model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Conv2D, MaxPooling2D, Dropout, Flatten, Dense
)

def get_model(params):
  model = Sequential()
  model.add(Conv2D(params['conv_0'], kernel_size=params['kernel_0'],
                  activation='relu',
                  input_shape=params['input_shape']))
  model.add(Conv2D(params['conv_1'], params['kernel_1'], activation='relu'))
  model.add(MaxPooling2D(pool_size=params['pool_size']))
  model.add(Dropout(params['dropout_0']))
  model.add(Flatten())
  model.add(Dense(params['dense'], activation='relu'))
  model.add(Dropout(params['dropout_1']))
  model.add(Dense(params['num_classes'], activation='softmax'))
  return model

We then feed the get_model function a dictionary that looks like this:

'model': {
    'input_shape': (28, 28, 1),
    'conv_0': 32,
    'conv_1': 64,
    'kernel_0': (3,3),
    'kernel_1': (3,3),
    'pool_size': (2,2),
    'dropout_0': 0.25,
    'dense': 128,
    'dropout_1': 0.5,
    'num_classes': 10
}
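
Once that block lives under a params dictionary (as it will in the full params below), building the model is a single call:

model = get_model(params['model'])
model.summary()  # prints the architecture built from the dictionary above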

The experiment manager is here to automatically keep model checkpoints and performance files tidy during your hyperparameter search. It does so by maintaining a consistent folder hierarchy based on a hash of your hyperparameters.
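
To get an intuition for what "hash of your hyperparameters" means, here is a minimal sketch of how a folder name can be derived from the parameter sections. This is an illustration under assumed details, not the library's actual code:

import json
import zlib

def section_hash(section):
    # Serialize the section deterministically (sorted keys) and hash the bytes.
    # default=str covers values json can't encode natively, e.g. numpy types.
    serialized = json.dumps(section, sort_keys=True, default=str)
    return zlib.crc32(serialized.encode('utf-8'))

# An experiment folder name could then combine one hash per monitored section,
# e.g. f"exp--{section_hash(params['training'])}-{section_hash(params['model'])}",
# which matches the shape of the exp--XXXXXXXX-XXXXXXXX directories shown below.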

Let's see a real example by defining our hyperparameters. Here we choose (but you are free to organize them differently) to separate them into two sections:

  • training, which contains parameters such as the batch size, the selected optimizer, and so on
  • model, which contains the parameters that actually build your model

Note that you can create as many sections as you want and nest dictionaries as deeply as necessary.

params = {
    'debug': False,
    'training': {
      'batch_size': 128,
      'epochs': 3  
    },
    'model': {
      'input_shape': (28, 28, 1),
      'conv_0': 32,
      'conv_1': 64,
      'kernel_0': (3,3),
      'kernel_1': (3,3),
      'pool_size': (2,2),
      'dropout_0': 0.25,
      'dense': 128,
      'dropout_1': 0.5,
      'num_classes': 10
    },
    'comment': 'simple model from keras documentation',
    'author': 'data-soup'
}

#collapse
!sudo apt-get install tree # useful later to display the directories

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# input image dimensions
img_rows, img_cols = params['model']['input_shape'][:2]
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 40.7 kB of archives.
After this operation, 105 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]
Fetched 40.7 kB in 0s (112 kB/s)
Selecting previously unselected package tree.
(Reading database ... 144579 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-5_amd64.deb ...
Unpacking tree (1.7.0-5) ...
Setting up tree (1.7.0-5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step

#collapse
# Here lives the code for the experiment manager
!git clone https://github.com/maxpv/experiment_manager
Cloning into 'experiment_manager'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 24 (delta 7), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (24/24), done.

We can now prepare the ExperimentManager for a first test run.

  • exp_base_dir is the name of your experiment; it can be the version of get_model or anything else
  • monitored_param_keys lists the keys the manager uses to identify and group your experiments

from experiment_manager.experiment_manager import ExperimentManager
expm = ExperimentManager(exp_base_dir='experiments',
                         monitored_param_keys=['training', 'model'])
callbacks = expm.prepare(params)

model = get_model(params['model'])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train[:5000], y_train[:5000],
          **params['training'],
          verbose=1,
          callbacks=callbacks,
          validation_data=(x_test, y_test))
Now everything is happening in /content/experiments/exp--72399506-44712951/run--20-09-04--16-11
Epoch 1/3
40/40 [==============================] - 17s 416ms/step - loss: 0.8887 - accuracy: 0.7264 - val_loss: 0.3282 - val_accuracy: 0.9063
Epoch 2/3
40/40 [==============================] - 17s 413ms/step - loss: 0.3505 - accuracy: 0.8956 - val_loss: 0.1944 - val_accuracy: 0.9412
Epoch 3/3
40/40 [==============================] - 17s 414ms/step - loss: 0.2214 - accuracy: 0.9364 - val_loss: 0.1378 - val_accuracy: 0.9564
<tensorflow.python.keras.callbacks.History at 0x7f9925c67780>

The training above generated:

  1. a directory tree for each experiment, keyed by a unique identifier and the current date
  2. tf.keras callbacks that ensure training logs and model checkpoints end up in that same directory (a sketch of such callbacks follows the listing below)

!tree experiments
experiments
└── exp--72399506-44712951
    └── run--20-09-04--16-11
        ├── hyperparameters.json
        ├── models
        │   ├── model.01-0.3282.hdf5
        │   ├── model.02-0.1944.hdf5
        │   └── model.03-0.1378.hdf5
        ├── performances.json
        └── training-logs.csv

3 directories, 6 files
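
The callbacks returned by expm.prepare are ordinary tf.keras callbacks. As a rough sketch, a ModelCheckpoint plus a CSVLogger would account for the model.*.hdf5 files and training-logs.csv above (make_callbacks and run_dir are hypothetical names used for illustration, not the library's API):

import os
from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger

def make_callbacks(run_dir):
    # Write checkpoints and the per-epoch log into the run directory,
    # mirroring the files listed by `tree` above.
    os.makedirs(os.path.join(run_dir, 'models'), exist_ok=True)
    checkpoint = ModelCheckpoint(
        filepath=os.path.join(run_dir, 'models',
                              'model.{epoch:02d}-{val_loss:.4f}.hdf5'))
    logger = CSVLogger(os.path.join(run_dir, 'training-logs.csv'))
    return [checkpoint, logger]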

Now let's launch another run using the same parameters.

#collapse
callbacks = expm.prepare(params)

model = get_model(params['model'])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train[:5000], y_train[:5000],
          **params['training'],
          verbose=1,
          callbacks=callbacks,
          validation_data=(x_test, y_test))
Now everything is happening in /content/experiments/exp--72399506-44712951/run--20-09-04--16-12
Epoch 1/3
40/40 [==============================] - 17s 417ms/step - loss: 0.8838 - accuracy: 0.7252 - val_loss: 0.3311 - val_accuracy: 0.8998
Epoch 2/3
40/40 [==============================] - 17s 413ms/step - loss: 0.3274 - accuracy: 0.9008 - val_loss: 0.1808 - val_accuracy: 0.9466
Epoch 3/3
40/40 [==============================] - 17s 414ms/step - loss: 0.1997 - accuracy: 0.9396 - val_loss: 0.1402 - val_accuracy: 0.9570
<tensorflow.python.keras.callbacks.History at 0x7f9922ce9fd0>

!tree experiments -d
experiments
└── exp--72399506-44712951
    ├── run--20-09-04--16-11
    │   └── models
    └── run--20-09-04--16-12
        └── models

5 directories

Let's see what happens if we change the model parameters:

#collapse
params['model']['conv_1'] = 32
callbacks = expm.prepare(params)

model = get_model(params['model'])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train[:5000], y_train[:5000],
          **params['training'],
          verbose=1,
          callbacks=callbacks,
          validation_data=(x_test, y_test))
Now everything is happening in /content/experiments/exp--72399506-20411437/run--20-09-04--16-13
Epoch 1/3
40/40 [==============================] - 11s 272ms/step - loss: 1.0456 - accuracy: 0.6842 - val_loss: 0.3508 - val_accuracy: 0.9021
Epoch 2/3
40/40 [==============================] - 11s 267ms/step - loss: 0.3753 - accuracy: 0.8862 - val_loss: 0.2358 - val_accuracy: 0.9300
Epoch 3/3
40/40 [==============================] - 11s 268ms/step - loss: 0.2736 - accuracy: 0.9168 - val_loss: 0.1867 - val_accuracy: 0.9423
<tensorflow.python.keras.callbacks.History at 0x7f9922b7d2e8>

!tree experiments -d
experiments
├── exp--72399506-20411437
│   └── run--20-09-04--16-13
│       └── models
└── exp--72399506-44712951
    ├── run--20-09-04--16-11
    │   └── models
    └── run--20-09-04--16-12
        └── models

8 directories
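
Because every run writes its hyperparameters.json and performances.json to a predictable location, comparing runs afterwards is just a walk over this directory tree. A small sketch (the file names come from the listings above; the exact keys inside the JSON files depend on the library):

import glob
import json
import os

# Collect (run directory, hyperparameters, performances) for every recorded run.
runs = []
for run_dir in sorted(glob.glob('experiments/exp--*/run--*')):
    with open(os.path.join(run_dir, 'hyperparameters.json')) as f:
        hyperparameters = json.load(f)
    with open(os.path.join(run_dir, 'performances.json')) as f:
        performances = json.load(f)
    runs.append((run_dir, hyperparameters, performances))

# Assuming hyperparameters.json mirrors the params dict passed to expm.prepare:
for run_dir, hp, perf in runs:
    print(run_dir, hp.get('model', {}).get('conv_1'), perf)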