Let's say you want to test different hyperparameters on a given model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Conv2D, MaxPooling2D, Dropout, Flatten, Dense
)

def get_model(params):
  model = Sequential()
  model.add(Conv2D(params['conv_0'], kernel_size=params['kernel_0'],
                  activation='relu',
                  input_shape=params['input_shape']))
  model.add(Conv2D(params['conv_1'], params['kernel_1'], activation='relu'))
  model.add(MaxPooling2D(pool_size=params['pool_size']))
  model.add(Dropout(params['dropout_0']))
  model.add(Flatten())
  model.add(Dense(params['dense'], activation='relu'))
  model.add(Dropout(params['dropout_1']))
  model.add(Dense(params['num_classes'], activation='softmax'))
  return model

We then feed the get_model function a dictionary that looks like this:

'model': {
    'input_shape': (28, 28, 1),
    'conv_0': 32,
    'conv_1': 64,
    'kernel_0': (3,3),
    'kernel_1': (3,3),
    'pool_size': (2,2),
    'dropout_0': 0.25,
    'dense': 128,
    'dropout_1': 0.5,
    'num_classes': 10
}
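
Once that block lives under a params dictionary (as it will in the full params below), building the model is a single call:

model = get_model(params['model'])
model.summary()  # prints the architecture built from the dictionary above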

The experiment manager is here to automatically keep model checkpoints and performance files tidy during your hyperparameter search. It does so by maintaining a consistent folder hierarchy based on a hash of your hyperparameters.
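
To get an intuition for what "hash of your hyperparameters" means, here is a minimal sketch of how a folder name can be derived from the parameter sections. This is an illustration under assumed details, not the library's actual code:

import json
import zlib

def section_hash(section):
    # Serialize the section deterministically (sorted keys) and hash the bytes.
    # default=str covers values json can't encode natively, e.g. numpy types.
    serialized = json.dumps(section, sort_keys=True, default=str)
    return zlib.crc32(serialized.encode('utf-8'))

# An experiment folder name could then combine one hash per monitored section,
# e.g. f"exp--{section_hash(params['training'])}-{section_hash(params['model'])}",
# which matches the shape of the exp--XXXXXXXX-XXXXXXXX directories shown below.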

Let's see a real example by defining our hyperparameters. Here we choose (but you are free to organize them differently) to separate them into two sections:

  • training, which contains parameters such as the batch size, the selected optimizer, and so on
  • model, which contains the parameters that actually build your model

Note that you can create as many sections as you want and nest dictionaries as deeply as necessary.

params = {
    'debug': False,
    'training': {
      'batch_size': 128,
      'epochs': 3  
    },
    'model': {
      'input_shape': (28, 28, 1),
      'conv_0': 32,
      'conv_1': 64,
      'kernel_0': (3,3),
      'kernel_1': (3,3),
      'pool_size': (2,2),
      'dropout_0': 0.25,
      'dense': 128,
      'dropout_1': 0.5,
      'num_classes': 10
    },
    'comment': 'simple model from keras documentation',
    'author': 'data-soup'
}

#collapse
!sudo apt-get install tree # useful later to display the directories

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# input image dimensions
img_rows, img_cols = params['model']['input_shape'][:2]
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 40.7 kB of archives.
After this operation, 105 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]
Fetched 40.7 kB in 0s (112 kB/s)
Selecting previously unselected package tree.
(Reading database ... 144579 files and directories currently installed.)
Preparing to unpack .../tree_1.7.0-5_amd64.deb ...
Unpacking tree (1.7.0-5) ...
Setting up tree (1.7.0-5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step

#collapse
# Here lives the code for the experiment manager
!git clone https://github.com/maxpv/experiment_manager
Cloning into 'experiment_manager'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 24 (delta 7), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (24/24), done.

We can now prepare the ExperimentManager for a first test run.

  • exp_base_dir is the name of your experiment; it can be the version of get_model or anything else
  • monitored_param_keys lists the keys the manager uses to identify and group your experiments

from experiment_manager.experiment_manager import ExperimentManager
expm = ExperimentManager(exp_base_dir='experiments',
                         monitored_param_keys=['training', 'model'])
callbacks = expm.prepare(params)

model = get_model(params['model'])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train[:5000], y_train[:5000],
          **params['training'],
          verbose=1,
          callbacks=callbacks,
          validation_data=(x_test, y_test))
Now everything is happening in /content/experiments/exp--72399506-44712951/run--20-09-04--16-11
Epoch 1/3
40/40 [==============================] - 17s 416ms/step - loss: 0.8887 - accuracy: 0.7264 - val_loss: 0.3282 - val_accuracy: 0.9063
Epoch 2/3
40/40 [==============================] - 17s 413ms/step - loss: 0.3505 - accuracy: 0.8956 - val_loss: 0.1944 - val_accuracy: 0.9412
Epoch 3/3
40/40 [==============================] - 17s 414ms/step - loss: 0.2214 - accuracy: 0.9364 - val_loss: 0.1378 - val_accuracy: 0.9564
<tensorflow.python.keras.callbacks.History at 0x7f9925c67780>

The training above generated:

  1. a directory tree for each experiment, keyed by a unique identifier and the current date
  2. tf.keras callbacks that ensure training logs and model checkpoints end up in that same directory (a sketch of such callbacks follows the listing below)

!tree experiments
experiments
└── exp--72399506-44712951
    └── run--20-09-04--16-11
        ├── hyperparameters.json
        ├── models
        │   ├── model.01-0.3282.hdf5
        │   ├── model.02-0.1944.hdf5
        │   └── model.03-0.1378.hdf5
        ├── performances.json
        └── training-logs.csv

3 directories, 6 files
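
The callbacks returned by expm.prepare are ordinary tf.keras callbacks. As a rough sketch, a ModelCheckpoint plus a CSVLogger would account for the model.*.hdf5 files and training-logs.csv above (make_callbacks and run_dir are hypothetical names used for illustration, not the library's API):

import os
from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger

def make_callbacks(run_dir):
    # Write checkpoints and the per-epoch log into the run directory,
    # mirroring the files listed by `tree` above.
    os.makedirs(os.path.join(run_dir, 'models'), exist_ok=True)
    checkpoint = ModelCheckpoint(
        filepath=os.path.join(run_dir, 'models',
                              'model.{epoch:02d}-{val_loss:.4f}.hdf5'))
    logger = CSVLogger(os.path.join(run_dir, 'training-logs.csv'))
    return [checkpoint, logger]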

Now let's launch another run using the same parameters.

#collapse
callbacks = expm.prepare(params)

model = get_model(params['model'])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train[:5000], y_train[:5000],
          **params['training'],
          verbose=1,
          callbacks=callbacks,
          validation_data=(x_test, y_test))
Now everything is happening in /content/experiments/exp--72399506-44712951/run--20-09-04--16-12
Epoch 1/3
40/40 [==============================] - 17s 417ms/step - loss: 0.8838 - accuracy: 0.7252 - val_loss: 0.3311 - val_accuracy: 0.8998
Epoch 2/3
40/40 [==============================] - 17s 413ms/step - loss: 0.3274 - accuracy: 0.9008 - val_loss: 0.1808 - val_accuracy: 0.9466
Epoch 3/3
40/40 [==============================] - 17s 414ms/step - loss: 0.1997 - accuracy: 0.9396 - val_loss: 0.1402 - val_accuracy: 0.9570
<tensorflow.python.keras.callbacks.History at 0x7f9922ce9fd0>

!tree experiments -d
experiments
└── exp--72399506-44712951
    ├── run--20-09-04--16-11
    │   └── models
    └── run--20-09-04--16-12
        └── models

5 directories

Let's see what happens if we change the model parameters:

#collapse
params['model']['conv_1'] = 32
callbacks = expm.prepare(params)

model = get_model(params['model'])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train[:5000], y_train[:5000],
          **params['training'],
          verbose=1,
          callbacks=callbacks,
          validation_data=(x_test, y_test))
Now everything is happening in /content/experiments/exp--72399506-20411437/run--20-09-04--16-13
Epoch 1/3
40/40 [==============================] - 11s 272ms/step - loss: 1.0456 - accuracy: 0.6842 - val_loss: 0.3508 - val_accuracy: 0.9021
Epoch 2/3
40/40 [==============================] - 11s 267ms/step - loss: 0.3753 - accuracy: 0.8862 - val_loss: 0.2358 - val_accuracy: 0.9300
Epoch 3/3
40/40 [==============================] - 11s 268ms/step - loss: 0.2736 - accuracy: 0.9168 - val_loss: 0.1867 - val_accuracy: 0.9423
<tensorflow.python.keras.callbacks.History at 0x7f9922b7d2e8>

!tree experiments -d
experiments
├── exp--72399506-20411437
│   └── run--20-09-04--16-13
│       └── models
└── exp--72399506-44712951
    ├── run--20-09-04--16-11
    │   └── models
    └── run--20-09-04--16-12
        └── models

8 directories
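
Because every run writes its hyperparameters.json and performances.json to a predictable location, comparing runs afterwards is just a walk over this directory tree. A small sketch (the file names come from the listings above; the exact keys inside the JSON files depend on the library):

import glob
import json
import os

# Collect (run directory, hyperparameters, performances) for every recorded run.
runs = []
for run_dir in sorted(glob.glob('experiments/exp--*/run--*')):
    with open(os.path.join(run_dir, 'hyperparameters.json')) as f:
        hyperparameters = json.load(f)
    with open(os.path.join(run_dir, 'performances.json')) as f:
        performances = json.load(f)
    runs.append((run_dir, hyperparameters, performances))

# Assuming hyperparameters.json mirrors the params dict passed to expm.prepare:
for run_dir, hp, perf in runs:
    print(run_dir, hp.get('model', {}).get('conv_1'), perf)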