Training the networks
The easiest way of training one or multiple neural networks is to use the scripts that are provided in /src.
Training one network
The code Training_single.py allows training a single network and experimenting with it. The following steps are performed:
After importing the libraries (see notebook), we load the data:
device = torch.device('cuda') # training on the GPU
# custom data loader, automatically sent to the GPU
ds = imelt.data_loader(device=device)
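Note that if no GPU is available, you can fall back to the CPU; it is also worth having a quick look at the loaded data. The attributes shown below are the ones used later in this walkthrough:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') # fall back to the CPU if needed
ds = imelt.data_loader(device=device)

# quick sanity check of the loaded data
print(ds.x_visco_train.shape) # (number of viscosity training samples, number of input features)
print(ds.nb_channels_raman) # number of Raman spectral channels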
We select an architecture. For this example, we have selected the reference architecture from Le Losq et al. 2021:
nb_layers = 4
nb_neurons = 200
p_drop = 0.10 # we increased dropout here as this now works well with GELU units
If we want to save the model and figures in the directories ./model/candidates/ and ./figures/single/, we can use this code to check if the folders exist, and create them if not:
utils.create_dir('./model/candidates/')
utils.create_dir('./figures/single/')
Now we need a name for our model. We can generate it from the hyperparameters, which gives us automatic names if we try different architectures:
name = "./model/candidates/l"+str(nb_layers)+"_n"+str(nb_neurons)+"_p"+str(p_drop)+"_test"+".pth"
and we declare the model using imelt.model():
neuralmodel = imelt.model(ds.x_visco_train.shape[1],
                          hidden_size=nb_neurons,
                          num_layers=nb_layers,
                          nb_channels_raman=ds.nb_channels_raman,
                          activation_function=torch.nn.GELU(),
                          p_drop=p_drop)
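As a quick sanity check of the declared architecture, you can for instance count its trainable parameters (plain PyTorch, nothing imelt-specific):

# count the trainable parameters of the declared network
n_params = sum(p.numel() for p in neuralmodel.parameters() if p.requires_grad)
print(f"{n_params} trainable parameters")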
We select a criterion for training (the MSE criterion from PyTorch) and send it to the GPU:
criterion = torch.nn.MSELoss(reduction='mean')
criterion.to(device) # sending criterion on device
Before training, we need to initialize the output bias layer using the dedicated imelt function, and we send the network parameters to the GPU:
neuralmodel.output_bias_init()
neuralmodel = neuralmodel.float() # make sure we always use float32 numbers
neuralmodel.to(device)
Training will be done with the Adam optimizer, using a tuned learning rate of 0.0003:
optimizer = torch.optim.Adam(neuralmodel.parameters(), lr = 0.0003)
We have built a function for training in the imelt library that performs early stopping. You have to select:
the patience (how many epochs to wait once the validation error stops improving)
the min_delta variable, which sets the sensitivity used to decide whether the RMSE on the validation dataset really improved or not
The imelt.training() function outputs the trained model, as well as the records of the training and validation losses over the epochs.
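To make the roles of these two parameters concrete, here is a minimal sketch of the kind of stopping rule they control (an illustration only, not the actual imelt implementation):

# illustration of the early-stopping rule: stop once the validation loss has not
# improved by more than min_delta for train_patience consecutive epochs
def should_stop(valid_losses, train_patience, min_delta):
    best, waited = float("inf"), 0
    for loss in valid_losses:
        if best - loss > min_delta: # real improvement
            best, waited = loss, 0
        else:
            waited += 1
        if waited > train_patience:
            return True
    return False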
Training can thus be done with this code:
neuralmodel, record_train_loss, record_valid_loss = imelt.training(
    neuralmodel, ds, criterion, optimizer,
    save_switch=True, save_name=name,
    train_patience=250, min_delta=0.05,
    verbose=True)
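You can then use the loss records to check convergence. For example, a minimal plot of the training and validation curves (assuming matplotlib is available; the output filename is arbitrary) could look like this:

import matplotlib.pyplot as plt

# plot the training and validation losses recorded by imelt.training()
plt.figure()
plt.plot(record_train_loss, label='training loss')
plt.plot(record_valid_loss, label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.savefig('./figures/single/loss_curves.png')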
Hyperparameter tuning
Ray Tune + Optuna
In version 2.0, we rely on Ray Tune and Optuna to search for the best models.
The script ray_opt.py allows running a Ray Tune experiment.
The script ray_select.py allows selecting the best models based on a posterior analysis of the Ray Tune experiment (all metrics are recorded in an Excel spreadsheet, which must be provided for model selection).
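ray_opt.py is the reference for the actual experiment set-up. As a rough idea of what such a search looks like, here is a hypothetical, simplified sketch: the search space, number of samples and trainable function are illustrative, saving is disabled for simplicity, and the exact reporting call depends on your Ray version:

import torch
import imelt # imported as in the notebook
from ray import tune
from ray.tune.search.optuna import OptunaSearch

def train_imelt(config):
    # hypothetical trainable: build and train one network for a given hyperparameter set
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    ds = imelt.data_loader(device=device)
    neuralmodel = imelt.model(ds.x_visco_train.shape[1],
                              hidden_size=config["nb_neurons"],
                              num_layers=config["nb_layers"],
                              nb_channels_raman=ds.nb_channels_raman,
                              activation_function=torch.nn.GELU(),
                              p_drop=config["p_drop"])
    neuralmodel.output_bias_init()
    neuralmodel = neuralmodel.float().to(device)
    criterion = torch.nn.MSELoss(reduction='mean').to(device)
    optimizer = torch.optim.Adam(neuralmodel.parameters(), lr=0.0003)
    _, _, record_valid_loss = imelt.training(neuralmodel, ds, criterion, optimizer,
                                             save_switch=False, save_name="",
                                             train_patience=250, min_delta=0.05,
                                             verbose=False)
    tune.report({"valid_loss": record_valid_loss[-1]}) # exact reporting API depends on the Ray version

search_space = {"nb_layers": tune.randint(2, 8),
                "nb_neurons": tune.randint(100, 500),
                "p_drop": tune.uniform(0.0, 0.3)}

tuner = tune.Tuner(train_imelt,
                   tune_config=tune.TuneConfig(search_alg=OptunaSearch(),
                                               metric="valid_loss", mode="min",
                                               num_samples=100),
                   param_space=search_space)
results = tuner.fit()
print(results.get_best_result().config)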
Bayesian optimization
CURRENTLY NOT WORKING
The bayesian_optim.py script allows performing Bayesian optimization for hyperparameter selection using the Ax platform.
Training candidates
Note: this was used in v1.2 for model selection, but we now rely on the Ray Tune + Optuna run to select models.
In any case, this still works. The code Training_Candidates.py trains 100 networks with a given architecture and selects the 10 best ones, which are saved in ./model/best/ and used for future predictions.
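To give an idea of the underlying logic, here is a simplified, hypothetical sketch of the candidate-selection idea (Training_Candidates.py is the reference implementation; this snippet reuses ds, criterion, device and the architecture defined above, and the candidate file names are arbitrary):

# simplified sketch: train several candidates and rank them by final validation loss
results = []
for i in range(100):
    candidate_name = f"./model/candidates/candidate_{i}.pth"
    neuralmodel = imelt.model(ds.x_visco_train.shape[1],
                              hidden_size=nb_neurons, num_layers=nb_layers,
                              nb_channels_raman=ds.nb_channels_raman,
                              activation_function=torch.nn.GELU(), p_drop=p_drop)
    neuralmodel.output_bias_init()
    neuralmodel = neuralmodel.float().to(device)
    optimizer = torch.optim.Adam(neuralmodel.parameters(), lr=0.0003)
    _, _, record_valid_loss = imelt.training(neuralmodel, ds, criterion, optimizer,
                                             save_switch=True, save_name=candidate_name,
                                             train_patience=250, min_delta=0.05,
                                             verbose=False)
    results.append((record_valid_loss[-1], candidate_name))

# the 10 models with the lowest final validation loss would then be kept in ./model/best/
best_10 = sorted(results)[:10]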