LSTM training

In this tutorial, we train a recurrent neural network (a stack of Bayesian LSTMs) on conjunction data message (CDM) data and use it for prediction.

We assume that the data have already been loaded (from .kvn files, from a pandas DataFrame, or from the Kelvins challenge dataset; see the relevant tutorials) and stored in events.

from kessler import EventDataset
path_to_cdms_folder='/Users/giacomoacciarini/cdm_data/cdms_kvn/'
events=EventDataset(path_to_cdms_folder)

Loading CDMS (with extension .cdm.kvn.txt) from directory: /Users/giacomoacciarini/cdm_data/cdms_kvn/
Loaded 0 CDMs grouped into 0 events
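
The log above reports 0 CDMs, which means no matching files were found at that path on the machine that produced this output, and every later step needs at least a few events. A minimal sketch of an early check follows; `require_events` is a hypothetical helper (not part of kessler) that only assumes the dataset supports `len()`:

```python
def require_events(events, minimum=1):
    """Raise early if fewer than `minimum` events were loaded.

    Hypothetical helper, not part of kessler: it only assumes that
    `events` supports len(), as lists and EventDataset objects do.
    """
    n = len(events)
    if n < minimum:
        raise ValueError(
            f"Expected at least {minimum} event(s), found {n}: "
            "check the CDM folder path and the file extension."
        )
    return events


# Usage with a plain list standing in for an EventDataset:
require_events(["event_a", "event_b"])  # passes silently
```

Calling such a check right after loading turns a confusing downstream failure into an immediate, readable one.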

We first define the features to use during training, as a list of feature names. Here we take all the features present in the loaded data, provided that their content is numeric:

nn_features=events.common_features(only_numeric=True)
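
To make the idea of "common numeric features" concrete, here is a small standalone sketch. It is not kessler's implementation: `numeric_common_features` is a hypothetical function, the CDMs are represented as plain dictionaries, and the values are made up for illustration.

```python
from numbers import Number


def numeric_common_features(records):
    """Return the sorted keys that appear in every record with numeric values.

    Hypothetical illustration only; kessler's own logic may differ.
    """
    if not records:
        return []
    common = set(records[0])
    for rec in records[1:]:
        common &= set(rec)  # keep only keys shared by all records
    return sorted(
        key for key in common
        if all(
            isinstance(rec[key], Number) and not isinstance(rec[key], bool)
            for rec in records
        )
    )


# Made-up records: OBJECT1_NAME is not numeric, EXTRA is not shared by all.
cdms = [
    {"MISS_DISTANCE": 120.5, "RELATIVE_SPEED": 7100.0, "OBJECT1_NAME": "SAT A"},
    {"MISS_DISTANCE": 98.1, "RELATIVE_SPEED": 7098.3, "OBJECT1_NAME": "SAT A", "EXTRA": 1.0},
]
print(numeric_common_features(cdms))  # ['MISS_DISTANCE', 'RELATIVE_SPEED']
```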

We then split the data into a test set (here 5% of the total number of events) and a training and validation set:

len_test_set=int(0.05*len(events))
events_test=events[-len_test_set:]
events_train_and_val=events[:-len_test_set]
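
One caveat with this slicing pattern: if the test-set size truncates to zero (a small dataset and a small fraction), `events[-0:]` returns every event and `events[:-0]` returns none. The sketch below shows one way to guard against that, using a plain list in place of an `EventDataset`; `split_test_tail` is a hypothetical helper, not a kessler function.

```python
def split_test_tail(items, test_fraction=0.05):
    """Split a sequence into (train_and_val, test), taking the test set from the end.

    Hypothetical helper for illustration; works on anything that supports
    len() and slicing.
    """
    n_test = int(test_fraction * len(items))
    if n_test == 0:
        # items[-0:] would be the whole sequence, so handle this case explicitly
        return items, items[:0]
    return items[:-n_test], items[-n_test:]


train_and_val, test = split_test_tail(list(range(100)))
print(len(train_and_val), len(test))  # 95 5
```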

Next, we create the LSTM predictor and choose its hyperparameters:

from kessler.nn import LSTMPredictor
model = LSTMPredictor(
           lstm_size=256, #number of hidden units per LSTM layer
           lstm_depth=2,   #number of stacked LSTM layers
           dropout=0.2,   #dropout probability
           features=nn_features) #the list of feature names to use in the LSTM

/home/docs/checkouts/readthedocs.org/user_builds/kessler/envs/latest/lib/python3.8/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
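
The training log further down reports 790,528 parameters. That figure can be reproduced by hand under two assumptions that this page does not confirm: that the predictor uses the standard PyTorch `nn.LSTM` parameter layout (four gates, two bias vectors per layer), and that it ends with a linear layer mapping the hidden state back to the feature dimension. Because the dataset loaded here was empty, the feature list has zero entries, which is also why the zero-element tensor warning appears.

```python
def lstm_param_count(input_size, hidden_size, num_layers):
    """Parameter count of a stacked LSTM in the standard PyTorch layout."""
    total = 0
    for layer in range(num_layers):
        # the first layer reads the features, deeper layers read the hidden state
        layer_input = input_size if layer == 0 else hidden_size
        weights = 4 * hidden_size * (layer_input + hidden_size)  # 4 gates
        biases = 2 * 4 * hidden_size  # input-hidden and hidden-hidden biases
        total += weights + biases
    return total


def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


n_features = 0  # empty dataset, hence no common features
total = lstm_param_count(n_features, 256, 2) + linear_param_count(256, n_features)
print(f"{total:,}")  # 790,528
```

With a non-empty dataset the count grows with the number of features, since both the first LSTM layer and the assumed output layer scale with it.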

Then we start the training process:

model.learn(events_train_and_val,
           epochs=1, #number of epochs
           lr=1e-3, #learning rate (can decrease if training diverges)
           batch_size=16, #minibatch size (can decrease if there are memory issues)
           device='cpu', #can be 'cuda' if there is a GPU available
           valid_proportion=0.5, #proportion of data used as validation set
           num_workers=0, #number of multithreaded dataloader workers (usually 4 is good for performances, but if there are issues, try 1)
           event_samples_for_stats=3) #number of events to use to compute NN normalization factors

LSTM predictor with params: 790,528
Computing normalization statistics

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 model.learn(events_train_and_val,
      2            epochs=1, #number of epochs
      3            lr=1e-3, #learning rate (can decrease if training diverges)
      4            batch_size=16, #minibatch size (can decrease if there are memory issues)
      5            device='cpu', #can be 'cuda' if there is a GPU available
      6            valid_proportion=0.5, #proportion of data used as validation set
      7            num_workers=0, #number of multithreaded dataloader workers (usually 4 is good for performances, but if there are issues, try 1)
      8            event_samples_for_stats=3) #number of events to use to compute NN normalization factors

File ~/checkouts/readthedocs.org/user_builds/kessler/envs/latest/lib/python3.8/site-packages/kessler/nn.py:165, in LSTMPredictor.learn(self, event_set, epochs, lr, batch_size, device, valid_proportion, num_workers, event_samples_for_stats, file_name_prefix)
    163 if self._features_stats is None:
    164     print('Computing normalization statistics')
--> 165     self._features_stats = DatasetEventDataset(event_set[:event_samples_for_stats], self._features)._features_stats
    167 self.to(device)
    168 optimizer = optim.Adam(self.parameters(), lr=lr)

File ~/checkouts/readthedocs.org/user_builds/kessler/envs/latest/lib/python3.8/site-packages/kessler/nn.py:27, in DatasetEventDataset.__init__(self, event_set, features, features_stats)
     25 def __init__(self, event_set, features, features_stats=None):
     26     self._event_set = event_set
---> 27     self._max_event_length = max(map(len, self._event_set))
     28     self._features = features
     29     self._features_length = len(features)

ValueError: max() arg is an empty sequence
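
The "Computing normalization statistics" step draws on the first `event_samples_for_stats` events, and the traceback shows that it is the first place where an empty dataset fails. As a rough sketch of what such a step can look like, the code below computes a per-feature mean and standard deviation. Whether kessler uses exactly these statistics is an assumption; events are represented as plain lists of dictionaries, and both function names are hypothetical.

```python
from statistics import mean, pstdev


def feature_stats(events, features, n_events=3):
    """Per-feature (mean, std) over the CDMs of the first `n_events` events.

    Hypothetical sketch: each event is a list of dicts, one dict per CDM.
    """
    sample = events[:n_events]
    if not sample:
        raise ValueError("No events available to compute normalization statistics.")
    stats = {}
    for name in features:
        values = [cdm[name] for event in sample for cdm in event]
        stats[name] = (mean(values), pstdev(values))
    return stats


def normalize(value, mean_, std, eps=1e-8):
    """Z-score normalization; eps avoids division by zero for constant features."""
    return (value - mean_) / (std + eps)


# Made-up example: one event with two CDMs
toy_events = [[{"MISS_DISTANCE": 1.0}, {"MISS_DISTANCE": 3.0}]]
toy_stats = feature_stats(toy_events, ["MISS_DISTANCE"])
```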

After training, we save the model to a file, plot the training and validation loss, and save the plot to a file:

model.save(file_name='LSTM_1epoch_lr1e-3_batchsize16')
model.plot_loss(file_name='plot_loss.pdf')

We now test the prediction: we take a single event, remove its last CDM, and try to predict it:

event=events_test[0]
event_len=len(event)
event_beginning=event[0:event_len-1]
event_evolution=model.predict_event(event_beginning, num_samples=100, max_length=14)
#we plot the prediction in red:
axs=event_evolution.plot_features(['RELATIVE_SPEED', 'MISS_DISTANCE'], return_axs=True, linewidth=0.1, color='red', alpha=0.33, label='Prediction')
#and the ground truth value in blue:
event.plot_features(['RELATIVE_SPEED', 'MISS_DISTANCE'], axs=axs, label='Real', legend=True)

Predicting event evolution
Time spent  | Time remain.| Progress             | Samples | Samples/sec
0d:00:00:08 | 0d:00:00:00 | #################### | 100/100 | 11.65
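
Each of the 100 samples is one possible evolution of the event, so the spread across samples gives a sense of the model's uncertainty. One simple way to condense them is a mean and standard deviation per time step. The sketch below works on plain lists of made-up numbers rather than kessler objects, and `summarize_samples` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize_samples(samples):
    """Return a (mean, std) pair for each time step across sampled trajectories.

    `samples` is a list of equally long lists of floats, one list per sample;
    at least two samples are needed for the standard deviation.
    """
    steps = zip(*samples)  # regroup values by time step
    return [(mean(step), stdev(step)) for step in steps]


# Two made-up trajectories of a single feature over two time steps
summary = summarize_samples([[1.0, 2.0], [3.0, 6.0]])
```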