Learning a State Representation¶
To learn a state representation, you need to enforce constraints on the representation using one or more losses. For example, to train an autoencoder, you need a reconstruction loss. Most losses are not exclusive, which means you can combine them.
All losses are defined in losses/losses.py. The available losses are:
- autoencoder: reconstruction loss, using current and next observation
- denoising autoencoder (dae): same as the autoencoder, except that the model reconstructs inputs from noisy observations containing a random zero-pixel mask
- vae: (beta)-VAE loss (reconstruction + Kullback-Leibler divergence loss)
- inverse: predict the action given current and next state
- forward: predict the next state given current state and taken action
- reward: predict the reward (positive or not) given current and next state
- priors: robotic priors losses (see “Learning State Representations with Robotic Priors”)
- triplet: triplet loss for multi-cam setting (see Multiple Cameras section)
[Experimental]
- reward-prior: Maximises the correlation between states and rewards (does not make sense for sparse rewards)
- episode-prior: Learn an episode-agnostic state space, thanks to a discriminator distinguishing states from same/different episodes
- perceptual similarity loss (for VAE): Instead of the reconstruction loss in the beta-VAE loss, it uses the distance between the reconstructed input and the real input in the embedding space of a pre-trained DAE.
- mutual information loss: Maximises the mutual information between states and rewards
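To make the triplet loss mentioned above concrete, here is a minimal sketch (not the toolbox's actual implementation, which lives in losses/losses.py): the anchor and positive are states of the same timestep seen from two cameras, and the negative comes from a different timestep.

```python
# Minimal triplet loss sketch (hypothetical helper, pure Python):
# penalize the anchor being closer to the negative than to the positive.

def squared_distance(a, b):
    """Squared Euclidean distance between two state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)"""
    return max(0.0, squared_distance(anchor, positive)
                    - squared_distance(anchor, negative) + margin)

# Anchor and positive are close, negative is far: the loss is zero.
loss = triplet_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0])
```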
All possible arguments can be displayed using python train.py --help. You can limit the training set size (--training-set-size argument), change the minibatch size (-bs), the number of epochs (--epochs), …
Examples¶
Train an inverse model:
python train.py --data-folder data/path/to/dataset --losses inverse
Train an autoencoder:
python train.py --data-folder data/path/to/dataset --losses autoencoder
Combining an autoencoder with an inverse model is as easy as:
python train.py --data-folder data/path/to/dataset --losses autoencoder inverse
You can also specify the weight of each loss:
python train.py --data-folder data/path/to/dataset --losses autoencoder:1 inverse:10
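The `name:weight` syntax above translates to a weighted sum of loss terms during training. A minimal sketch of that combination, with hypothetical function names (the real computation is done in losses/losses.py):

```python
# Sketch: each loss term is computed independently, then summed with its
# weight, mirroring the `--losses name:weight` syntax of train.py.

def reconstruction_loss(decoded, target):
    """Mean squared error between decoded and target observations."""
    return sum((d - t) ** 2 for d, t in zip(decoded, target)) / len(target)

def combine_losses(weighted_terms):
    """Weighted sum of already-computed loss values."""
    return sum(weight * value for weight, value in weighted_terms)

ae_loss = reconstruction_loss([0.1, 0.4], [0.0, 0.5])
inv_loss = 0.7  # e.g. a cross-entropy value from the inverse model
# Corresponds to `--losses autoencoder:1 inverse:10`
total = combine_losses([(1.0, ae_loss), (10.0, inv_loss)])
```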
Train a VAE with the perceptual similarity loss:
python train.py --data-folder data/path/to/dataset --losses vae perceptual --path-to-dae logs/path/to/pretrained_dae/srl_model.pth --state-dim-dae ST_DIM_DAE
Stacking/Splitting Models Instead of Combining Them¶
Because the losses do not optimize the same objective and may even conflict, it can make sense to stack representations learned with different objectives instead of combining them. For instance, you can stack an autoencoder (with a state dimension of 20) with an inverse model (of dimension 2) using the previous weights:
python train.py --data-folder data/path/to/dataset --losses autoencoder:1:20 inverse:10:2 --state-dim 22
The details of how models are split can be found in the SRLModulesSplit class, defined in models/modules.py. All models share the same encoder (features extractor), which maps observations to states.
Additional example: splitting and combining losses. The reward loss is applied on 50 dimensions, and the forward + inverse losses on 2 dimensions (note the -1, which specifies that the losses are applied on the same split):
python train.py --data-folder data/path/to/dataset --losses reward:1:50 inverse:1:2 forward:1:-1 --state-dim 52
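The `loss:weight:dim` specs above partition the state vector into slices, one per objective. A hypothetical sketch of that mapping (the real logic lives in the SRLModulesSplit class): a dim of -1 reuses the previous loss's split instead of opening a new one.

```python
# Sketch: map (loss_name, weight, dim) specs to (start, end) slices of
# the state vector; dim == -1 means "share the previous loss's split".

def compute_splits(specs):
    """specs: list of (loss_name, weight, dim) tuples."""
    splits, start = {}, 0
    for name, _weight, dim in specs:
        if dim == -1:  # reuse the split of the previous loss
            splits[name] = prev
        else:
            prev = (start, start + dim)
            splits[name] = prev
            start += dim
    return splits

# Corresponds to `--losses reward:1:50 inverse:1:2 forward:1:-1 --state-dim 52`
splits = compute_splits([("reward", 1, 50), ("inverse", 1, 2), ("forward", 1, -1)])
```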
Predicting States on the Whole Dataset¶
If you trained your model on a subset of a dataset, you can predict states for the whole dataset (or another subset) using:
python -m evaluation.predict_dataset --log-dir logs/path/to/log_folder/
Use -n 1000 to predict on the first 1000 samples only.
Predicting Reward Using a Trained Model¶
If you want to predict the reward (i.e. train a classifier for positive or null reward) using ground truth states or learned states, you can use the evaluation/predict_reward.py script. Ground Truth:
python -m evaluation.predict_reward --data-folder data/dataset_name/ --training-set-size 50000
On Learned States:
python -m evaluation.predict_reward --data-folder data/dataset_name/ -i log/path/to/states_rewards.npz