Multiple Cameras¶
Stacked Observations¶
Using the custom_cnn
and mlp
architecture, it is possible to
pass pairs of images from different views stacked along the channels’
dimension i.e of dim (224,224,6).
To use this functionality to perform state representation learning,
enable --multi-view
(see usage of script train.py), and use a
dataset generated for the purpose.
Triplets of Observations¶
Similarly, it is possible to learn representation of states using a dataset of triplets, i.e tuples made of an anchor, a positive and a negative observation.
The anchor and the positive observation are views of the scene at the same time step, but from different cameras.
The negative example is an image from the same camera as the anchor but at a different time step selected randomly among images in the same record.
In our case, enable triplet
as a loss (--losses
) to use the
TCN-like architecture made of a pre-trained ResNet with an extra fully
connected layer (embedding).
To use this functionality also enable --multi-view
, and use a
dataset generated for the purpose. Related papers:
- “Time-Contrastive Networks: Self-Supervised Learning from Video” (P. Sermanet et al., 2017), paper: https://arxiv.org/abs/1704.06888