Quick start
First steps
Go to the otto subdirectory.
You will see that it is organized in three main directories corresponding to the three main uses of OTTO:
evaluate: for evaluating the performance of a policy
learn: for learning a neural network policy that solves the task
visualize: for visualizing a search episode
The other directory, classes, contains all the class definitions used by the main scripts.
The three main directories share the same structure. They contain:
*.py: the main script
parameters: the directory to store input parameters
outputs: the directory to store files generated by the script (it will be created on first use)
To use OTTO, go to the relevant main directory and run the corresponding script.
For example, to visualize an episode, go to the visualize directory and run the visualize.py script with:
python3 visualize.py
You should now see the rendering of a 1D search in a new window (it may be very short!). You can visualize another episode by running the same command again.
Some logging information is displayed in the terminal as the script runs. In the rendering window, the first panel is a map of odor detections, and the second panel is the agent’s belief (probability distribution over source locations).
Each video is saved as visualize/outputs/YYmmdd-HHMMSS_video.mp4 where ‘YYmmdd-HHMMSS’ is a
timestamp (the time you started the script).
If you do not have FFmpeg or if you are using Windows, you will instead find the frames saved
in visualize/outputs/YYmmdd-HHMMSS_frames.
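To locate the output of a given run, it can help to reproduce the timestamp yourself. The sketch below assumes that ‘YYmmdd-HHMMSS’ means a two-digit year, month, day, then hours, minutes and seconds; the helper names run_stamp and video_path are hypothetical and not part of OTTO.

```python
from datetime import datetime


def run_stamp(start):
    """Format a start time as YYmmdd-HHMMSS (assumed: two-digit year)."""
    return start.strftime("%y%m%d-%H%M%S")


def video_path(start):
    """Build the expected video path for a run started at `start`."""
    return f"visualize/outputs/{run_stamp(start)}_video.mp4"


# Example with an arbitrary start time (5 January 2024, 14:03:09).
example = video_path(datetime(2024, 1, 5, 14, 3, 9))
print(example)
```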
Changing parameters
Many parameters (space dimension, domain size, source intensity, policy, …) can be changed from the defaults.
To run a script with different parameters, create a Python script in the
parameters directory that sets your parameters.
A file myparam.py is already present in visualize/parameters/ for this example.
It contains a single line:
N_DIMS = 2
which sets the dimensionality of the search to 2D (1D is the default).
User-defined parameters are loaded by using the --input option followed by the name of the parameter file.
For example, you can now visualize a search in 2D with:
python3 visualize.py --input myparam.py
The --input option can be shortened to -i, and the file name can be given with or without the .py extension. So the command:
python3 visualize.py -i myparam
will have the same effect.
Each parameters directory contains sample parameter files called example*.py.
They show essential parameters you can play with, for example:
N_DIMS sets the dimensionality of the search (1D, 2D, 3D), default is N_DIMS = 1
LAMBDA_OVER_DX controls the size of the domain, default is LAMBDA_OVER_DX = 2.0
R_DT controls the source intensity, default is R_DT = 2.0
POLICY defines the policy to use, default is POLICY = 0 (infotaxis)
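As an illustration, a parameter file combining these settings could look like the sketch below. The file name my2d.py is hypothetical; the parameter names are the ones listed above, and any parameter left out keeps its default value.

```python
# Hypothetical file: visualize/parameters/my2d.py
# Only the parameters that differ from the defaults need to be set.

N_DIMS = 2            # 2D search instead of the default 1D
LAMBDA_OVER_DX = 2.0  # domain size (same as the default, shown for clarity)
R_DT = 2.0            # source intensity (same as the default)
POLICY = 1            # space-aware infotaxis instead of infotaxis
```

Saved under that name, it would be used with python3 visualize.py -i my2d.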
Note: the actual size of the computational domain, called N_GRID, is determined internally based
on N_DIMS, LAMBDA_OVER_DX and R_DT to make the domain “large enough” for the boundaries
to have (almost) no effect on the search. As a rule of thumb, N_GRID ≈ 15 LAMBDA_OVER_DX.
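The rule of thumb can be turned into a quick estimate. This is only a sketch of the approximation quoted above: the function approx_n_grid is hypothetical, and the value actually used by OTTO is computed internally and also depends on N_DIMS and R_DT.

```python
def approx_n_grid(lambda_over_dx):
    """Rough estimate of the linear grid size: N_GRID ~ 15 * LAMBDA_OVER_DX.

    This ignores the dependence on N_DIMS and R_DT, so treat the result
    as an order of magnitude, not as the value OTTO will use.
    """
    return round(15 * lambda_over_dx)


# With the default LAMBDA_OVER_DX = 2.0, expect a grid of about 30 cells per side.
print(approx_n_grid(2.0))
```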
The definition of all parameters is provided here,
and you can find their default values by examining the contents of __defaults.py.
Evaluating a policy
The evaluate.py script (in the evaluate directory) computes many statistics that characterize the performance
of a policy, such as
probability of never finding the source,
average time to find the source,
probability distribution of arrival times,
and much more.
It does so essentially by running thousands of episodes in parallel and averaging over those.
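To make the first two statistics concrete, here is a minimal sketch of how they could be computed from a list of episode outcomes. It is not the code used by evaluate.py: the function summarize and the convention that None marks an episode where the source was never found are assumptions made for this example.

```python
from statistics import mean


def summarize(arrival_times):
    """Return (p_not_found, mean_time) from per-episode arrival times.

    arrival_times: list of numbers, with None for episodes in which the
    source was never found (an assumed convention for this sketch).
    The mean is taken over successful episodes only.
    """
    found = [t for t in arrival_times if t is not None]
    p_not_found = 1.0 - len(found) / len(arrival_times)
    mean_time = mean(found) if found else float("nan")
    return p_not_found, mean_time


# Five made-up episodes, one of which never finds the source.
p_not_found, mean_time = summarize([5, 9, None, 7, 7])
print(p_not_found, mean_time)
```

Note that the mean is conditional on the source being found, so it says little when p_not_found is large.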
You can try with:
python3 evaluate.py
This will take some time (on the order of 2 minutes on 8 cores). Logging information is displayed in the terminal while the episodes are running.
Windows users: if a NameError is raised, see known issues.
Once the script has completed, you can look at the results in the directory evaluate/outputs/YYmmdd-HHMMSS
where ‘YYmmdd-HHMMSS’ is the time you started the script.
YYmmdd-HHMMSS_figure_distributions.pdf is a figure summarizing the results.
All output files are described here.
These results are for the “infotaxis” policy, which is the default policy. You can now try to compute the statistics of another policy on the same problem. For example, evaluate the “space-aware infotaxis” policy by running:
python3 evaluate.py --input myparam.py
where myparam.py is a file containing the line:
POLICY = 1
This file is already present in evaluate/parameters/ for this example.
The main policies are
POLICY = 0 for infotaxis (default)
POLICY = 1 for space-aware infotaxis, a recently proposed heuristic that beats infotaxis in most cases
POLICY = -1 for a reinforcement learning policy: for that we need to learn first!
All policies are described here.
Learning a policy
The learn.py script learns a policy using deep reinforcement learning.
It actually trains a neural network model of the optimal value function.
The (approximately) optimal policy is then derived from this function.
To train a model, go to the learn directory and use:
python3 learn.py
Now is the perfect time for a coffee since it will take quite a while. Logging information is displayed in the terminal while the script runs (if the script seems to have frozen, see known issues).
When you come back, you can look at the contents of the learn/outputs/YYmmdd-HHMMSS directory.
There should be a figure called YYmmdd-HHMMSS_figure_learning_progress.png (if not, you need a larger coffee).
This figure shows the progress of the learning agent and is periodically updated as the training progresses. In particular, it shows the evolution of ‘p_not_found’, the probability that the source is never found, and of ‘mean’, the mean time to find the source provided it is ever found (if p_not_found is large, the mean is meaningless).
Other outputs are described here.
Completing the training may take up to roughly 5000-10000 iterations (several hours on an average laptop), but progress should be clearly visible from 500-1000 iterations. For reference, the optimal policy yields p_not_found < 1e-6 and mean ~ 7.15.
Training will continue until 10000 iterations, but can be stopped at any time.
Models are saved in the learn/models/YYmmdd-HHMMSS directory:
YYmmdd-HHMMSS_model is the most recent model,
YYmmdd-HHMMSS_model_bkp_i, where i is an integer, are the models saved at evaluation points (the models whose performance is shown in YYmmdd-HHMMSS_figure_learning_progress.png).
Note: training can restart from a previously saved model.
Visualizing and evaluating a learned policy
Once a neural network model is trained, the corresponding policy can be evaluated or visualized by running the
main scripts with a parameter file (using --input) containing:
POLICY = -1
MODEL_PATH = "../learn/models/YYmmdd-HHMMSS/YYmmdd-HHMMSS_model_bkp_i"
where MODEL_PATH is the path to the neural network model.
Important: parameters should be consistent. For example, if you set N_DIMS = 2 for learning then you must also
set N_DIMS = 2 for evaluation and visualization.
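The two helpers below sketch how one might build MODEL_PATH and check that two parameter sets agree. Both functions are hypothetical and not part of OTTO. The list of parameters that must match is an assumption: the text above only names N_DIMS, and LAMBDA_OVER_DX and R_DT are added here because they also define the search problem.

```python
def bkp_model_path(run_id, i):
    """Path to the i-th backup model of a training run, relative to a main directory."""
    return f"../learn/models/{run_id}/{run_id}_model_bkp_{i}"


def inconsistent_keys(learn_params, eval_params,
                      keys=("N_DIMS", "LAMBDA_OVER_DX", "R_DT")):
    """Return the names in `keys` whose values differ between the two dicts."""
    return [k for k in keys if learn_params.get(k) != eval_params.get(k)]


# Made-up run identifier and parameter values, for illustration only.
path = bkp_model_path("240105-140309", 3)
mismatch = inconsistent_keys(
    {"N_DIMS": 2, "LAMBDA_OVER_DX": 2.0, "R_DT": 2.0},
    {"N_DIMS": 1, "LAMBDA_OVER_DX": 2.0, "R_DT": 2.0},
)
print(path, mismatch)
```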
Trained neural networks
A collection of trained neural networks is provided in the zoo directory accessible from the root of the package.
They are saved in the models directory and corresponding parameter files are in the parameters directory.
They are named zoo_model_i_j_k where i, j, k are integers associated with N_DIMS, LAMBDA_OVER_DX, R_DT.
The list of all trained neural networks is available here.
To visualize the policy associated with the neural network model zoo_model_1_2_2, use:
python3 visualize.py --input zoo_model_1_2_2
Similarly you can evaluate this neural network policy with:
python3 evaluate.py --input zoo_model_1_2_2
Custom policies
You want to try your own policy?
Policies are implemented in classes/heuristicpolicies.
You can define your own in the function _custom_policy.
To use it in the main scripts, set POLICY = 2 in your parameter file.
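For readers unsure what a policy computes, here is a standalone toy example: a greedy rule for a 1D search that steps toward the most likely source location. It does not use OTTO's classes and does not follow the signature of _custom_policy, which you should check in the source file. The function name, its arguments and the -1/+1 action encoding are all assumptions made for this sketch.

```python
def greedy_1d_policy(belief, agent_pos):
    """Toy policy: step toward the cell with the highest source probability.

    belief: list of probabilities over grid cells (hypothetical input format).
    agent_pos: index of the cell the agent currently occupies.
    Returns -1 (step left) or +1 (step right); ties and the case where the
    most likely cell is the agent's own cell default to -1, arbitrarily.
    """
    target = max(range(len(belief)), key=lambda i: belief[i])
    return 1 if target > agent_pos else -1


# The most likely source cell is on the right, so the agent steps right.
action = greedy_1d_policy([0.1, 0.0, 0.2, 0.7], agent_pos=1)
print(action)
```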
To facilitate the evaluation of new policies compared to existing baselines, the performances of several policies (infotaxis, space-aware infotaxis and near-optimal) are reported in a dataset.
Cleaning up
The directories can be restored to their original state by running the cleanall.sh bash script located
at the root of the package.
Warning: all user-generated outputs and models will be deleted!