Quick start

First steps

Go to the otto subdirectory.

You will see that it is organized in three main directories corresponding to the three main uses of OTTO:

  • evaluate: for evaluating the performance of a policy

  • learn: for learning a neural network policy that solves the task

  • visualize: for visualizing a search episode

The other directory, classes, contains all the class definitions used by the main scripts.

The three main directories share the same structure. They contain:

  • *.py: the main script

  • parameters: the directory to store input parameters

  • outputs: the directory to store files generated by the script (it will be created on first use)

To use OTTO, go to the relevant main directory and run the corresponding script. For example, to visualize an episode, go to the visualize directory and run the visualize.py script with:

python3 visualize.py

You should now see the rendering of a 1D search in a new window (it may be very short!). You can visualize another episode by running the same command again.

Some logging information is displayed in the terminal as the script runs. In the rendering window, the first panel is a map of odor detections, and the second panel is the agent’s belief (probability distribution over source locations).

The videos have been saved as visualize/outputs/YYmmdd-HHMMSS_video.mp4 where ‘YYmmdd-HHMMSS’ is a timestamp (the time you started the script).

If you do not have FFmpeg or if you are using Windows, you will instead find the frames saved in visualize/outputs/YYmmdd-HHMMSS_frames.
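
Because every output name starts with a 'YYmmdd-HHMMSS' timestamp, the most recent video or frames directory can be found by parsing that prefix. The following is a minimal sketch and not part of OTTO; the helper names parse_stamp and latest_output are hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%y%m%d-%H%M%S"  # strptime equivalent of 'YYmmdd-HHMMSS'
STAMP_LENGTH = 13               # len("YYmmdd-HHMMSS")


def parse_stamp(name):
    """Return the datetime encoded at the start of `name`, or None if absent."""
    try:
        return datetime.strptime(name[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def latest_output(names):
    """Return the name carrying the most recent timestamp prefix, or None."""
    stamped = [(parse_stamp(name), name) for name in names]
    stamped = [(stamp, name) for stamp, name in stamped if stamp is not None]
    return max(stamped)[1] if stamped else None
```

For example, calling latest_output(os.listdir("outputs")) from the visualize directory would return the newest entry, whether it is a video or a frames directory.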

Changing parameters

Many parameters (space dimension, domain size, source intensity, policy, …) can be changed from the defaults. To run a script with different parameters, create a Python script that sets your parameters in the parameters directory.

A file myparam.py is already present in visualize/parameters/ for this example. It contains a single line:

N_DIMS = 2

which sets the dimensionality of the search to 2D (1D is the default).

User-defined parameters are loaded by using the --input option followed by the name of the parameter file. For example, you can now visualize a search in 2D with:

python3 visualize.py --input myparam.py

The --input option can be shortened to -i, and the file name can be given with or without the .py extension. So the command:

python3 visualize.py -i myparam

will have the same effect.
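
To make the mechanism concrete, here is a minimal sketch of how a script could accept -i/--input, tolerate a missing .py extension, and overlay user parameters on defaults. It is an illustration only: OTTO's actual loading code is not shown in this guide, and the function names below are hypothetical. It assumes parameters are the upper-case names defined in the file.

```python
import argparse
import runpy
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; -i and --input are interchangeable."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", default=None, help="parameter file name")
    return parser.parse_args(argv)


def resolve_parameter_file(name, directory="parameters"):
    """Return the path of a parameter file given with or without '.py'."""
    filename = name if name.endswith(".py") else name + ".py"
    return Path(directory) / filename


def load_parameters(path, defaults):
    """Overlay the upper-case names set in `path` on a copy of `defaults`."""
    params = dict(defaults)
    user_namespace = runpy.run_path(str(path))  # executes the parameter file
    params.update({k: v for k, v in user_namespace.items() if k.isupper()})
    return params
```

With this design, a parameter file only needs to list the values that differ from the defaults, which is why myparam.py can be a single line.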

Each parameters directory contains sample parameter files called example*.py. They show essential parameters you can play with, for example:

  • N_DIMS sets the dimensionality of the search (1D, 2D, 3D), default is N_DIMS = 1

  • LAMBDA_OVER_DX controls the size of the domain, default is LAMBDA_OVER_DX = 2.0

  • R_DT controls the source intensity, default is R_DT = 2.0

  • POLICY defines the policy to use, default is POLICY = 0 (infotaxis)

Note: the actual size of the computational domain, called N_GRID, is determined internally based on N_DIMS, LAMBDA_OVER_DX and R_DT to make the domain “large enough” for the boundaries to have (almost) no effect on the search. As a rule of thumb, N_GRID ≈ 15 × LAMBDA_OVER_DX.
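
Reading the rule of thumb as N_GRID being roughly 15 times LAMBDA_OVER_DX, the sketch below gives a rough idea of how the number of grid cells grows with dimension. It is not OTTO's internal computation (which also depends on N_DIMS and R_DT), and it assumes the domain has N_GRID cells along each dimension; the function names are hypothetical.

```python
def rule_of_thumb_n_grid(lambda_over_dx, factor=15):
    """Approximate N_GRID from the rule of thumb only (N_DIMS and R_DT ignored)."""
    return round(factor * lambda_over_dx)


def approximate_cell_count(lambda_over_dx, n_dims):
    """Total number of cells, assuming N_GRID cells along each dimension."""
    return rule_of_thumb_n_grid(lambda_over_dx) ** n_dims


for n_dims in (1, 2, 3):
    print(n_dims, approximate_cell_count(2.0, n_dims))
```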

The definition of all parameters is provided here, and you can find their default values by examining the contents of __defaults.py.

Evaluating a policy

The evaluate.py script (in the evaluate directory) computes many statistics that characterize the performance of a policy, such as

  • probability of never finding the source,

  • average time to find the source,

  • probability distribution of arrival times,

  • and much more.

It does so essentially by running thousands of episodes in parallel and averaging over those.
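
The averaging step can be illustrated with a minimal sketch. It assumes each episode returns either its arrival time or None when the source was never found; this representation and the function name summarize are assumptions for illustration, not OTTO's implementation, and the parallel execution of episodes is not shown.

```python
from collections import Counter


def summarize(arrival_times):
    """Summarize episodes; None marks an episode where the source was never found."""
    n_episodes = len(arrival_times)
    found = [t for t in arrival_times if t is not None]
    # Probability of never finding the source.
    p_not_found = 1 - len(found) / n_episodes
    # Mean arrival time, conditioned on the source being found.
    mean = sum(found) / len(found) if found else float("nan")
    # Probability distribution of arrival times over all episodes.
    counts = Counter(found)
    distribution = {t: counts[t] / n_episodes for t in sorted(counts)}
    return p_not_found, mean, distribution
```

Note that the mean is computed over successful episodes only, so it says little when p_not_found is large.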

You can try with:

python3 evaluate.py

This will take some time (order of magnitude is 2 minutes on 8 cores). Logging information is displayed in the terminal while the episodes are running.

Windows users: if a NameError is raised, see known issues.

Once the script has completed, you can look at the results in the directory evaluate/outputs/YYmmdd-HHMMSS where ‘YYmmdd-HHMMSS’ is the time you started the script. YYmmdd-HHMMSS_figure_distributions.pdf is a figure summarizing the results. All output files are described here.

These results are for the “infotaxis” policy, which is the default policy. You can now try to compute the statistics of another policy on the same problem. For example, evaluate the “space-aware infotaxis” policy by running:

python3 evaluate.py --input myparam.py

where myparam.py is a file containing the line:

POLICY = 1

This file is already present in evaluate/parameters/ for this example.

The main policies are

  • POLICY = 0 for infotaxis (default)

  • POLICY = 1 for space-aware infotaxis, a recently proposed heuristic that beats infotaxis in most cases

  • POLICY = -1 for a reinforcement learning policy: for that we need to learn first!

All policies are described here.

Learning a policy

The learn.py script learns a policy using deep reinforcement learning. It actually trains a neural network model of the optimal value function. The (approximately) optimal policy is then derived from this function.
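
Deriving a policy from a value function can be sketched generically as a one-step lookahead. The code below is an illustration under stated assumptions, not OTTO's code: it assumes the value function estimates the remaining time to find the source (lower is better) and that each step costs one time unit. The names greedy_action, toy_outcomes and toy_value are hypothetical.

```python
def greedy_action(state, actions, outcomes, value):
    """Pick the action minimizing the expected cost-to-go after one step.

    outcomes(state, action) -> list of (probability, next_state) pairs
    value(next_state)       -> estimated remaining time from next_state
    """
    def expected_cost(action):
        return 1 + sum(p * value(s) for p, s in outcomes(state, action))

    return min(actions, key=expected_cost)


# Toy example: a walker on a line with the source at position 0.
def toy_outcomes(position, step):
    return [(1.0, position + step)]  # deterministic move


def toy_value(position):
    return abs(position)  # remaining time equals the distance to the source


print(greedy_action(3, [-1, +1], toy_outcomes, toy_value))  # prints -1
```

In the learning setting, the neural network plays the role of toy_value, so the quality of the policy depends directly on how well the network approximates the optimal value function.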

To train a model, go to the learn directory and use:

python3 learn.py

Now is the perfect time for a coffee since it will take quite a while. Logging information is displayed in the terminal while the script runs (if the script seems to have frozen, see known issues).

When you come back, you can look at the contents of the learn/outputs/YYmmdd-HHMMSS directory. There should be a figure called YYmmdd-HHMMSS_figure_learning_progress.png (if not you need a larger coffee).

This figure shows the progress of the learning agent and is periodically updated as the training progresses. In particular, it shows the evolution of ‘p_not_found’, the probability that the source is never found, and of ‘mean’, the mean time to find the source provided it is ever found (if p_not_found is large, the mean is meaningless).

Other outputs are described here.

Completing the training may take up to roughly 5000-10000 iterations (several hours on an average laptop), but progress should be clearly visible from 500-1000 iterations. For reference, the optimal policy yields p_not_found < 1e-6 and mean ~ 7.15.

Training will continue until 10000 iterations, but can be stopped at any time.

Models are saved in the learn/models/YYmmdd-HHMMSS directory:

  • YYmmdd-HHMMSS_model is the most recent model,

  • YYmmdd-HHMMSS_model_bkp_i, where i is an integer, are the models saved at evaluation points (the models whose performance is shown in YYmmdd-HHMMSS_figure_learning_progress.png).
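
Since the backup index i is an integer, sorting the names alphabetically would place bkp_10 before bkp_2. A minimal sketch for ordering the backups numerically follows; the helper names are hypothetical, and it assumes the saved names end exactly with _model_bkp_i as written above.

```python
import re

BACKUP_PATTERN = re.compile(r"_model_bkp_(\d+)$")


def backup_index(name):
    """Return the integer i of a '..._model_bkp_i' name, or None otherwise."""
    match = BACKUP_PATTERN.search(name)
    return int(match.group(1)) if match else None


def sorted_backups(names):
    """Return the backup model names ordered by their integer index."""
    indexed = [(backup_index(name), name) for name in names]
    return [name for index, name in sorted(p for p in indexed if p[0] is not None)]
```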

Note: training can restart from a previously saved model.

Visualizing and evaluating a learned policy

Once a neural network model is trained, the corresponding policy can be evaluated or visualized by running the main scripts with a parameter file (using --input) containing:

POLICY = -1
MODEL_PATH = "../learn/models/YYmmdd-HHMMSS/YYmmdd-HHMMSS_model_bkp_i"

where MODEL_PATH is the path to the neural network model.

Important: parameters should be consistent. For example, if you set N_DIMS = 2 for learning then you must also set N_DIMS = 2 for evaluation and visualization.
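
A simple way to catch mismatches before running is to compare the parameter sets used for learning and for evaluation. The sketch below is hypothetical and not part of OTTO; in particular, the choice of N_DIMS, LAMBDA_OVER_DX and R_DT as the parameters that must agree is an assumption (only N_DIMS is named explicitly above).

```python
def inconsistent_parameters(training, evaluation,
                            keys=("N_DIMS", "LAMBDA_OVER_DX", "R_DT")):
    """Return {name: (training value, evaluation value)} for each mismatch."""
    mismatches = {}
    for key in keys:
        if training.get(key) != evaluation.get(key):
            mismatches[key] = (training.get(key), evaluation.get(key))
    return mismatches


training = {"N_DIMS": 2, "LAMBDA_OVER_DX": 2.0, "R_DT": 2.0}
evaluation = {"N_DIMS": 1, "LAMBDA_OVER_DX": 2.0, "R_DT": 2.0, "POLICY": -1}
print(inconsistent_parameters(training, evaluation))  # prints {'N_DIMS': (2, 1)}
```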

Trained neural networks

A collection of trained neural networks is provided in the zoo directory accessible from the root of the package. They are saved in the models directory and corresponding parameter files are in the parameters directory. They are named zoo_model_i_j_k where i, j, k are integers associated with N_DIMS, LAMBDA_OVER_DX, R_DT. The list of all trained neural networks is available here.

To visualize the policy associated with the neural network model zoo_model_1_2_2, use:
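
The naming scheme can be parsed mechanically, for instance to list the available models by problem. The sketch below is hypothetical and only extracts the three integers; how each integer maps to the actual values of N_DIMS, LAMBDA_OVER_DX and R_DT is not specified here and should be read from the corresponding parameter file.

```python
import re


def parse_zoo_name(name):
    """Return the integers (i, j, k) of a 'zoo_model_i_j_k' name, or None."""
    match = re.fullmatch(r"zoo_model_(\d+)_(\d+)_(\d+)", name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


print(parse_zoo_name("zoo_model_1_2_2"))  # prints (1, 2, 2)
```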

python3 visualize.py --input zoo_model_1_2_2

Similarly you can evaluate this neural network policy with:

python3 evaluate.py --input zoo_model_1_2_2

Custom policies

You want to try your own policy? Policies are implemented in classes/heuristicpolicies. You can define your own in the function _custom_policy.

To use it in the main scripts, set POLICY = 2 in your parameter file.
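
As an idea of what a simple heuristic could look like, here is a sketch of a 1D policy that steps toward the most likely source location. It is purely illustrative: the signature expected by _custom_policy and OTTO's action encoding are not shown in this guide, so the function name, its arguments and the -1/+1 return values are all hypothetical.

```python
def move_toward_most_likely_cell(belief, agent_position):
    """Return -1 (step left) or +1 (step right) toward the belief maximum.

    `belief` is a sequence of probabilities over the cells of a 1D grid.
    If the maximum is at the agent's own cell, -1 is returned arbitrarily.
    """
    target = max(range(len(belief)), key=belief.__getitem__)
    return 1 if target > agent_position else -1


print(move_toward_most_likely_cell([0.1, 0.0, 0.2, 0.7], agent_position=1))  # prints 1
```

Such a greedy rule ignores the information gained from future detections, which is precisely what infotaxis-type policies try to exploit, so it mainly serves as a template.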

To facilitate the comparison of new policies with existing baselines, the performance of several policies (infotaxis, space-aware infotaxis and near-optimal) is reported in a dataset.

Cleaning up

The directories can be restored to their original state by running the cleanall.sh bash script located at the root of the package.

Warning: all user-generated outputs and models will be deleted!