Evaluate

The script evaluate.py evaluates the performance of a given policy (such as infotaxis or an RL policy) on the source-tracking POMDP.

The script records statistics and monitoring information, and plots the results.

Computations are parallelized with multiprocessing (default) or with MPI (requires the mpi4py module).

To use MPI on n_cpus processors, run the following command from a terminal:

mpiexec -n n_cpus python3 -m mpi4py evaluate.py -i custom_params.py

Outputs generated by the script are:

*figure_distributions.pdf
This figure summarizes the results. It shows the distributions (pdf and cdf) of the number of steps and hits to find the source, as well as statistics (p_not_found, mean, standard deviation, etc.).
*figure_convergence.pdf
This figure shows the evolution of the mean number of steps to find the source as a function of the number of episodes simulated. It is useful for assessing the convergence of the statistics.
*monitoring_summary.txt
Text file summarizing various information, such as the distribution of initial hits, the number of failed episodes, and the computational cost.
*monitoring_episodes.txt
Text file containing detailed monitoring information for each episode (p_not_found, mean number of steps to find the source, reason for terminating the episode, whether the agent touched a boundary, value of the initial hit, …)
*statistics.txt
Text file with the statistics (mean, standard deviation, median, etc.) of the distributions of the number of steps and hits to find the source.
*parameters.txt
Text file summarizing the parameters used.
*table_CFD_nsteps.npy, *table_CFD_nhits.npy
NumPy arrays containing the cdf (cumulative distribution function) of the number of steps or hits to find the source, one array per file (first row is x, second row is cdf(x)).
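
As a minimal sketch of how such a table might be read back, the snippet below builds a synthetic array with the layout described above (first row x, second row cdf(x)), saves it, reloads it with numpy.load, and derives the median and the pdf from it. The file name and the values are made up for illustration; real tables are written by evaluate.py and carry the run prefix.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for a CDF table: row 0 is x, row 1 is cdf(x).
# (Hypothetical values; a real table is produced by evaluate.py.)
x = np.arange(1, 6)                          # number of steps: 1..5
cdf = np.array([0.1, 0.3, 0.6, 0.9, 1.0])    # cumulative probabilities
table = np.vstack([x, cdf])

# Hypothetical file name, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "example_table.npy")
np.save(path, table)

loaded = np.load(path)
x_loaded, cdf_loaded = loaded[0], loaded[1]

# Median: smallest x whose cumulative probability reaches 0.5.
median = x_loaded[np.searchsorted(cdf_loaded, 0.5)]

# Recover the pdf from the cdf by differencing (first entry is cdf[0]).
pdf = np.diff(cdf_loaded, prepend=0.0)
```

The same two-row layout applies to the table for the number of hits, so the snippet should carry over unchanged.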

Parameters of the script are:

Source-tracking POMDP
- N_DIMS (int > 0)
  number of space dimensions (1D, 2D, …)
- LAMBDA_OVER_DX (float >= 1.0)
  dimensionless problem size
- R_DT (float > 0.0)
  dimensionless source intensity
- NORM_POISSON (‘Euclidean’, ‘Manhattan’ or ‘Chebyshev’)
  norm used for hit detections, usually ‘Euclidean’
- N_HITS (int >= 2 or None)
  number of possible hit values, set automatically if None
- N_GRID (odd int >= 3 or None)
  linear size of the domain, set automatically if None
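
For reference, the three norms accepted by NORM_POISSON can be written out as below. This is only a sketch of the standard definitions applied to a displacement vector; the function name `distance` is hypothetical and is not taken from the script.

```python
def distance(dx, norm="Euclidean"):
    """Length of a displacement vector dx (sequence of floats) under the given norm."""
    if norm == "Euclidean":
        return sum(c * c for c in dx) ** 0.5      # square root of the sum of squares
    if norm == "Manhattan":
        return sum(abs(c) for c in dx)            # sum of absolute components
    if norm == "Chebyshev":
        return max(abs(c) for c in dx)            # largest absolute component
    raise ValueError(f"unknown norm: {norm!r}")


# Example displacement between the agent and the source in 2D.
d = (3.0, -4.0)
euclidean = distance(d, "Euclidean")
manhattan = distance(d, "Manhattan")
chebyshev = distance(d, "Chebyshev")
```

For any vector, the Chebyshev distance is the smallest of the three and the Manhattan distance the largest, so the choice of norm changes which cells count as equally far from the source.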
Policy
- POLICY (int)
  
  -1: neural network
  
  0: infotaxis (Vergassola, Villermaux and Shraiman, Nature 2007)
  
  1: space-aware infotaxis
  
  2: custom policy (to be implemented by the user)
  
  5: random walk
  
  6: greedy policy
  
  7: mean distance policy
  
  8: voting policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
  
  9: most likely state policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
- STEPS_AHEAD (int >= 1)
  number of anticipated moves, can be > 1 only for POLICY=0
- MODEL_PATH (str or None)
  path of the model (neural network) for POLICY=-1, None otherwise
Criteria for episode termination
- STOP_t (int > 0 or None)
  maximum number of steps per episode, set automatically if None
- STOP_p (float ~ 0.0)
  episode stops when the probability that the source has been found is greater than 1 - STOP_p
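
The two termination criteria above can be sketched as a single check. This is an illustration under stated assumptions, not the script's actual code: the function name, the returned reason strings, and the convention that an episode stops once the step count reaches STOP_t are all hypothetical.

```python
def should_stop(n_steps, p_found, stop_t, stop_p):
    """Return (stop, reason) for the current episode state.

    n_steps : number of steps taken so far
    p_found : probability that the source has been found
    stop_t  : maximum number of steps per episode (STOP_t)
    stop_p  : small tolerance on the probability (STOP_p)
    """
    if p_found > 1.0 - stop_p:
        return True, "source found with probability > 1 - STOP_p"
    if n_steps >= stop_t:
        return True, "maximum number of steps reached"
    return False, ""


# Example calls with made-up values.
still_running = should_stop(n_steps=10, p_found=0.5, stop_t=100, stop_p=1e-6)
found = should_stop(n_steps=10, p_found=1.0, stop_t=100, stop_p=1e-6)
timed_out = should_stop(n_steps=100, p_found=0.5, stop_t=100, stop_p=1e-6)
```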
Statistics computation
- ADAPTIVE_N_RUNS (bool)
  if True, more episodes will be simulated until the estimated error is less than REL_TOL
- REL_TOL (0.0 < float < 1.0)
  if ADAPTIVE_N_RUNS: tolerance on the relative error on the mean number of steps to find the source
- MAX_N_RUNS (int > 0 or None)
  if ADAPTIVE_N_RUNS: maximum number of runs, set automatically if None
- N_RUNS (int > 0 or None)
  if not ADAPTIVE_N_RUNS: number of episodes to simulate, set automatically if None
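
To illustrate the idea behind ADAPTIVE_N_RUNS, here is a minimal sketch of an adaptive loop. It assumes that the estimated error is the standard error of the mean divided by the mean, and that episodes are added in fixed-size batches; neither detail is confirmed by the description above. The function `simulate_episode` is a hypothetical stand-in that returns a random number of steps.

```python
import random
import statistics


def simulate_episode(rng):
    """Hypothetical stand-in for one episode: returns a number of steps."""
    return rng.randint(10, 30)


def run_adaptive(rel_tol, max_n_runs, batch_size=50, seed=0):
    """Simulate episodes until the relative error on the mean drops below rel_tol,
    or until max_n_runs episodes have been simulated."""
    if max_n_runs < 2 or batch_size < 2:
        raise ValueError("need at least 2 episodes to estimate an error")
    rng = random.Random(seed)
    n_steps = []
    while len(n_steps) < max_n_runs:
        n_new = min(batch_size, max_n_runs - len(n_steps))
        n_steps.extend(simulate_episode(rng) for _ in range(n_new))
        mean = statistics.fmean(n_steps)
        # Standard error of the mean (assumed error estimate).
        sem = statistics.stdev(n_steps) / len(n_steps) ** 0.5
        if sem / mean < rel_tol:
            break
    return mean, sem / mean, len(n_steps)


mean, rel_err, n_runs = run_adaptive(rel_tol=0.01, max_n_runs=10000)
```

Because the standard error shrinks like one over the square root of the number of episodes, halving REL_TOL roughly quadruples the number of episodes needed, which is why a cap such as MAX_N_RUNS is useful.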
Saving
- RUN_NAME (str or None)
  prefix used for all output files; if None, a timestamp is used
Parallelization
- N_PARALLEL (int)
  number of episodes computed in parallel (if <= 0, will use all available cpus)
  
  This applies only when using multiprocessing for parallelization (it has no effect with MPI).
  
  Known bug: for large neural networks, the code may hang if N_PARALLEL > 1, so use N_PARALLEL = 1 instead.
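
Putting the parameters listed above together, a parameter file such as the custom_params.py passed with -i might look like the sketch below. All values are illustrative assumptions chosen to satisfy the stated constraints, not recommended settings, and it is not confirmed here whether the script expects every parameter to be set or only those that differ from its defaults.

```python
# custom_params.py (illustrative values only)

# Source-tracking POMDP
N_DIMS = 2                  # int > 0
LAMBDA_OVER_DX = 2.0        # float >= 1.0
R_DT = 2.0                  # float > 0.0
NORM_POISSON = "Euclidean"  # 'Euclidean', 'Manhattan' or 'Chebyshev'
N_HITS = None               # set automatically
N_GRID = None               # set automatically

# Policy
POLICY = 0                  # 0: infotaxis
STEPS_AHEAD = 1             # can be > 1 only for POLICY=0
MODEL_PATH = None           # only needed for POLICY=-1

# Criteria for episode termination
STOP_t = None               # set automatically
STOP_p = 1e-6               # float close to 0.0

# Statistics computation
ADAPTIVE_N_RUNS = True
REL_TOL = 0.01              # 0.0 < float < 1.0
MAX_N_RUNS = None           # set automatically
N_RUNS = None               # unused when ADAPTIVE_N_RUNS is True

# Saving
RUN_NAME = None             # a timestamp is used as prefix

# Parallelization
N_PARALLEL = -1             # <= 0: use all available cpus (multiprocessing only)
```

With POLICY = -1, MODEL_PATH would need to point to a saved model, and N_PARALLEL = 1 avoids the known hang with large neural networks.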