Modules

sourcetracking

Provides the SourceTracking class, to simulate the source-tracking POMDP.

class otto.classes.sourcetracking.SourceTracking(Ndim, lambda_over_dx, R_dt, norm_Poisson='Euclidean', Ngrid=None, Nhits=None, draw_source=False, initial_hit=None, dummy=False)

Environment used to simulate the source-tracking POMDP.

Parameters
  • Ndim (int) – number of space dimensions (1D, 2D…)

  • lambda_over_dx (float) – dimensionless problem size (odor dispersion lengthscale divided by agent step size)

  • R_dt (float) – dimensionless source intensity (source rate of emission multiplied by the agent time step)

  • norm_Poisson ('Euclidean', 'Manhattan' or 'Chebyshev', optional) – norm used for hit detections (default='Euclidean')

  • Ngrid (int or None, optional) – linear size of the domain, set automatically if None (default=None)

  • Nhits (int or None, optional) – number of possible hit values, set automatically if None (default=None)

  • draw_source (bool, optional) – whether to actually draw the source location (otherwise uses Bayesian framework) (default=False)

  • initial_hit (int or None, optional) – value of the initial hit; if None, drawn randomly according to the relevant probability distribution (default=None)

  • dummy (bool, optional) – set automatic parameters (e.g., Ngrid) but does not initialize the POMDP (default=False)
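The two dimensionless parameters combine physical quantities as described above. A minimal sketch of that arithmetic follows; the physical values and the variable names dispersion_length, step_size, emission_rate and time_step are made up for illustration and are not provided by the package.

```python
# Hypothetical physical values, chosen only for illustration.
dispersion_length = 2.0  # odor dispersion lengthscale (e.g. in meters)
step_size = 1.0          # distance covered by the agent in one step (same unit)
emission_rate = 4.0      # source rate of emission (e.g. per second)
time_step = 0.5          # duration of one agent step (e.g. in seconds)

# Dimensionless problem size: dispersion lengthscale divided by agent step size.
lambda_over_dx = dispersion_length / step_size

# Dimensionless source intensity: emission rate multiplied by agent time step.
R_dt = emission_rate * time_step

print(lambda_over_dx, R_dt)
```

Together with Ndim, these two values are the required arguments in the constructor signature shown above.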

Ndim

number of space dimensions (1D, 2D…)

Type

int

lambda_over_dx

dimensionless problem size (odor dispersion lengthscale divided by agent step size)

Type

float

R_dt

dimensionless source intensity (source rate of emission multiplied by the agent time step)

Type

float

norm_Poisson

norm used for hit detections: ‘Euclidean’, ‘Manhattan’ or ‘Chebyshev’

Type

str

Ngrid

linear size of the domain

Type

int

Nhits

number of possible hit values

Type

int

draw_source

whether a source location is actually drawn (otherwise uses Bayesian framework)

Type

bool

initial_hit

value of the initial hit

Type

int

Nactions

number of possible actions

Type

int

NN_input_shape

shape of the input array for neural network models

Type

tuple(int)

mu0_Poisson

mean number of hits at a distance lambda_over_dx from the source

Type

float

agent

current agent location

Type

list(int)

p_source

current probability distribution of the source location

Type

ndarray

obs

current observation (“hit” and “done”)

Type

dict

hit_map

number of hits received for each location (-1 for cells not visited yet)

Type

ndarray

cumulative_hits

cumulative sum of hits received (ignoring initial hit)

Type

int

agent_near_boundaries

whether the agent is currently near a boundary

Type

bool

agent_stuck

whether the agent is currently stuck in an “infinite” loop

Type

bool

restart(initial_hit=None)

Restart the search.

Parameters

initial_hit (int or None) – initial hit, if None then random

step(action, hit=None, quiet=False)

Make a step in the source-tracking environment:

  1. The agent moves to its new position according to action,

  2. The agent receives a hit or the source is found,

  3. The belief (self.p_source) and the hit map (self.hit_map) are updated.

Parameters
  • action (int) – action of the agent

  • hit (int, optional) – prescribed number of hits, if None (default) the number of hits is chosen randomly according to its probability

  • quiet (bool, optional) – whether to suppress the message printed when the agent attempts a forbidden move (default=False)

Returns
  • hit (int) – number of hits received

  • p_end (float) – probability of having found the source (relevant only if not draw_source)

  • done (bool) – whether the source has been found (relevant only if draw_source)
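The three stages of a step can be illustrated on a toy 1D grid. This is only a sketch under stated assumptions, not the library's implementation: the hit model in mean_hits (exponential decay with distance), the two-action encoding, the truncation of hits to {0, 1, 2}, and the name toy_step are all hypothetical.

```python
import math
import random


def poisson_pmf(k, mu):
    """Probability of k events for a Poisson law of mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def mean_hits(distance, mu0=1.0, length=2.0):
    """ASSUMED hit model: the mean number of hits decays exponentially with distance."""
    return mu0 * math.exp(-distance / length)


def toy_step(agent, p_source, hit_map, action, hit=None, rng=random):
    """One step on a 1D grid: move, receive a hit, update the belief and the hit map."""
    n = len(p_source)
    # 1. Move. ASSUMED encoding: action 1 steps right, anything else steps left;
    #    the position is clipped at the domain boundaries.
    agent = min(max(agent + (1 if action == 1 else -1), 0), n - 1)
    # Probability that the source sits in the cell just reached (assumed < 1 here).
    p_end = p_source[agent]
    # 2. The source was not found: exclude the agent's cell and renormalize.
    belief = [0.0 if x == agent else p for x, p in enumerate(p_source)]
    total = sum(belief)
    belief = [p / total for p in belief]
    if hit is None:
        # Draw the hit from its probability under the belief, truncated to {0, 1, 2}.
        weights = [
            sum(p * poisson_pmf(h, mean_hits(abs(x - agent))) for x, p in enumerate(belief))
            for h in range(3)
        ]
        hit = rng.choices(range(3), weights=weights)[0]
    # 3. Bayes update of the belief with the likelihood of the observed hit.
    posterior = [p * poisson_pmf(hit, mean_hits(abs(x - agent))) for x, p in enumerate(belief)]
    norm = sum(posterior)
    posterior = [p / norm for p in posterior]
    # Record the hit in a copy of the hit map.
    hit_map = list(hit_map)
    hit_map[agent] = hit
    return agent, posterior, hit_map, hit, p_end


# Usage: start at cell 0 with a uniform belief elsewhere, step right, prescribe zero hits.
agent, belief, hit_map, hit, p_end = toy_step(
    0, [0.0, 0.25, 0.25, 0.25, 0.25], [-1] * 5, action=1, hit=0
)
print(agent, hit, p_end, hit_map)
```

In this toy model, receiving zero hits shifts probability mass away from the agent, because a nearby source would have been more likely to produce a hit.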

policy

Generic policy class and functions

class otto.classes.policy.Policy(policy)

A generic policy template.

Parameters

policy (int) –

  • -1: neural network

  • 0: infotaxis (Vergassola, Villermaux and Shraiman, Nature 2007)

  • 1: space-aware infotaxis

  • 2: custom policy (to be implemented by the user)

  • 5: random walk

  • 6: greedy policy

  • 7: mean distance policy

  • 8: voting policy (Cassandra, Kaelbling & Kurien, IEEE 1996)

  • 9: most likely state policy (Cassandra, Kaelbling & Kurien, IEEE 1996)

policy_index

policy index

Type

int

policy_name

name of the policy

Type

str

choose_action()

Choose an action based on the current belief (env.p_source, env.agent), according to the policy.

Returns

action_chosen (int) – chosen action

otto.classes.policy.policy_name(policy_index)

Returns the name of the policy associated with policy_index.
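An index-to-name lookup of this kind can be sketched with a dictionary. The strings below are copied from the index list documented for Policy; the exact names returned by the library may differ, and policy_name_sketch and POLICY_NAMES are hypothetical names used only for this illustration.

```python
# Names taken from the documented list of policy indices.
# ASSUMPTION: the library's actual strings may be spelled differently.
POLICY_NAMES = {
    -1: "neural network",
    0: "infotaxis",
    1: "space-aware infotaxis",
    2: "custom policy",
    5: "random walk",
    6: "greedy policy",
    7: "mean distance policy",
    8: "voting policy",
    9: "most likely state policy",
}


def policy_name_sketch(policy_index):
    """Return the descriptive name for a policy index, or raise if it is unknown."""
    try:
        return POLICY_NAMES[policy_index]
    except KeyError:
        raise ValueError(f"unknown policy index: {policy_index}") from None


print(policy_name_sketch(0))
```

Indices 3 and 4 are absent from the documented list, so the sketch rejects them rather than guessing a name.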

heuristicpolicy

Definition of heuristic policies such as infotaxis.

class otto.classes.heuristicpolicy.HeuristicPolicy(env, policy, steps_ahead=1, discount=None)

Bases: Policy

A heuristic policy.

Parameters
  • env (SourceTracking) – an instance of the source-tracking POMDP

  • policy (int) –

    • 0: infotaxis (Vergassola, Villermaux and Shraiman, Nature 2007)

    • 1: space-aware infotaxis

    • 2: custom policy (to be implemented by the user)

    • 5: random walk

    • 6: greedy policy

    • 7: mean distance policy

    • 8: voting policy (Cassandra, Kaelbling & Kurien, IEEE 1996)

    • 9: most likely state policy (Cassandra, Kaelbling & Kurien, IEEE 1996)

  • steps_ahead (int, optional) – number of anticipated future moves (default=1), > 1 only for infotaxis

  • discount (float or None, optional) – discount factor to use when steps_ahead > 1, automatically set if None (default=None)
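To give a feel for what a one-step heuristic does, here is a sketch of one plausible reading of a greedy policy: move to the neighboring cell where the belief is highest. This interpretation, the action encoding in MOVES, and the function name greedy_action are assumptions made for illustration; the library's own definition may differ.

```python
# Hypothetical action encoding for a 2D grid: action index -> (dx, dy).
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def greedy_action(p_source, agent):
    """Return the index of the move whose target cell has the highest belief."""
    nx, ny = len(p_source), len(p_source[0])
    best_action, best_prob = None, -1.0
    for action, (dx, dy) in enumerate(MOVES):
        x, y = agent[0] + dx, agent[1] + dy
        if not (0 <= x < nx and 0 <= y < ny):
            continue  # skip moves that would leave the grid
        if p_source[x][y] > best_prob:
            best_action, best_prob = action, p_source[x][y]
    return best_action


# Usage: a 3x3 belief with the agent in the center cell.
belief = [
    [0.0, 0.1, 0.0],
    [0.2, 0.0, 0.3],
    [0.0, 0.4, 0.0],
]
print(greedy_action(belief, [1, 1]))
```

Ties are broken in favor of the lowest action index, which is an arbitrary choice of this sketch.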

env

source-tracking POMDP

Type

SourceTracking

policy_index

policy index

Type

int

policy_name

name of the policy

Type

str

steps_ahead

number of anticipated future moves, for infotaxis only

Type

int

discount

discount factor used for steps_ahead > 1, or None if steps_ahead = 1

Type

float or None

rlpolicy

Definition of the RL policy (policy based on a neural network value model).

class otto.classes.rlpolicy.RLPolicy(env, model, sym_avg=True)

Bases: Policy

An RL policy, that is, a policy based on a value model.

Parameters
  • env (SourceTracking) – an instance of the source-tracking POMDP

  • model (ValueModel) – an instance of the neural network model

  • sym_avg (bool, optional) – whether to average the value over symmetric duplicates

env

source-tracking POMDP

Type

SourceTracking

model

neural network model

Type

ValueModel

policy_index

policy index, set to -1

Type

int

policy_name

name of the policy

Type

str

training

Functions and classes required to train a value model on the source-tracking POMDP.

class otto.classes.training.State(p_source, agent, prob=1.0)

Bases: object

Defines a (belief) state. This is the class used to store transitions (s, s’).

p_source

probability distribution of the source

Type

ndarray

agent

location of the agent

Type

list(int)

prob
  • if current state s, then prob=1.0;

  • if next state s’, then probability to transit from s to this state

Type

float

class otto.classes.training.TrainingEnv(*args, **kwargs)

Bases: SourceTracking

Add functions useful for training to the SourceTracking class

apply_sym_transformation(sym)

Apply a symmetry transformation (rotation, mirror, flip, etc.) to p_source, agent, hit_map, …

Parameters

sym (int) – which symmetry transformation to apply (0 for none)
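For a 2D square grid, the symmetry transformations are the four quarter-turn rotations, each optionally followed by a mirror. The sketch below illustrates them on a nested list, together with the idea of averaging a value over symmetric duplicates. The numbering of sym (other than 0 for the identity, which the documentation states) and the names apply_sym_sketch and sym_average are assumptions; only the grid is transformed here, whereas the method above also transforms the agent location and the hit map.

```python
def transpose(grid):
    """Swap rows and columns."""
    return [list(row) for row in zip(*grid)]


def flip(grid):
    """Mirror the grid by reversing the order of its rows."""
    return [list(row) for row in reversed(grid)]


def apply_sym_sketch(grid, sym):
    """Apply one of the 8 symmetries of the square; sym=0 is the identity.

    ASSUMED numbering: sym % 4 quarter turns, followed by a mirror if sym >= 4.
    """
    out = [list(row) for row in grid]
    for _ in range(sym % 4):
        out = flip(transpose(out))  # one counterclockwise quarter turn
    if sym >= 4:
        out = flip(out)
    return out


def sym_average(value_fn, grid):
    """Average value_fn over the 8 symmetric copies of grid."""
    return sum(value_fn(apply_sym_sketch(grid, s)) for s in range(8)) / 8


# Usage: a deliberately asymmetric "value" that only reads the top-left cell.
grid = [[1, 2], [3, 4]]
print(apply_sym_sketch(grid, 1))
print(sym_average(lambda g: g[0][0], grid))
```

Averaging removes the dependence on orientation: the averaged value is the same for every symmetric copy of the input.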

choose_action_from_statep(model, statep=None, sym_avg=False)

Choose an action according to the current value model and successor states statep. Has the same effect as choose_action from the RLPolicy class, but is faster if statep is provided.

Parameters
  • model (ValueModel) – value model to be used

  • statep (ndarray or None, optional) – array of all possible next states reachable from current state, as computed using transitions(); if None then will be computed within this function (default=None)

  • sym_avg (bool, optional) – whether to average the value over symmetric-equivalent states (default=False)

Returns

action_chosen (int) – action chosen according to the policy

get_state_value(model, state)

Returns the value of a current state according to the model

Parameters
  • model (ValueModel) – model to be used

  • state (State or ndarray) – single State object or numpy array of State objects, with shape (batch_size,)

Returns

value (ndarray) – numpy array of values, with shape (batch_size, 1)

get_target(modelvalue, modelaction, statep)

Compute the target value (for training) of a state s.

Parameters
  • modelvalue (ValueModel) – model used to compute values

  • modelaction (ValueModel or None) – model used to choose action, if None then same as modelvalue

  • statep (ndarray) – array of State objects with shape (Nactions, Nhits) or (batch_size, Nactions, Nhits), containing all states s’ reached from state s for all possible actions and hit values

Returns

target (ndarray) – array of target values with shape (batch_size, 1)
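The role of the two models can be sketched for a single state. The sketch assumes that the value is the expected remaining time to find the source (as stated for ValueModel) and that the target is a Bellman-style backup: one step is spent, then the remaining time of each successor is weighted by its transition probability. Both the formula and the names expected_times and target_sketch are assumptions for illustration, not the library's code.

```python
def expected_times(probs, values):
    """Expected time per action: 1 step plus the probability-weighted successor values.

    probs[a][h] is the probability of reaching successor (a, h), and values[a][h]
    its value. Probability mass missing from a row is treated as "source found",
    which contributes a value of 0 (ASSUMPTION).
    """
    return [
        1.0 + sum(p * v for p, v in zip(p_row, v_row))
        for p_row, v_row in zip(probs, values)
    ]


def target_sketch(probs, values_from_modelvalue, values_from_modelaction=None):
    """Choose the action with one set of values, evaluate it with the other."""
    if values_from_modelaction is None:
        values_from_modelaction = values_from_modelvalue
    times_for_choice = expected_times(probs, values_from_modelaction)
    # The best action minimizes the expected remaining time.
    action = min(range(len(times_for_choice)), key=times_for_choice.__getitem__)
    return expected_times(probs, values_from_modelvalue)[action]


# Usage: 2 actions, 2 possible hit values.
probs = [[0.5, 0.5], [0.9, 0.1]]
values = [[4.0, 2.0], [3.0, 1.0]]
print(target_sketch(probs, values))
```

Separating the model that selects the action from the model that evaluates it is a common device in value-based training; when modelaction is None, the same values serve both purposes.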

states2inputs(states, dims)

Convert states into inputs required to compute their value with the model.

Parameters
  • states (State object or ndarray of State objects) –

    • if current states s: single State with no shape or array with shape (batch_size, )

    • if next states s’: array with shape (Nactions, Nhits) or shape (batch_size, Nactions, Nhits)

  • dims (0 or 2) – flag used to differentiate between current states and next states (this is because one current state yields (Nactions, Nhits) next states)

    • if current states s: 0

    • if next states s’: 2

Returns
  • inputs (ndarray) – array of inputs for the model, with:

    • if current states s: shape (batch_size, input_shape)

    • if next states s’: shape (batch_size, Nactions, Nhits, input_shape)

  • prob (ndarray) –

    • if current states s: array of 1.0, with shape (batch_size, )

    • if next states s’: array of transition probabilities, with shape (batch_size, Nactions, Nhits)

transitions()

Compute all possible successors s’ that can be reached from state s

Returns
  • state (State) – current state

  • statep (ndarray of State) – array of all states reached for all possible actions and hit values

valuemodel

Provides the ValueModel class, for defining a neural network model of the value function.

class otto.classes.valuemodel.ValueModel(*args, **kwargs)

Bases: Model

Neural network model used to predict the value of the belief state (i.e. the expected remaining time to find the source).

Parameters
  • Ndim (int) – number of space dimensions (1D, 2D, …) for the search problem

  • FC_layers (int) – number of hidden layers

  • FC_units (int or tuple(int)) – units per layer

  • regularization_factor (float, optional) – factor for regularization losses (default=0.0)

  • loss_function (str, optional) – either ‘mean_absolute_error’, ‘mean_absolute_percentage_error’ or ‘mean_squared_error’ (default)

config

saves the args in a dictionary, can be used to recreate the model

Type

dict

build_graph(input_shape_nobatch)

Builds the model. Use this function instead of model.build() so that a call to model.summary() gives shape information.

Parameters

input_shape_nobatch (tuple(int)) – shape of the neural network input given by otto.classes.sourcetracking.SourceTracking.NN_input_shape

call(x, training=False, sym_avg=False)

Call the value model

Parameters
  • x (ndarray or tf.tensor with shape (batch_size, input_shape)) – array containing a batch of inputs

  • training (bool, optional) – whether this call is done during training (as opposed to evaluation) (default=False)

  • sym_avg (bool, optional) – whether to take the average value of symmetric duplicates (default=False)

Returns

x (tf.tensor with shape (batch_size, 1)) – array containing a batch of values

save_model(model_dir)

Save the model to model_dir.

test_step(x, y)

A test step.

Parameters
  • x (tf.tensor with shape=(batch_size, input_shape)) – batch of inputs

  • y (tf.tensor with shape=(batch_size, 1)) – batch of target values

Returns

loss (tf.tensor with shape=()) – total loss

train_step(x, y, augment=False)

A training step.

Parameters
  • x (tf.tensor with shape=(batch_size, input_shape)) – batch of inputs

  • y (tf.tensor with shape=(batch_size, 1)) – batch of target values

Returns

loss (tf.tensor with shape=()) – total loss

otto.classes.valuemodel.reload_model(model_dir, inputshape)

Load a model.

Parameters

visualization

Provides the Visualization class, for rendering episodes.

class otto.classes.visualization.Visualization(env, live=False, filename='test', log_prob=False, marginal_prob_3d=False)

A class for visualizing the search in 1D, 2D or 3D

Parameters
  • env (SourceTracking) – an instance of the SourceTracking class

  • live (bool, optional) – whether to show live preview (faster if False) (default=False)

  • filename (str, optional) – file name for the video (default='test')

  • log_prob (bool, optional) – whether to show log(prob) instead of prob (default=False)

  • marginal_prob_3d (bool, optional) – in 3D, whether to show marginal pdfs on each plane, instead of the pdf in the planes that the agent crosses (default=False)

make_video(frame_rate=5, keep_frames=False)

Make a video from recorded frames and clean up frames.

Parameters
  • frame_rate (int) – number of frames per second (default=5)

  • keep_frames (bool) – whether to keep the frames as images (default=False)

Returns

exit_code (int) – nonzero if something went wrong while making the video; in that case, frames will be saved even if keep_frames = False

record_snapshot(num, toptext='')

Create a frame from current state of the search, and save it.

Parameters
  • num (int) – frame number (used to create filename)

  • toptext (str) – text that will appear in the top part of the frame (like a title)

gymwrapper

Provides an OpenAI Gym interface for the source-tracking POMDP.

class otto.classes.gymwrapper.GymWrapper(sim, stop_t=100000)

Bases: Env

OpenAI Gym interface for the source-tracking POMDP.

Parameters
  • sim (SourceTracking) – an instance of the SourceTracking class with draw_source = True

  • stop_t (int, optional) – maximum number of timesteps; it should be large enough to (almost) never be reached

action_space: spaces.Space[ActType]

observation_space: spaces.Space[ObsType]

reset()

Reset the search.

Returns

observation (ndarray) – p_source centered on the agent

step(action)

Make a step in the environment.

Parameters

action (int) – the action to execute

Returns
  • observation (ndarray) – p_source centered on the agent

  • reward (float) – always a -1 time penalty

  • done (bool) – whether the search is over

  • info (dict) – contains the hits received, whether the maximum number of steps is reached, and whether the source was found
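The reset/step interface lends itself to the usual episode loop. The sketch below runs such a loop against a stand-in environment that only mimics the documented return values (observation, reward, done, info); StubEnv, its constructor arguments, the n_actions attribute and the "found" key in info are hypothetical and are not part of the package.

```python
import random


class StubEnv:
    """Hypothetical stand-in that mimics the documented reset()/step() returns."""

    def __init__(self, steps_to_source=3, n_actions=4):
        self.steps_to_source = steps_to_source
        self.n_actions = n_actions
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # placeholder observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.steps_to_source
        # The reward is always a -1 time penalty, as documented above.
        return [0.0], -1.0, done, {"found": done}


def run_episode(env, rng):
    """Run one episode with uniformly random actions and return the total reward."""
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = rng.randrange(env.n_actions)
        _, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


print(run_episode(StubEnv(steps_to_source=3), random.Random(0)))
```

Because every step costs -1, the total reward of an episode is minus its duration, so maximizing the return is the same as minimizing the search time.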