Modules

sourcetracking

Provides the SourceTracking class, to simulate the source-tracking POMDP.

class otto.classes.sourcetracking.SourceTracking(Ndim, lambda_over_dx, R_dt, norm_Poisson='Euclidean', Ngrid=None, Nhits=None, draw_source=False, initial_hit=None, dummy=False)

Environment used to simulate the source-tracking POMDP.

Parameters

Ndim (int) – number of space dimensions (1D, 2D…)
lambda_over_dx (float) – dimensionless problem size (odor dispersion lengthscale divided by agent step size)
R_dt (float) – dimensionless source intensity (source rate of emission multiplied by the agent time step)
norm_Poisson ('Euclidean', 'Manhattan' or 'Chebyshev', optional) – norm used for hit detections (default=’Euclidean’)
Ngrid (int or None, optional) – linear size of the domain, set automatically if None (default=None)
Nhits (int or None, optional) – number of possible hit values, set automatically if None (default=None)
draw_source (bool, optional) – whether to actually draw the source location (otherwise uses Bayesian framework) (default=False)
initial_hit (int or None, optional) – value of the initial hit; if None, it is drawn randomly according to the relevant probability distribution (default=None)
dummy (bool, optional) – if True, sets automatic parameters (e.g., Ngrid) but does not initialize the POMDP (default=False)

Ndim

number of space dimensions (1D, 2D…)

Type: int

lambda_over_dx

dimensionless problem size (odor dispersion lengthscale divided by agent step size)

Type: float

R_dt

dimensionless source intensity (source rate of emission multiplied by the agent time step)

Type: float

norm_Poisson

norm used for hit detections: ‘Euclidean’, ‘Manhattan’ or ‘Chebyshev’

Type: str
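
The three values accepted by norm_Poisson name standard distance definitions. The sketch below shows how each one measures the separation between two grid cells. The helper grid_distance is hypothetical and not part of otto, and how the library applies the norm internally is not shown here.

```python
import math


def grid_distance(a, b, norm="Euclidean"):
    """Distance between two grid cells a and b (sequences of ints)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if norm == "Euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if norm == "Manhattan":
        return float(sum(diffs))
    if norm == "Chebyshev":
        return float(max(diffs))
    raise ValueError(f"unknown norm: {norm!r}")


# The same pair of cells is at a different distance under each norm.
distances = {
    name: grid_distance((0, 0), (3, 4), name)
    for name in ("Euclidean", "Manhattan", "Chebyshev")
}
```

For cells that differ along a single axis the three norms agree; they differ only for diagonal separations.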

Ngrid

linear size of the domain

Type: int

Nhits

number of possible hit values

Type: int

draw_source

whether a source location is actually drawn (otherwise uses Bayesian framework)

Type: bool

initial_hit

value of the initial hit

Type: int

Nactions

number of possible actions

Type: int

NN_input_shape

shape of the input array for neural network models

Type: tuple(int)

mu0_Poisson

mean number of hits at a distance lambda_over_dx from the source

Type: float

agent

current agent location

Type: list(int)

p_source

current probability distribution of the source location

Type: ndarray

obs

current observation (“hit” and “done”)

Type: dict

hit_map

number of hits received for each location (-1 for cells not visited yet)

Type: ndarray

cumulative_hits

cumulative sum of hits received (ignoring the initial hit)

Type: int

agent_near_boundaries

whether the agent is currently near a boundary

Type: bool

agent_stuck

whether the agent is currently stuck in an “infinite” loop

Type: bool

restart(initial_hit=None)

Restart the search.

Parameters: initial_hit (int or None) – initial hit, if None then random

step(action, hit=None, quiet=False)

Make a step in the source-tracking environment:

- the agent moves to its new position according to action,
- the agent receives a hit or the source is found,
- the belief (self.p_source) and the hit map (self.hit_map) are updated.

Parameters

action (int) – action of the agent
hit (int, optional) – prescribed number of hits; if None (default), the number of hits is drawn randomly according to its probability distribution
quiet (bool, optional) – if True, suppress the message printed when the agent attempts a forbidden move (default=False)

Returns

hit (int) – number of hits received
p_end (float) – probability of having found the source (relevant only if not draw_source)
done (bool) – whether the source has been found (relevant only if draw_source)
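
The move, observe, update sequence performed by step() can be illustrated on a toy 1D grid. This is a minimal sketch under stated assumptions, not otto's implementation: the hit model toy_mean_hits is a made-up exponential decay, p_end is assumed to be the belief mass on the cell the agent steps onto, and all function names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Probability of observing k events for a Poisson variable of mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def toy_mean_hits(distance, mu0=1.0, decay_length=2.0):
    """Hypothetical mean hit count; NOT the dispersion model used by otto."""
    return mu0 * math.exp(-distance / decay_length)


def toy_step(p_source, agent, action, hit):
    """One step on a 1D grid: move, then update the belief with Bayes' rule.

    action is -1 (left) or +1 (right); a move off the grid is ignored.
    Returns the new agent position, p_end and the updated belief.
    """
    new_agent = agent + action
    if not 0 <= new_agent < len(p_source):
        new_agent = agent  # forbidden move: stay in place
    # Assumption: p_end is the belief mass on the cell the agent now occupies.
    p_end = p_source[new_agent]
    # Source not found: zero the agent's cell, weight the others by likelihood.
    posterior = [
        0.0 if x == new_agent
        else p * poisson_pmf(hit, toy_mean_hits(abs(x - new_agent)))
        for x, p in enumerate(p_source)
    ]
    total = sum(posterior)
    return new_agent, p_end, [p / total for p in posterior]


prior = [0.2] * 5  # uniform belief over 5 cells
agent, p_end, belief = toy_step(prior, agent=1, action=-1, hit=0)
```

With zero hits the posterior shifts away from the agent, since nearby sources would have been more likely to produce a detection; with several hits it shifts toward the agent.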

policy

Generic policy class and functions

class otto.classes.policy.Policy(policy)

A generic policy template.

Parameters

policy (int) –

-1: neural network
0: infotaxis (Vergassola, Villermaux and Shraiman, Nature 2007)
1: space-aware infotaxis
2: custom policy (to be implemented by the user)
5: random walk
6: greedy policy
7: mean distance policy
8: voting policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
9: most likely state policy (Cassandra, Kaelbling & Kurien, IEEE 1996)

policy_index

policy index

Type: int

policy_name

name of the policy

Type: str

choose_action()

Choose an action based on the current belief (env.p_source, env.agent), according to the policy.

Returns: action_chosen (int) – chosen action

otto.classes.policy.policy_name(policy_index): Returns the name of the policy associated with policy_index.
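
The index list documented for Policy can be mirrored in a small lookup table, for example to label results in a script. The dictionary and the function describe_policy below are illustrative only; the labels are copied from the list above and are not guaranteed to match the strings returned by otto's policy_name.

```python
# Labels copied from the documented index list; names here are hypothetical.
POLICY_LABELS = {
    -1: "neural network",
    0: "infotaxis",
    1: "space-aware infotaxis",
    2: "custom policy",
    5: "random walk",
    6: "greedy policy",
    7: "mean distance policy",
    8: "voting policy",
    9: "most likely state policy",
}


def describe_policy(policy_index):
    """Return a human-readable label, or raise for an undocumented index."""
    try:
        return POLICY_LABELS[policy_index]
    except KeyError:
        raise ValueError(f"unknown policy index: {policy_index}") from None


label = describe_policy(0)
```

Indices 3 and 4 do not appear in the documented list, so the sketch rejects them.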

heuristicpolicy

Definition of heuristic policies such as infotaxis.

class otto.classes.heuristicpolicy.HeuristicPolicy(env, policy, steps_ahead=1, discount=None)

Bases: Policy

A heuristic policy.

Parameters

env (SourceTracking) – an instance of the source-tracking POMDP
policy (int) –
- 0: infotaxis (Vergassola, Villermaux and Shraiman, Nature 2007)
- 1: space-aware infotaxis
- 2: custom policy (to be implemented by the user)
- 5: random walk
- 6: greedy policy
- 7: mean distance policy
- 8: voting policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
- 9: most likely state policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
steps_ahead (int, optional) – number of anticipated future moves (default=1); values > 1 are supported only for infotaxis
discount (float or None, optional) – discount factor to use when steps_ahead > 1, automatically set if None (default=None)
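
As an illustration of what a heuristic policy computes, the sketch below picks the move that minimizes the expected Manhattan distance to the source under the current belief. This is only one plausible reading of a "mean distance" heuristic. The action encoding, the dictionary-based belief, and the function names are assumptions made for the example, not otto's implementation.

```python
# Hypothetical action encoding for a 2D grid: action index -> (dx, dy).
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def expected_distance(belief, position):
    """Expected Manhattan distance from position to the source.

    belief maps each candidate source cell (x, y) to its probability.
    """
    px, py = position
    return sum(p * (abs(x - px) + abs(y - py)) for (x, y), p in belief.items())


def mean_distance_action(belief, agent):
    """Return the action whose destination has the smallest expected distance."""
    scores = {
        action: expected_distance(belief, (agent[0] + dx, agent[1] + dy))
        for action, (dx, dy) in MOVES.items()
    }
    return min(scores, key=scores.get)


# Two candidate source cells; most of the probability lies to the right.
belief = {(5, 2): 0.7, (0, 2): 0.3}
best = mean_distance_action(belief, agent=(2, 2))
```

In this example the agent moves toward the more probable candidate even though the less probable one is closer.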

env

source-tracking POMDP

Type: SourceTracking

policy_index

policy index

Type: int

policy_name

name of the policy

Type: str

steps_ahead

number of anticipated future moves, for infotaxis only

Type: int

discount

discount factor used for steps_ahead > 1, or None if steps_ahead = 1

Type: float or None

rlpolicy

Definition of the RL policy (policy based on a neural network value model).

class otto.classes.rlpolicy.RLPolicy(env, model, sym_avg=True)

Bases: Policy

An RL policy, that is, a policy based on a value model.

Parameters

env (SourceTracking) – an instance of the source-tracking POMDP
model (ValueModel) – an instance of the neural network model
sym_avg (bool, optional) – whether to average the value over symmetric duplicates

env

source-tracking POMDP

Type: SourceTracking

model

neural network model

Type: ValueModel

policy_index

policy index, set to -1

Type: int

policy_name

name of the policy

Type: str

training

Functions and classes required to train a value model on the source-tracking POMDP.

class otto.classes.training.State(p_source, agent, prob=1.0)

Bases: object

Defines a (belief) state. This is the class used to store transitions (s, s’).

p_source

probability distribution of the source

Type: ndarray

agent

location of the agent

Type: list(int)

prob

if current state s, then prob=1.0;
if next state s’, then probability to transit from s to this state

Type: float

class otto.classes.training.TrainingEnv(*args, **kwargs)

Bases: SourceTracking

Adds training-related functions to the SourceTracking class.

apply_sym_transformation(sym)

Apply a symmetry transformation (rotation, mirror, flip, etc.) to p_source, agent, hit_map, …

Parameters: sym (int) – which symmetry transformation to apply (0 for none)
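
To make the idea of symmetry transformations concrete, the sketch below enumerates the 8 images of a square 2D grid under rotations and mirrors, and averages a function over them (the idea behind the sym_avg options documented for RLPolicy and ValueModel). The function names and the ordering of the transformations are assumptions for illustration; they do not reproduce otto's numbering of sym.

```python
def rotate90(grid):
    """Rotate a square 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def mirror(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def symmetric_copies(grid):
    """Return the 8 images of a square grid under rotations and mirrors."""
    copies = []
    current = [list(row) for row in grid]
    for _ in range(4):
        copies.append(current)
        copies.append(mirror(current))
        current = rotate90(current)
    return copies


def symmetric_average(value_fn, grid):
    """Average value_fn over all symmetric copies of grid."""
    copies = symmetric_copies(grid)
    return sum(value_fn(c) for c in copies) / len(copies)


def corner_value(grid):
    """Toy, deliberately asymmetric 'value': the top-left entry."""
    return float(grid[0][0])


grid = [[1, 2], [3, 4]]
copies = symmetric_copies(grid)
avg = symmetric_average(corner_value, grid)
```

Averaging removes the dependence on orientation: every symmetric copy of the grid yields the same averaged value.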

choose_action_from_statep(model, statep=None, sym_avg=False)

Choose an action according to the current value model and successor states statep. Has the same effect as choose_action from the RLPolicy class, but is faster if statep is provided.

Parameters

model (ValueModel) – value model to be used
statep (ndarray or None, optional) – array of all possible next states reachable from current state, as computed using transitions(); if None then will be computed within this function (default=None)
sym_avg (bool, optional) – whether to average the value over symmetric-equivalent states (default=False)

Returns

action_chosen (int) – action chosen according to the policy

get_state_value(model, state)

Return the value of the current state(s) according to the model.

Parameters

model (ValueModel) – model to be used
state (State or ndarray) – single State object or numpy array of State objects, with shape (batch_size,)

Returns

value (ndarray) – numpy array of values, with shape (batch_size, 1)

get_target(modelvalue, modelaction, statep)

Compute the target value (for training) of a state s.

Parameters

modelvalue (ValueModel) – model used to compute values
modelaction (ValueModel or None) – model used to choose action, if None then same as modelvalue
statep (ndarray) – array of State objects with shape (Nactions, Nhits) or (batch_size, Nactions, Nhits), containing all states s’ reached from state s for all possible actions and hit values

Returns

target (ndarray) – array of target values with shape (batch_size, 1)
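
The role of the two models in get_target can be sketched with plain lists: one set of values selects the action, the other evaluates it. The sketch assumes that values are expected remaining times, that each step costs one unit of time, and that the probabilities of the successors of each action sum to one. The function toy_target and its handling of terminal states are illustrative, not otto's code.

```python
def toy_target(probs, values_eval, values_choice=None):
    """Target for a state s computed from its successors s'.

    probs[a][h]   : probability of reaching successor (a, h) from s
    values_eval   : values of the successors, used to evaluate the action
    values_choice : values used to pick the action (defaults to values_eval)
    Returns (target, chosen_action).
    """
    if values_choice is None:
        values_choice = values_eval

    def expected(values, a):
        return sum(p * v for p, v in zip(probs[a], values[a]))

    # Pick the action with the smallest expected remaining time.
    best = min(range(len(probs)), key=lambda a: expected(values_choice, a))
    # Assumption: one time step is spent to reach the successor.
    return 1.0 + expected(values_eval, best), best


probs = [[0.5, 0.5], [0.9, 0.1]]      # shape (Nactions, Nhits)
values = [[4.0, 2.0], [3.0, 10.0]]    # same shape
target, action = toy_target(probs, values)
```

Passing a separate values_choice array decouples action selection from evaluation, in the spirit of the modelaction and modelvalue arguments.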

states2inputs(states, dims)

Convert states into inputs required to compute their value with the model.

Parameters

states (State object or ndarray of State objects) –
- if current states s: single State with no shape or array with shape (batch_size, )
- if next states s’: array with shape (Nactions, Nhits) or shape (batch_size, Nactions, Nhits)
dims (0 or 2) – flag used to differentiate between current states and next states
- if current states s: 0
- if next states s’: 2 (this is because one current state yields (Nactions, Nhits) next states)

Returns

inputs (ndarray) – array of inputs for the model, with
- if current states s: shape (batch_size, input_shape)
- if next states s’: shape (batch_size, Nactions, Nhits, input_shape)
prob (ndarray) –
- if current states s: array of 1.0, with shape (batch_size, )
- if next states s’: array of transition probabilities, with shape (batch_size, Nactions, Nhits)

transitions()

Compute all possible successors s’ that can be reached from state s

Returns

state (State) – current state
statep (ndarray of State) – array of all states reached for all possible actions and hit values

valuemodel

Provides the ValueModel class, for defining a neural network model of the value function.

class otto.classes.valuemodel.ValueModel(*args, **kwargs)

Bases: Model

Neural network model used to predict the value of the belief state (i.e. the expected remaining time to find the source).

Parameters

Ndim (int) – number of space dimensions (1D, 2D, …) for the search problem
FC_layers (int) – number of hidden layers
FC_units (int or tuple(int)) – units per layer
regularization_factor (float, optional) – factor for regularization losses (default=0.0)
loss_function (str, optional) – either ‘mean_absolute_error’, ‘mean_absolute_percentage_error’ or ‘mean_squared_error’ (default)
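
For reference, the three loss names accepted by loss_function correspond to the following standard definitions, written here in plain Python on lists of numbers. This is a sketch of the formulas only; the small eps guard in the percentage error is an assumption, and the actual computation in the model is delegated to the neural network framework.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (y_true - y_pred)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred, eps=1e-7):
    """Average of |y_true - y_pred| / |y_true|, expressed in percent.

    eps avoids a division by zero when a target is exactly 0.
    """
    ratios = [abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


y_true = [10.0, 20.0]
y_pred = [12.0, 16.0]
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
```

The percentage error weights each sample by the inverse of its target, which can matter when the predicted values (remaining search times) span several orders of magnitude.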

config

saves the args in a dictionary, can be used to recreate the model

Type: dict

build_graph(input_shape_nobatch)

Builds the model. Use this function instead of model.build() so that a call to model.summary() gives shape information.

Parameters: input_shape_nobatch (tuple(int)) – shape of the neural network input given by otto.classes.sourcetracking.SourceTracking.NN_input_shape

call(x, training=False, sym_avg=False)

Call the value model

Parameters

x (ndarray or tf.tensor with shape (batch_size, input_shape)) – array containing a batch of inputs
training (bool, optional) – whether this call is done during training (as opposed to evaluation) (default=False)
sym_avg (bool, optional) – whether to take the average value of symmetric duplicates (default=False)

Returns

x (tf.tensor with shape (batch_size, 1)) – array containing a batch of values

save_model(model_dir): Save the model to model_dir.

test_step(x, y)

A test step.

Parameters

x (tf.tensor with shape=(batch_size, input_shape)) – batch of inputs
y (tf.tensor with shape=(batch_size, 1)) – batch of target values

Returns

loss (tf.tensor with shape=()) – total loss

train_step(x, y, augment=False)

A training step.

Parameters

x (tf.tensor with shape=(batch_size, input_shape)) – batch of inputs
y (tf.tensor with shape=(batch_size, 1)) – batch of target values

Returns

loss (tf.tensor with shape=()) – total loss

otto.classes.valuemodel.reload_model(model_dir, inputshape)

Load a model.

Parameters

model_dir (str) – path to the model
inputshape (ndarray) – shape of the neural network input given by otto.classes.sourcetracking.SourceTracking.NN_input_shape

visualization

Provides the Visualization class, for rendering episodes.

class otto.classes.visualization.Visualization(env, live=False, filename='test', log_prob=False, marginal_prob_3d=False)

A class for visualizing the search in 1D, 2D or 3D

Parameters

env (SourceTracking) – an instance of the SourceTracking class
live (bool, optional) – whether to show live preview (faster if False) (default=False)
filename (str, optional) – file name for the video (default=’test’)
log_prob (bool, optional) – whether to show log(prob) instead of prob (default=False)
marginal_prob_3d (bool, optional) – in 3D, whether to show marginal pdfs on each plane, instead of the pdf in the planes that the agent crosses (default=False)

make_video(frame_rate=5, keep_frames=False)

Make a video from recorded frames and clean up frames.

Parameters

frame_rate (int) – number of frames per second (default=5)
keep_frames (bool) – whether to keep the frames as images (default=False)

Returns

exit_code (int) – nonzero if something went wrong while making the video; in that case, frames are kept even if keep_frames = False

record_snapshot(num, toptext='')

Create a frame from current state of the search, and save it.

Parameters

num (int) – frame number (used to create filename)
toptext (str) – text that will appear in the top part of the frame (like a title)

gymwrapper

Provides an OpenAI Gym interface for the source-tracking POMDP.

class otto.classes.gymwrapper.GymWrapper(sim, stop_t=100000)

Bases: Env

OpenAI Gym interface for the source-tracking POMDP.

Parameters

sim (SourceTracking) – an instance of the SourceTracking class with draw_source = True
stop_t (int, optional) – maximum number of timesteps; it should be large enough to (almost) never be reached (default=100000)

action_space: spaces.Space[ActType]

observation_space: spaces.Space[ObsType]

reset()

Reset the search.

Returns: observation (ndarray) – p_source centered on the agent
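
What "centered on the agent" can mean is shown below for a 1D belief: the distribution is shifted into a larger array so that the agent's cell always sits at the middle index. The output size of 2N - 1, the zero padding, and the function name are assumptions for this sketch; the wrapper's actual layout may differ.

```python
def center_on_agent(p_source, agent, fill=0.0):
    """Shift a 1D belief so that the agent's cell sits at the middle index.

    The output has length 2 * N - 1, which is enough to hold the whole
    grid whatever the agent position; cells outside the grid get `fill`.
    """
    n = len(p_source)
    centered = [fill] * (2 * n - 1)
    for x, p in enumerate(p_source):
        centered[x - agent + n - 1] = p
    return centered


p_source = [0.1, 0.2, 0.7]
view_right = center_on_agent(p_source, agent=2)
view_left = center_on_agent(p_source, agent=0)
```

An agent-centered view makes the input independent of the agent's absolute position, so a single array carries both the belief and the agent location.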

step(action)

Make a step in the environment.

Parameters

action (int) – the action to execute

Returns

observation (ndarray) – p_source centered on the agent
reward (float) – always a -1 time penalty
done (bool) – whether the search is over
info (dict) – contains the hits received, whether the maximum number of steps is reached, and whether the source was found
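
A typical interaction loop with an environment exposing this reset()/step() interface is sketched below. ToyEnv is a stand-in written for the example so that the loop runs on its own; it is not GymWrapper, and its info keys are hypothetical. Only the four-value return of step() and the constant -1 reward follow the description above.

```python
class ToyEnv:
    """Stand-in with a Gym-like reset()/step() API; NOT otto's GymWrapper."""

    def __init__(self, source=4, stop_t=100):
        self.source = source  # hidden source position on a 1D line
        self.stop_t = stop_t  # maximum number of timesteps
        self.agent = 0
        self.t = 0

    def reset(self):
        self.agent = 0
        self.t = 0
        return self.agent

    def step(self, action):
        self.agent += action
        self.t += 1
        found = self.agent == self.source
        timeout = self.t >= self.stop_t
        info = {"found": found, "timeout": timeout}  # hypothetical keys
        return self.agent, -1.0, found or timeout, info


def run_episode(env, policy):
    """Run one episode and return the accumulated reward and the last info."""
    observation = env.reset()
    total_reward, done, info = 0.0, False, {}
    while not done:
        observation, reward, done, info = env.step(policy(observation))
        total_reward += reward
    return total_reward, info


# A trivial policy that always moves right reaches the source in 4 steps.
total_reward, info = run_episode(ToyEnv(source=4), policy=lambda obs: 1)
```

Because every step is penalized by -1, the total reward of an episode is minus its duration, so maximizing the return amounts to minimizing the search time.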