Modules
sourcetracking
Provides the SourceTracking class, to simulate the source-tracking POMDP.
- class otto.classes.sourcetracking.SourceTracking(Ndim, lambda_over_dx, R_dt, norm_Poisson='Euclidean', Ngrid=None, Nhits=None, draw_source=False, initial_hit=None, dummy=False)
Environment used to simulate the source-tracking POMDP.
- Parameters
Ndim (int) – number of space dimensions (1D, 2D…)
lambda_over_dx (float) – dimensionless problem size (odor dispersion lengthscale divided by agent step size)
R_dt (float) – dimensionless source intensity (source rate of emission multiplied by the agent time step)
norm_Poisson ('Euclidean', 'Manhattan' or 'Chebyshev', optional) – norm used for hit detections (default='Euclidean')
Ngrid (int or None, optional) – linear size of the domain, set automatically if None (default=None)
Nhits (int or None, optional) – number of possible hit values, set automatically if None (default=None)
draw_source (bool, optional) – whether to actually draw the source location (otherwise uses Bayesian framework) (default=False)
initial_hit (int or None, optional) – value of the initial hit, if None drawn randomly according to relevant probability distribution (default=None)
dummy (bool, optional) – set automatic parameters (e.g., Ngrid) but does not initialize the POMDP (default=False)
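The three norm_Poisson options are the standard Euclidean, Manhattan and Chebyshev norms. As a minimal sketch of what each one measures between two grid cells (grid_distance is a hypothetical helper written for this illustration, not an otto function, and how otto feeds the distance into its hit model is not shown):

```python
import math


def grid_distance(a, b, norm="Euclidean"):
    """Distance between two grid cells a and b (sequences of ints).

    Hypothetical helper for illustration only; not part of otto.
    """
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if norm == "Euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if norm == "Manhattan":
        return float(sum(diffs))
    if norm == "Chebyshev":
        return float(max(diffs))
    raise ValueError(f"unknown norm: {norm!r}")


agent, source = [2, 3], [5, 7]
for name in ("Euclidean", "Manhattan", "Chebyshev"):
    print(name, grid_distance(agent, source, name))
```

For the same pair of cells the Chebyshev distance is never larger than the Euclidean one, which is never larger than the Manhattan one.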
- Ndim
number of space dimensions (1D, 2D…)
- Type
int
- lambda_over_dx
dimensionless problem size (odor dispersion lengthscale divided by agent step size)
- Type
float
- R_dt
dimensionless source intensity (source rate of emission multiplied by the agent time step)
- Type
float
- norm_Poisson
norm used for hit detections: ‘Euclidean’, ‘Manhattan’ or ‘Chebyshev’
- Type
str
- Ngrid
linear size of the domain
- Type
int
- Nhits
number of possible hit values
- Type
int
- draw_source
whether a source location is actually drawn (otherwise uses Bayesian framework)
- Type
bool
- initial_hit
value of the initial hit
- Type
int
- Nactions
number of possible actions
- Type
int
- NN_input_shape
shape of the input array for neural network models
- Type
tuple(int)
- mu0_Poisson
mean number of hits at a distance lambda_over_dx from the source
- Type
float
- agent
current agent location
- Type
list(int)
- p_source
current probability distribution of the source location
- Type
ndarray
- obs
current observation (“hit” and “done”)
- Type
dict
- hit_map
number of hits received for each location (-1 for cells not visited yet)
- Type
ndarray
- cumulative_hits
cumulative sum of hits received (ignoring initial hit)
- Type
int
- agent_near_boundaries
whether the agent is currently near a boundary
- Type
bool
- agent_stuck
whether the agent is currently stuck in an “infinite” loop
- Type
bool
- restart(initial_hit=None)
Restart the search.
- Parameters
initial_hit (int or None) – initial hit, if None then random
- step(action, hit=None, quiet=False)
Make a step in the source-tracking environment:
The agent moves to its new position according to action,
The agent receives a hit or the source is found,
The belief (self.p_source) and the hit map (self.hit_map) are updated.
- Parameters
action (int) – action of the agent
hit (int, optional) – prescribed number of hits, if None (default) the number of hits is chosen randomly according to its probability
quiet (bool, optional) – whether to suppress the message printed when the agent attempts a forbidden move (default=False)
- Returns
hit (int) – number of hits received
p_end (float) – probability of having found the source (relevant only if not draw_source)
done (bool) – whether the source has been found (relevant only if draw_source)
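The belief update performed by step() can be pictured as a Bayes update of p_source given the observed hit. The sketch below works in 1D with plain lists and makes several assumptions that do not come from otto: the hit count is treated as Poisson-distributed, the mean hit count toy_mean_hits is a placeholder exponential decay rather than otto's dispersion model, and receiving a hit is taken to mean the source is not in the agent's cell. All function names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Probability of observing k events for a Poisson law of mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def toy_mean_hits(distance, lam=3.0, mu0=1.0):
    """Placeholder decay of the mean hit count with distance (NOT otto's model)."""
    return mu0 * math.exp(-distance / lam)


def update_belief(p_source, agent, hit):
    """Bayes update of a 1D belief after observing `hit` hits at cell `agent`."""
    posterior = []
    for x, prior in enumerate(p_source):
        if x == agent:
            # Assumption: a hit was received, so the source is not in this cell.
            posterior.append(0.0)
        else:
            likelihood = poisson_pmf(hit, toy_mean_hits(abs(x - agent)))
            posterior.append(prior * likelihood)
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("observation has zero probability under the prior")
    return [p / total for p in posterior]


prior = [0.2] * 5
posterior = update_belief(prior, agent=0, hit=1)
print(posterior)
```

With this toy decay law, observing one hit shifts probability toward the cells closest to the agent, while the agent's own cell is ruled out.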
policy
Generic policy class and functions
- class otto.classes.policy.Policy(policy)
A generic policy template.
- Parameters
policy (int) –
-1: neural network
0: infotaxis (Vergassola, Villermaux and Shraiman, Nature 2007)
1: space-aware infotaxis
2: custom policy (to be implemented by the user)
5: random walk
6: greedy policy
7: mean distance policy
8: voting policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
9: most likely state policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
- policy_index
policy index
- Type
int
- policy_name
name of the policy
- Type
str
- choose_action()
Choose an action based on the current belief (env.p_source, env.agent), according to the policy.
- Returns
action_chosen (int) – chosen action
- otto.classes.policy.policy_name(policy_index)
Returns the name of the policy associated with policy_index.
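The index-to-policy correspondence listed for the Policy class can be kept in a small lookup table when writing scripts. This is only a sketch: describe_policy and POLICY_DESCRIPTIONS are hypothetical names, and the strings are the descriptions from the parameter list above, not necessarily the strings that otto's policy_name returns.

```python
# Descriptions copied from the Policy parameter list; the exact strings
# returned by otto's policy_name() may differ.
POLICY_DESCRIPTIONS = {
    -1: "neural network",
    0: "infotaxis",
    1: "space-aware infotaxis",
    2: "custom policy",
    5: "random walk",
    6: "greedy policy",
    7: "mean distance policy",
    8: "voting policy",
    9: "most likely state policy",
}


def describe_policy(policy_index):
    """Hypothetical lookup helper; raises on indices not listed in the docs."""
    try:
        return POLICY_DESCRIPTIONS[policy_index]
    except KeyError:
        raise ValueError(f"unknown policy index: {policy_index}") from None


print(describe_policy(0))
```

Indices 3 and 4 do not appear in the documented list, so the helper rejects them rather than guessing.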
heuristicpolicy
Definition of heuristic policies such as infotaxis.
- class otto.classes.heuristicpolicy.HeuristicPolicy(env, policy, steps_ahead=1, discount=None)
Bases: Policy
A heuristic policy.
- Parameters
env (SourceTracking) – an instance of the source-tracking POMDP
policy (int) –
0: infotaxis (Vergassola, Villermaux and Shraiman, Nature 2007)
1: space-aware infotaxis
2: custom policy (to be implemented by the user)
5: random walk
6: greedy policy
7: mean distance policy
8: voting policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
9: most likely state policy (Cassandra, Kaelbling & Kurien, IEEE 1996)
steps_ahead (int, optional) – number of anticipated future moves (default=1), > 1 only for infotaxis
discount (float or None, optional) – discount factor to use when steps_ahead > 1, automatically set if None (default=None)
- env
source-tracking POMDP
- Type
SourceTracking
- policy_index
policy index
- Type
int
- policy_name
name of the policy
- Type
str
- steps_ahead
number of anticipated future moves, for infotaxis only
- Type
int
- discount
discount factor used for steps_ahead > 1, or None if steps_ahead = 1
- Type
float or None
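Infotaxis (policy 0) is described in the cited paper as picking the move that most reduces the expected Shannon entropy of the belief. The sketch below illustrates that selection rule on hand-written numbers. The move names, the outcome lists and both helper functions are hypothetical; in otto the outcomes would come from the environment's own hit model.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def expected_entropy(outcomes):
    """outcomes: list of (probability, posterior belief) pairs for one move."""
    return sum(prob * entropy(post) for prob, post in outcomes)


prior = [0.25, 0.25, 0.25, 0.25]

# Two hypothetical moves, each with two possible observation outcomes.
# For each move, the probability-weighted posteriors add up to the prior.
moves = {
    "left": [(0.5, [0.5, 0.5, 0.0, 0.0]), (0.5, [0.0, 0.0, 0.5, 0.5])],
    "right": [(0.25, [1.0, 0.0, 0.0, 0.0]), (0.75, [0.0, 1 / 3, 1 / 3, 1 / 3])],
}

# Pick the move with the lowest expected posterior entropy,
# i.e. the largest expected information gain.
best = min(moves, key=lambda m: expected_entropy(moves[m]))
gain = entropy(prior) - expected_entropy(moves[best])
print(best, gain)
```

Setting steps_ahead above 1 corresponds to looking further than this single-move comparison.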
rlpolicy
Definition of the RL policy (policy based on a neural network value model).
- class otto.classes.rlpolicy.RLPolicy(env, model, sym_avg=True)
Bases: Policy
An RL policy, that is, a policy based on a value model.
- Parameters
env (SourceTracking) – an instance of the source-tracking POMDP
model (ValueModel) – an instance of the neural network model
sym_avg (bool, optional) – whether to average the value over symmetric duplicates
- env
source-tracking POMDP
- Type
SourceTracking
- model
neural network model
- Type
ValueModel
- policy_index
policy index, set to -1
- Type
int
- policy_name
name of the policy
- Type
str
training
Functions and classes required to train a value model on the source-tracking POMDP.
- class otto.classes.training.State(p_source, agent, prob=1.0)
Bases: object
Defines a (belief) state. This is the class used to store transitions (s, s’).
- p_source
probability distribution of the source
- Type
ndarray
- agent
location of the agent
- Type
list(int)
- prob
if current state s, then prob=1.0;
if next state s’, then probability to transit from s to this state
- Type
float
- class otto.classes.training.TrainingEnv(*args, **kwargs)
Bases: SourceTracking
Add functions useful for training to the SourceTracking class
- apply_sym_transformation(sym)
Apply a symmetry transformation (rotation, mirror, flip, etc.) to p_source, agent, hit_map, …
- Parameters
sym (int) – which symmetry transformation to apply (0 for none)
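For a 2D square grid, the rotations and mirror images form the eight symmetries of the square. The sketch below applies them to a belief stored as a list of rows. The numbering of sym used here (quarter-turns for 0 to 3, the same preceded by a mirror for 4 to 7) is an assumption made for this illustration; only "0 for none" comes from the documentation, and apply_symmetry is a hypothetical helper that transforms the grid alone, not the agent position or the hit map.

```python
def rotate90(grid):
    """Rotate a square 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def mirror(grid):
    """Mirror a 2D grid left to right."""
    return [list(row[::-1]) for row in grid]


def apply_symmetry(grid, sym):
    """Apply one of the 8 symmetries of the square; sym=0 is the identity.

    Hypothetical numbering: sym % 4 quarter-turns, preceded by a mirror
    when sym >= 4.
    """
    if not 0 <= sym < 8:
        raise ValueError("sym must be in 0..7")
    out = mirror(grid) if sym >= 4 else [list(row) for row in grid]
    for _ in range(sym % 4):
        out = rotate90(out)
    return out


belief = [[0.1, 0.2], [0.3, 0.4]]
for sym in range(8):
    print(sym, apply_symmetry(belief, sym))
```

Every transformation only permutes the cells, so the transformed belief still sums to one.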
- choose_action_from_statep(model, statep=None, sym_avg=False)
Choose an action according to the current value model and successor states statep. Has the same effect as choose_action from the RLPolicy class, but is faster if statep is provided.
- Parameters
model (ValueModel) – value model to be used
statep (ndarray or None, optional) – array of all possible next states reachable from current state, as computed using transitions(); if None then will be computed within this function (default=None)
sym_avg (bool, optional) – whether to average the value over symmetric-equivalent states (default=False)
- Returns
action_chosen (int) – action chosen according to the policy
- get_state_value(model, state)
Returns the value of a current state according to the model
- Parameters
model (ValueModel) – model to be used
state (State or ndarray) – single State object or numpy array of State objects, with shape (batch_size,)
- Returns
value (ndarray) – numpy array of values, with shape (batch_size, 1)
- get_target(modelvalue, modelaction, statep)
Compute the target value (for training) of a state s.
- Parameters
modelvalue (ValueModel) – model used to compute values
modelaction (ValueModel or None) – model used to choose action, if None then same as modelvalue
statep (ndarray) – array of State objects with shape (Nactions, Nhits) or (batch_size, Nactions, Nhits), containing all states s’ reached from state s for all possible actions and hit values
- Returns
target (ndarray) – array of target values with shape (batch_size, 1)
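One plausible reading of the target, given that the value model predicts the expected remaining time to find the source, is a Bellman backup over the (Nactions, Nhits) successors: one time step plus the probability-weighted value of the successors, for the action that minimizes this quantity. The sketch below implements that reading for a single state with plain lists. It is an assumption about the formula, not otto's actual code, and target_value and expected_time are hypothetical names. The optional action_values argument mimics choosing the action with a second model (modelaction) while evaluating it with the first (modelvalue).

```python
def expected_time(probs, values, action):
    """One step plus the probability-weighted value of the successors of `action`."""
    return 1.0 + sum(p * v for p, v in zip(probs[action], values[action]))


def target_value(probs, values, action_values=None):
    """Bellman-style target for one state.

    probs[a][h]:  transition probability to successor (a, h)
    values[a][h]: successor value from the model used for values
    action_values[a][h]: successor value from the model used to pick the
        action; defaults to `values` when None
    """
    if action_values is None:
        action_values = values
    n_actions = len(probs)
    best = min(range(n_actions),
               key=lambda a: expected_time(probs, action_values, a))
    return expected_time(probs, values, best)


probs = [[0.5, 0.5], [0.9, 0.1]]    # 2 actions, 2 hit values
values = [[4.0, 2.0], [3.0, 1.0]]   # values of the successor states
target = target_value(probs, values)
print(target)
```

Successors in which the source is found would carry a value of zero under this reading, since no time remains.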
- states2inputs(states, dims)
Convert states into inputs required to compute their value with the model.
- Parameters
states (State object or ndarray of State objects) –
if current states s: single State with no shape or array with shape (batch_size, )
if next states s’: array with shape (Nactions, Nhits) or shape (batch_size, Nactions, Nhits)
dims (0 or 2) – flag used to differentiate between current states and next states (this is because one current state yields (Nactions, Nhits) next states)
if current states s: 0
if next states s’: 2
- Returns
inputs (ndarray) – array of inputs for the model, with:
if current states s: shape (batch_size, input_shape)
if next states s’: shape (batch_size, Nactions, Nhits, input_shape)
prob (ndarray) –
if current states s: array of 1.0, with shape (batch_size, )
if next states s’: array of transition probabilities, with shape (batch_size, Nactions, Nhits)
- transitions()
Compute all possible successors s’ that can be reached from state s
- Returns
state (State) – current state
statep (ndarray of State) – array of all states reached for all possible actions and hit values
valuemodel
Provides the ValueModel class, for defining a neural network model of the value function.
- class otto.classes.valuemodel.ValueModel(*args, **kwargs)
Bases: Model
Neural network model used to predict the value of the belief state (i.e. the expected remaining time to find the source).
- Parameters
Ndim (int) – number of space dimensions (1D, 2D, …) for the search problem
FC_layers (int) – number of hidden layers
FC_units (int or tuple(int)) – units per layer
regularization_factor (float, optional) – factor for regularization losses (default=0.0)
loss_function (str, optional) – either ‘mean_absolute_error’, ‘mean_absolute_percentage_error’ or ‘mean_squared_error’ (default)
- config
saves the args in a dictionary, can be used to recreate the model
- Type
dict
- build_graph(input_shape_nobatch)
Builds the model. Use this function instead of model.build() so that a call to model.summary() gives shape information.
- Parameters
input_shape_nobatch (tuple(int)) – shape of the neural network input given by otto.classes.sourcetracking.SourceTracking.NN_input_shape
- call(x, training=False, sym_avg=False)
Call the value model
- Parameters
x (ndarray or tf.tensor with shape (batch_size, input_shape)) – array containing a batch of inputs
training (bool, optional) – whether this call is done during training (as opposed to evaluation) (default=False)
sym_avg (bool, optional) – whether to take the average value of symmetric duplicates (default=False)
- Returns
x (tf.tensor with shape (batch_size, 1)) – array containing a batch of values
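Averaging over symmetric duplicates, as sym_avg does, makes the predicted value identical for inputs that differ only by a symmetry. A minimal sketch of the idea on a 1D input, with a deliberately asymmetric toy function standing in for the network (all names here are hypothetical and none of this is otto's implementation):

```python
def symmetric_average(value_fn, x, transforms):
    """Average value_fn over every transformed copy of the input x."""
    return sum(value_fn(t(x)) for t in transforms) / len(transforms)


def identity(x):
    return list(x)


def mirror(x):
    return list(reversed(x))


def toy_value(x):
    """Deliberately asymmetric stand-in for a value model."""
    return sum(i * v for i, v in enumerate(x))


x = [0.7, 0.2, 0.1]
transforms = [identity, mirror]

raw = (toy_value(x), toy_value(mirror(x)))
averaged = (symmetric_average(toy_value, x, transforms),
            symmetric_average(toy_value, mirror(x), transforms))
print(raw, averaged)
```

The raw values of an input and its mirror image differ, while their symmetric averages coincide. The price is one model evaluation per symmetry.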
- save_model(model_dir)
Save the model to model_dir.
- test_step(x, y)
A test step.
- Parameters
x (tf.tensor with shape=(batch_size, input_shape)) – batch of inputs
y (tf.tensor with shape=(batch_size, 1)) – batch of target values
- Returns
loss (tf.tensor with shape=()) – total loss
- train_step(x, y, augment=False)
A training step.
- Parameters
x (tf.tensor with shape=(batch_size, input_shape)) – batch of inputs
y (tf.tensor with shape=(batch_size, 1)) – batch of target values
- Returns
loss (tf.tensor with shape=()) – total loss
- otto.classes.valuemodel.reload_model(model_dir, inputshape)
Load a model.
- Parameters
model_dir (str) – path to the model
inputshape (ndarray) – shape of the neural network input given by otto.classes.sourcetracking.SourceTracking.NN_input_shape
visualization
Provides the Visualization class, for rendering episodes.
- class otto.classes.visualization.Visualization(env, live=False, filename='test', log_prob=False, marginal_prob_3d=False)
A class for visualizing the search in 1D, 2D or 3D
- Parameters
env (SourceTracking) – an instance of the SourceTracking class
live (bool, optional) – whether to show live preview (faster if False) (default=False)
filename (str, optional) – file name for the video (default='test')
log_prob (bool, optional) – whether to show log(prob) instead of prob (default=False)
marginal_prob_3d (bool, optional) – in 3D, whether to show marginal pdfs on each plane, instead of the pdf in the planes that the agent crosses (default=False)
- make_video(frame_rate=5, keep_frames=False)
Make a video from recorded frames and clean up frames.
- Parameters
frame_rate (int) – number of frames per second (default=5)
keep_frames (bool) – whether to keep the frames as images (default=False)
- Returns
exit_code (int) – nonzero if something went wrong while making the video; in that case the frames are kept even if keep_frames = False
- record_snapshot(num, toptext='')
Create a frame from current state of the search, and save it.
- Parameters
num (int) – frame number (used to create filename)
toptext (str) – text that will appear in the top part of the frame (like a title)
gymwrapper
Provides an OpenAI Gym interface for the source-tracking POMDP.
- class otto.classes.gymwrapper.GymWrapper(sim, stop_t=100000)
Bases: Env
OpenAI Gym interface for the source-tracking POMDP.
- Parameters
sim (SourceTracking) – an instance of the SourceTracking class with draw_source = True
stop_t (int, optional) – maximum number of timesteps; it should be large enough to (almost) never be reached (default=100000)
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
- reset()
Reset the search.
- Returns
observation (ndarray) – p_source centered on the agent
- step(action)
Make a step in the environment.
- Parameters
action (int) – the action to execute
- Returns
observation (ndarray) – p_source centered on the agent
reward (float) – always a -1 time penalty
done (bool) – whether the search is over
info (dict) – contains the hits received, whether the maximum number of steps is reached, and whether the source was found
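The observation is described as p_source centered on the agent. One way to picture this in 1D is to embed the belief in a larger array so that the agent's cell always lands in the middle. The sketch below does this under stated assumptions: the padded length 2N-1 and the zero fill value are choices made for this illustration and are not taken from otto, and center_on_agent is a hypothetical helper.

```python
def center_on_agent(p_source, agent, fill=0.0):
    """Embed a 1D belief of length N in a list of length 2N-1.

    The agent's cell ends up at the middle index N-1, whatever its
    position in the original grid. Cells outside the domain get `fill`.
    """
    n = len(p_source)
    if not 0 <= agent < n:
        raise ValueError("agent must lie inside the grid")
    centered = [fill] * (2 * n - 1)
    offset = (n - 1) - agent
    for x, p in enumerate(p_source):
        centered[x + offset] = p
    return centered


p_source = [0.1, 0.2, 0.3, 0.4]
observation = center_on_agent(p_source, agent=1)
print(observation)
```

An agent-centered view lets a policy see the same pattern regardless of where in the domain the agent stands.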