pyrltr.agents package

pyrltr.agents.Agent module

class pyrltr.agents.Agent.Agent(world, learner, teacher, askLikelihood, index=0)[source]

Agent is the base class for all learning agents.

Attributes:

world – the world in which this agent is supposed to act

learner – the learning algorithm

teacher – the teacher

askLikelihood – the likelihood of asking the teacher to evaluate the last action

randomState – the RandomState for the stochastic influence
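As a rough illustration of how askLikelihood and the random state could interact, the sketch below consults the teacher only when a uniform draw falls below askLikelihood. The helper should_ask_teacher is hypothetical and not part of pyrltr, and the standard-library random.Random stands in for the agent's RandomState.

```python
import random


def should_ask_teacher(rng, ask_likelihood):
    """Hypothetical helper: True with probability ask_likelihood."""
    # rng.random() is uniform on [0, 1), so the comparison succeeds
    # with probability ask_likelihood.
    return rng.random() < ask_likelihood


# Assumption: a seeded generator stands in for the agent's randomState.
rng = random.Random(0)
asked = sum(should_ask_teacher(rng, 0.25) for _ in range(10000))
ask_rate = asked / 10000.0
print(ask_rate)
```

With a likelihood of 0.25 the teacher would be consulted after roughly a quarter of the actions.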

doFinalTestingEpoch()[source]

Does one final testing epoch.

doTestingEpisode(dataContainer)[source]

Runs one testing episode. The learner does the action selection without exploration and no updates are performed.

doTestingEpoch()[source]

Does one testing epoch.

doTrainingEpisode()[source]

Runs one training episode. The learner does the action selection and update calculation.

doTrainingEpoch()[source]

Does one training epoch.

finalize()[source]

Finalizes the complete run of this agent.

reset()[source]

Resets the world after an episode and adds the latest result to the collection.

pyrltr.agents.Cacla module

Implementation of the Continuous Actor-Critic Learning Automaton (CACLA). Uses neural networks as actor and critic.

class pyrltr.agents.Cacla.ADCacla(actorLayout, transferFunctions, replicationNumber, maxReward=1, alpha=0.2, beta=0.2, gamma=0.1, sigma=0.05, folder='Cacla')[source]

An action-dependent implementation of Cacla. The main difference is that the critic learns the Q-function instead of the state-value function.

initController()[source]

Initializes the controllers for the critic and actor.

updateReward(state, reward, nextState, action, episodeOver)[source]

Updates the reward according to the algorithm's definition.

Parameters:

state – the current state of the environment

reward – the just received reward

nextState – the state of the environment after performing the action

action – the just performed action

episodeOver – True if this was the last step in this episode
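To make the action-dependent variant concrete, here is a minimal sketch of a temporal-difference error computed on a Q-function rather than a state-value function. It is an assumption about the general technique, not pyrltr's code: q_td_error, toy_critic, and toy_actor are hypothetical names, and the default gamma only mirrors the constructor default shown above.

```python
def toy_critic(state, action):
    """Hypothetical Q-function: any callable Q(state, action) would do."""
    return 0.5 * state + action


def toy_actor(state):
    """Hypothetical deterministic actor."""
    return 0.1 * state


def q_td_error(q, actor, state, action, reward, next_state,
               episode_over, gamma=0.1):
    # On the last step of an episode there is no bootstrap term.
    target = reward
    if not episode_over:
        # Assumption: bootstrap with the actor's own action in the next state.
        target += gamma * q(next_state, actor(next_state))
    return target - q(state, action)


delta = q_td_error(toy_critic, toy_actor, 1.0, 0.2, 1.0, 2.0, False)
terminal_delta = q_td_error(toy_critic, toy_actor, 1.0, 0.2, 1.0, 2.0, True)
print(delta, terminal_delta)
```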

class pyrltr.agents.Cacla.Cacla(actorLayout, transferFunctions, replicationNumber, maxReward=1, alpha=0.2, beta=0.2, gamma=0.1, sigma=0.05, folder='Cacla')[source]

Implementation of Continuous Actor-Critic Learning Automata (CACLA).

finalize(folder, index)[source]

Finalizes this learner, e.g. saves the controllers after training.

getAction(state)[source]

Selects an action without exploration.

Parameters:
state – the state for which to select the action
returns – the result from the actor’s controller
getCriticOpinion(state)[source]

Returns the critic’s current estimate of the value function for the state.

Parameters:
state – the state the critic shall evaluate
returns – the current estimate of the value function for the state
getDataFolderName()[source]

Returns the name of the folder containing the stored data.

initController()[source]

Initializes the controllers for the critic and actor.

reset()[source]

Resets the learner.

scaleReward(reward)[source]

Scales the reward to be within [-1, 1].
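One plausible way to achieve such scaling is to divide by the largest expected reward magnitude and clip the result. The sketch below assumes exactly that; the page does not state how scaleReward uses maxReward, so scale_reward is a hypothetical stand-in.

```python
def scale_reward(reward, max_reward=1.0):
    """Hypothetical scaling: divide by max_reward, then clip to [-1, 1]."""
    scaled = reward / float(max_reward)
    # Clipping guards against rewards that exceed the assumed maximum.
    return max(-1.0, min(1.0, scaled))


print(scale_reward(5.0, max_reward=10.0))
print(scale_reward(-30.0, max_reward=10.0))
```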

selectAction(state)[source]

Selects an action with exploration.

Parameters:

state – the state for which to select the action

returns – the result from the actor’s controller plus the exploration
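Exploration of this kind is often realised as Gaussian noise added to the actor's output. The sketch below assumes that reading and that sigma is the noise standard deviation; select_action_with_noise is a hypothetical helper, not pyrltr's selectAction.

```python
import random


def select_action_with_noise(actor_output, sigma, rng):
    """Hypothetical helper: add zero-mean Gaussian noise to each action dimension."""
    return [a + rng.gauss(0.0, sigma) for a in actor_output]


rng = random.Random(42)
greedy = [0.3, -0.1]  # stands in for the actor controller's output
explored = select_action_with_noise(greedy, 0.05, rng)
print(explored)
```

With sigma set to 0 the helper returns the greedy action unchanged, which corresponds to selection without exploration.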
updateReward(state, reward, nextState, action, episodeOver)[source]

Updates the reward according to the algorithm's definition.

Parameters:

state – the current state of the environment

reward – the just received reward

nextState – the state of the environment after performing the action

action – the just performed action

episodeOver – True if this was the last step in this episode
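For orientation, a CACLA-style update can be sketched with scalar linear approximators in place of neural networks. This follows the general published algorithm, where the critic always learns from the TD error and the actor moves toward the performed action only when that error is positive. It is not pyrltr's implementation: cacla_step, critic_rate, and actor_rate are hypothetical names, and the page does not say how alpha and beta map onto the two learning rates.

```python
def cacla_step(v_weight, actor_weight, state, action, reward, next_state,
               episode_over, gamma=0.1, critic_rate=0.2, actor_rate=0.2):
    """One CACLA-style update with V(s) = v_weight * s and Ac(s) = actor_weight * s."""
    value = v_weight * state
    # No bootstrapping on the last step of an episode.
    next_value = 0.0 if episode_over else v_weight * next_state
    delta = reward + gamma * next_value - value

    # The critic is always updated with the TD error.
    new_v_weight = v_weight + critic_rate * delta * state

    # The actor is updated only if the action turned out better than expected.
    new_actor_weight = actor_weight
    if delta > 0:
        new_actor_weight += actor_rate * (action - actor_weight * state) * state
    return new_v_weight, new_actor_weight, delta


good = cacla_step(0.0, 0.0, 1.0, 0.5, 1.0, 1.0, False)
bad = cacla_step(0.0, 0.0, 1.0, 0.5, -1.0, 1.0, False)
print(good, bad)
```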

updateVar(delta_t)[source]

Updates the running variance based on the TD error.

Parameters:
delta_t – the current temporal difference error
returns – the new value for the running variance
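A running variance of the TD error is commonly kept as an exponential moving average of the squared error. The sketch below assumes that form; the function name update_var and the rate parameter are hypothetical, and the page does not state which constant pyrltr uses.

```python
def update_var(running_var, delta_t, rate=0.001):
    """Hypothetical running variance: moving average of the squared TD error."""
    return (1.0 - rate) * running_var + rate * delta_t ** 2


var = 1.0
for delta_t in (0.5, -2.0, 0.1):
    var = update_var(var, delta_t)
print(var)
```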

pyrltr.agents.NeuralNetworkAgent module

Created on Tue Jun 25 11:56:39 2013

@author: Chris Stahlhut

class pyrltr.agents.NeuralNetworkAgent.NeuralNetworkAgent[source]
getQValues(state)[source]
initController()[source]

Initializes the controller. In this case, it is a state-action table.

reset()[source]
scaleReward(reward)[source]
scaleState(state)[source]
selectAction(state)[source]

Performs epsilon-greedy action selection for the state.

updateReward(state, reward, nextState, action, nextAction, episodeOver)[source]

Updates the reward for the current action by considering the reward and next action.

pyrltr.agents.StateActionTableLearner module

class pyrltr.agents.StateActionTableLearner.StateActionTableLearner(alpha=0.3, gamma=0.1, epsilon=0.5, N=1)[source]
getDataFolderName()[source]
initController()[source]

Initializes the controller. In this case, it is a state-action table.

reset()[source]
selectAction(state)[source]

Performs epsilon-greedy action selection for the state.
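Epsilon-greedy selection in general picks a random action with probability epsilon and the highest-valued action otherwise. The sketch below shows that rule on a plain list of action values; epsilon_greedy is a hypothetical helper and does not reflect how pyrltr stores its table.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: any action index, chosen uniformly.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest action value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(7)
q_values = [0.1, 0.9, 0.4]
greedy_action = epsilon_greedy(q_values, 0.0, rng)
explored_action = epsilon_greedy(q_values, 1.0, rng)
print(greedy_action, explored_action)
```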

updateReward(state, reward, nextState, action, nextAction, episodeOver)[source]

Updates the reward for the current action by considering the reward and next action.
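Because the update takes both the reward and the next action, it resembles an on-policy SARSA step. The sketch below shows such a step on a dictionary-backed table as an assumption about the technique, not as pyrltr's code; sarsa_update is a hypothetical name and the default alpha and gamma only mirror the constructor defaults shown above.

```python
from collections import defaultdict


def sarsa_update(table, state, action, reward, next_state, next_action,
                 episode_over, alpha=0.3, gamma=0.1):
    """One SARSA-style update of table[(state, action)] in place."""
    target = reward
    if not episode_over:
        # Bootstrap from the value of the action actually chosen next.
        target += gamma * table[(next_state, next_action)]
    table[(state, action)] += alpha * (target - table[(state, action)])


table = defaultdict(float)  # unseen state-action pairs start at 0.0
table[("s1", "right")] = 1.0
sarsa_update(table, "s0", "right", 1.0, "s1", "right", False)
print(table[("s0", "right")])
```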

writeResults(writer)[source]

pyrltr.agents.Teacher module

class pyrltr.agents.Teacher.Teacher(world)[source]
isBetter(oldState, newState)[source]