pyrltr.worlds package

Submodules

pyrltr.worlds.ArmSimulation module

class pyrltr.worlds.ArmSimulation.SingleDoF[source]

Bases: pyrltr.worlds.BaseWorld.BaseWorld

calculateDistanceToTarget(position, goalPosition)[source]

Calculates the angular distance to the target.

getRandomPosition()[source]

Returns a random position in the interval [-pi, pi).

getSensors()[source]
getStateQuality(state)[source]
targetReached(distance)[source]
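
The angular distance used by calculateDistanceToTarget is not spelled out in the docstring; below is a minimal sketch of the usual formulation, wrapping the difference of two angles into [-pi, pi) so the result never exceeds pi. The helper name is hypothetical, not part of the module.

    import numpy as np

    def angular_distance(position, goalPosition):
        # Hypothetical sketch: shortest angular distance between two
        # angles given in radians.
        diff = (position - goalPosition + np.pi) % (2 * np.pi) - np.pi
        return np.abs(diff)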

pyrltr.worlds.BaseWorld module

class pyrltr.worlds.BaseWorld.BaseWorld(maxEpisodeLength)[source]

BaseWorld implements all basic parts of the reinforcement learning environment. It is not directly usable on its own.

Attributes:
randomState – needed for randomization, because np.random.*distribution* becomes deterministic after initialization
initialAgentTrainingPositions – initial positions to be used in training for the agent
initialGoalTrainingPositions – initial positions to be used in training for the goal
initialAgentTestingPositions – initial positions to be used in testing for the agent
initialGoalTestingPositions – initial positions to be used in testing for the goal
initialTrainingDistances – the initial distances in the training episodes
initialTestingDistances – the initial distances in the testing episodes
agentUndoStack – allows the agent to undo the last action
maxEpisodeLength – number of steps until an episode is over

createInitialPositions(n)[source]

Creates n initial positions for the agent and the goal. Makes sure that the target is not already reached in these configurations.

Parameters:
n – the number of initial positions to create
episodeOver()[source]
getDistanceInActionSpace()[source]

Returns the distance in action space.

getDistanceToTarget()[source]
getReward()[source]
getStateQuality(state)[source]

Returns inverse distance to be used as quality measurement.

performAction(action)[source]
reset()[source]

Resets the environment. Contains only the parts that are necessary after every episode, regardless of whether it was a training or testing episode.

resetFinalTestingEpoch()[source]

Resets the final testing positions to be used in the next epoch.

resetTestingEpoch()[source]

Resets the testing positions to be used in the next epoch.

resetTrainingEpoch()[source]

Resets the training positions to be used in the next epoch.

setupTestingEpisode()[source]

Prepares the environment for a testing episode.

setupTrainingEpisode()[source]

Prepares the environment for a training episode.

stepsExceeded()[source]
targetReached(distance)[source]
undo()[source]

Undoes the last action.

useFinalTestingPositions(agentPositions, goalPositions)[source]

Sets the initial positions for the final testing episodes, which are run after learning.

Parameters:
agentPositions – the initial positions for the agent
goalPositions – the initial positions for the goal
useInitialTestingPositions(agentPositions, goalPositions)[source]

Sets the initial positions for the testing episodes.

Parameters:
agentPositions – the initial positions for the agent
goalPositions – the initial positions for the goal
useInitialTrainingPositions(agentPositions, goalPositions)[source]

Sets the initial positions for the training episodes.

Parameters:
agentPositions – the initial positions for the agent
goalPositions – the initial positions for the goal
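
Taken together, these methods define the episode lifecycle. The following is a minimal sketch of how a training loop might drive a concrete BaseWorld subclass, using only the methods documented in this section; the world and chooseAction names are placeholders, and it is assumed that episodeOver combines targetReached and stepsExceeded.

    # Hypothetical driver; `world` is any concrete BaseWorld subclass and
    # `chooseAction` stands in for the learning agent's policy.
    def runTrainingEpisode(world, chooseAction):
        world.setupTrainingEpisode()      # prepare initial agent and goal positions
        reward = 0.0
        while not world.episodeOver():    # assumed: target reached or steps exceeded
            state = world.getSensors()
            action = chooseAction(state)
            world.performAction(action)
            reward = world.getReward()
        world.reset()                     # per-episode cleanup, training or testing alike
        return reward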

pyrltr.worlds.ForwardKinematics module

class pyrltr.worlds.ForwardKinematics.ForwardKinematics(joints, linkLengths)[source]

Implements the forward kinematics of a three-dimensional robot arm.

Attributes:

joints – the legal joints in the form (axis, ranges), where ranges = (lowerValues, higherValues)
linkLengths – the lengths of the links, i.e. the translations between the coordinate systems
getRandomPosition()[source]

Returns a random position in base (3d) and joint coordinates. The positions are drawn uniformly from the joint space.

returns – a randomly drawn position in the form (base coordinates, joint coordinates)
identity(sinTheta, cosTheta, transpose)[source]

No rotation at all.

inBaseCoordinateSystem(angles, point)[source]

Returns the point given in base coordinates using the given angles.

Parameters:
angles – the angles to use for the transformations; cannot be outside the specified ranges
point – the point to transform, given in world coordinates
returns – the point in base coordinates in 3d

inEffCoordinateSystem(angles, point)[source]

Returns the point in end-effector coordinates using the given angles.

Parameters:
angles – the angles to use for the transformations; cannot be outside the specified ranges
point – the point to transform, given in world coordinates
returns – the point in end-effector coordinates in 3d

rotationMatrices = {'y': <function yRotation>, 'x': <function xRotation>, 'z': <function zRotation>, 'i': <function identity>}
validateAngles(angles)[source]

Limits the angles to the valid ranges specified for the joints in the constructor.

xRotation(sinTheta, cosTheta, transpose)[source]

Rotation matrix around the x-axis.

Parameters:
sinTheta – sine of the rotation angle theta
cosTheta – cosine of the rotation angle theta
transpose – False if the rotation matrix shall not be transposed

yRotation(sinTheta, cosTheta, transpose)[source]

Rotation matrix around the y-axis.

Parameters:
sinTheta – sine of the rotation angle theta
cosTheta – cosine of the rotation angle theta
transpose – False if the rotation matrix shall not be transposed

zRotation(sinTheta, cosTheta, transpose)[source]

Rotation matrix around the z-axis.

Parameters:
sinTheta – sine of the rotation angle theta
cosTheta – cosine of the rotation angle theta
transpose – False if the rotation matrix shall not be transposed
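
The documentation does not reproduce the matrices themselves; the sketch below shows the conventional right-handed rotation about the z-axis that zRotation presumably builds. Letting transpose select the inverse rotation is an assumption, based on the fact that a rotation matrix's transpose is its inverse.

    import numpy as np

    def zRotation(sinTheta, cosTheta, transpose):
        # Sketch of the assumed matrix layout; not copied from the source.
        R = np.array([[cosTheta, -sinTheta, 0.0],
                      [sinTheta,  cosTheta, 0.0],
                      [0.0,       0.0,      1.0]])
        return R.T if transpose else R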

pyrltr.worlds.FunctionPlotter module

pyrltr.worlds.GridWorld module

class pyrltr.worlds.GridWorld.ContinuousGridWorld(size=array([5, 5]), agentPosition=array([ 0., 0.]), goalPosition=array([2, 2]))[source]

Bases: pyrltr.worlds.GridWorld.GridWorldNeuralNetwork

Implementation with continuous state and action space.

getRandomPosition()[source]

Returns a continuous random position within the world's limits.

getSensors()[source]
hitObstacle()[source]

Returns True if the agentPosition is within a distance of 1 of the obstacle.

reset()[source]
targetReached(distance)[source]

Returns True if the distance to the target is < 1.

class pyrltr.worlds.GridWorld.GridWorld(size=array([5, 5]), agentPosition=array([ 0., 0.]), goalPosition=array([2, 2]))[source]

Bases: pyrltr.worlds.BaseWorld.BaseWorld

Implementation of a GridWorld. This is an abstract class; please use a concrete implementation.

calculateDistanceToTarget(position, goalPosition)[source]
getRandomPosition()[source]

Returns a random position within the world.

isWithinWorld(point)[source]
class pyrltr.worlds.GridWorld.GridWorldNNMovingTarget(size=array([5, 5]), agentPosition=array([ 0., 0.]), goalPosition=array([2, 2]))[source]

Bases: pyrltr.worlds.GridWorld.GridWorldNeuralNetwork

Implementation where the target position moves between runs and is fed back as a sensor.

getSensors()[source]
reset()[source]
scale(position)[source]
class pyrltr.worlds.GridWorld.GridWorldNeuralNetwork(size=array([5, 5]), agentPosition=array([ 0., 0.]), goalPosition=array([2, 2]))[source]

Bases: pyrltr.worlds.GridWorld.GridWorld

Implementation to use with neural networks. Returns the position as an array. Basically uses a continuous state space.

getSensors()[source]
scale(position)[source]

Scales the given position down to the world size.

class pyrltr.worlds.GridWorld.GridWorldStateActionTable(size=array([5, 5]), agentPosition=array([ 0., 0.]), goalPosition=array([2, 2]))[source]

Bases: pyrltr.worlds.GridWorld.GridWorld

Implementation to use with a state action table. Returns the position as an index within a list.

getSensors()[source]
positionAddition = array([[ 1, 0], [-1, 0], [ 0, 1], [ 0, -1], [ 0, 0]])
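
positionAddition maps the five discrete actions to grid displacements (+x, -x, +y, -y, or no move). A sketch of how an action index from the state-action table might be applied to the agent's position; applyAction is a hypothetical helper, not part of the class.

    import numpy as np

    positionAddition = np.array([[1, 0], [-1, 0], [0, 1], [0, -1], [0, 0]])

    def applyAction(agentPosition, actionIndex):
        # Row `actionIndex` is the displacement for that discrete action.
        return agentPosition + positionAddition[actionIndex]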

pyrltr.worlds.InverseKinematics module

class pyrltr.worlds.InverseKinematics.TwoDoF[source]

Bases: pyrltr.worlds.BaseWorld.BaseWorld

A two-DoF arm whose inverse kinematics the learner is supposed to learn.

calculateDistanceToTarget(position, goalPosition)[source]

Calculates the distance to the target as Euclidean distance in joint space.

Parameters:
position – point A to calculate the distance
goalPosition – point B to calculate the distance
returns – the Euclidean distance between the two points
getDistanceInActionSpace()[source]

Returns the distance in action space between the arm's current position and the goal.

getKinematics()[source]

Returns the kinematic object for this environment.

getReward()[source]

Returns the reward as -distance.

getSensors()[source]

Returns the current position which is the agent’s position in joint space.

targetReached(distance)[source]
class pyrltr.worlds.InverseKinematics.TwoDoFReduntant[source]

Bases: pyrltr.worlds.InverseKinematics.TwoDoF

This class models a redundant two DoF robotic arm.

getKinematics()[source]

Returns the kinematic object for this environment.

pyrltr.worlds.NumericalSimulator module

Numerical simulation environments for single- and two-DoF robotic arms.

class pyrltr.worlds.NumericalSimulator.Nimbro[source]

Bases: pyrltr.worlds.NumericalSimulator.TwoDoF

Implementation with a standard Nimbro arm. Uses 3 DoF in 3 dimensions.

Attributes:
maxStepNumber: maximum number of steps to complete a circle
maxStepLength: angular step length
goalZoneRadius: the radius of the goal zone
agentPosition: the position of the agent in joint space
goalPosition: the position of the goal in the base coordinate system.
calculateDistanceToTarget(position, goalPosition)[source]

Calculates the distance to the target as Euclidean distance in joint space.

Parameters:
position – point A to calculate the distance
goalPosition – point B to calculate the distance
returns – the Euclidean distance between the two points
getDistanceInActionSpace()[source]

Returns the distance in action space between the arm's current position and the goal.

getSensors()[source]

Returns the current position in joint space.

getStateQuality(state)[source]
class pyrltr.worlds.NumericalSimulator.SingleDoF[source]

Bases: pyrltr.worlds.ArmSimulation.SingleDoF

Arm with 1 DoF.

Attributes:
maxStepNumber: maximum number of steps to complete one rotation
maxStepLength: the angular length of one step
agentPosition: the agent's position in joint space
goalPosition: the goal's position in joint space.
goalZoneRadius = 0.034641016151377546
maxStepLength = 0.7853981633974483
maxStepNumber = 8
targetReached(distance)[source]

Returns True if the distance is smaller than the goal zone’s radius.

class pyrltr.worlds.NumericalSimulator.ThreeDoF[source]

Bases: pyrltr.worlds.NumericalSimulator.TwoDoF

Arm with 3 DoF.

Attributes:

maxStepNumber: maximum number of steps to complete a circle
maxStepLength: angular step length
goalZoneRadius: the radius of the goal zone
agentPosition: the position of the agent in joint space
goalPosition: the position of the goal in the base coordinate system.
getStateQuality(state)[source]

Returns the quality of a state as the inverse of the distance. The closer the arm gets to its goal, the better the state.

Parameters:
state – the state to evaluate
returns – the quality of said state
goalZoneRadius = 0.12
maxStepLength = 0.7853981633974483
maxStepNumber = 8
class pyrltr.worlds.NumericalSimulator.TwoDoF[source]

Bases: pyrltr.worlds.BaseWorld.BaseWorld

Arm with 2 DoF.

Attributes:
maxStepNumber: maximum number of steps to complete a circle
maxStepLength: angular step length
goalZoneRadius: the radius of the goal zone
agentPosition: the position of the agent in joint space
goalPosition: the position of the goal in the base coordinate system.
calculateDistanceToTarget(position, goalPosition)[source]

Calculates the distance to the target as Euclidean distance in joint space.

Parameters:
position – point A to calculate the distance
goalPosition – point B to calculate the distance
returns – the Euclidean distance between the two points
getDistanceInActionSpace()[source]

Returns the distance in action space between the arm's current position and the goal.

getSensors()[source]

Returns the current position in joint space.

getStateQuality(state)[source]

Returns the quality of a state as the inverse of the distance. The closer the arm gets to its goal, the better the state.

Parameters:
state – the state to evaluate
returns – the quality of said state
goalZoneRadius = 0.12
maxStepLength = 0.7853981633974483
maxStepNumber = 8
targetReached(distance)[source]
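
Both the joint-space distance and the inverse-distance state quality documented above reduce to one-liners; below is a hedged sketch. The signatures are simplified for illustration, and the eps guard against division by zero is an assumption, not taken from the source.

    import numpy as np

    def calculateDistanceToTarget(position, goalPosition):
        # Euclidean distance in joint space, as documented above.
        return np.linalg.norm(np.asarray(position) - np.asarray(goalPosition))

    def getStateQuality(distance, eps=1e-8):
        # Inverse distance: the closer the arm is to its goal, the higher
        # the quality. The eps guard is an assumption.
        return 1.0 / (distance + eps)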

pyrltr.worlds.Vrep module

pyrltr.worlds.WorldFactory module

class pyrltr.worlds.WorldFactory.WorldFactory[source]

Creates, stores and reads world objects.

Attributes:

NUMERICAL_ONE_DOF – filename for the numerical one DoF world
NUMERICAL_TWO_DOF – filename for the numerical two DoF world
NUMERICAL_THREE_DOF – filename for the numerical three DoF world
NUMERICAL_NIMBRO – filename for the numerical nimbro world
VREP_ONE_DOF – filename for the v-rep one DoF world
VREP_TWO_DOF – filename for the v-rep two DoF world
AGENT_TRAINING_POSITIONS – training positions for the agent
GOAL_TRAINING_POSITIONS – training positions for the goal
AGENT_TESTING_POSITIONS – testing positions for the agent
GOAL_TESTING_POSITIONS – testing positions for the goal

AGENT_TESTING_POSITIONS = 'agentTestingPositions.dat.gz'
AGENT_TRAINING_POSITIONS = 'agentTrainingPositions.dat.gz'
FINAL_AGENT_TESTING_POSITIONS = 'final_agentTestingPositions.dat.gz'
FINAL_GOAL_TESTING_POSITIONS = 'final_goalTestingPositions.dat.gz'
GOAL_TESTING_POSITIONS = 'goalTestingPositions.dat.gz'
GOAL_TRAINING_POSITIONS = 'goalTrainingPositions.dat.gz'
INVERSE_TWO_DOF = 'inversekinematics.twodof'
INVERSE_TWO_DOF_REDUNDANT = 'inversekinematics.twodofredundant'
NUMERICAL_NIMBRO = 'numerical.nimbro'
NUMERICAL_ONE_DOF = 'numerical.onedof'
NUMERICAL_THREE_DOF = 'numerical.threedof'
NUMERICAL_TWO_DOF = 'numerical.twodof'
VREP_ONE_DOF = 'vrep.onedof'
VREP_TWO_DOF = 'vrep.twodof'
createWorld(name, path, numberOfTrainingPositions, numberOfTestingPositions, numberOfFinalTestingPositions)[source]

Creates the world and, if necessary, the initial positions as well.

Parameters:
name – the name of the world, e.g. numerical.onedof or numerical.twodof; see the class attributes for possible values
path – the path to look in for the files
numberOfTrainingPositions – the number of positions to use for training
numberOfTestingPositions – the number of positions to use for testing
numberOfFinalTestingPositions – the number of final positions to use after learning is completed
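
A usage sketch for createWorld; the path and position counts are illustrative values only.

    from pyrltr.worlds.WorldFactory import WorldFactory

    factory = WorldFactory()
    # Illustrative call; any of the world names listed in the class
    # attributes above can be passed as `name`.
    world = factory.createWorld(WorldFactory.NUMERICAL_TWO_DOF, "./data",
                                numberOfTrainingPositions=100,
                                numberOfTestingPositions=20,
                                numberOfFinalTestingPositions=50)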

Module contents