Learning:

At first, we limited our learning algorithm to Q reinforcement learning. Yingqiang Lin generously allowed the use and modification of his Maze program, which simulated Q-Lambda learning in a maze with obstacles on the UNIX platform.

His program was modified into maze.cpp. A test maze of 16 by 16 cells (maze9) was created and Lin's program was modified to run against this maze. The modified program directs the robot to find the goal ten consecutive times employing Q-Lambda learning and compiles data about each trip. Ten trips to the goal were chosen because it took about ten efforts for Lin's Q-Lambda learning algorithm to "find" the shortest route to the goal. The cumulative, minimum, and maximum of the total number of cells visited during each of the robot's efforts to reach the goal were recorded on each test of ten trips. The results from 100 of these tests (1000 robot trips to the goal) were compiled into the following graphs.

The first graph compares the robot's speed in inches per second to the number of cells the robot is capable of visiting per minute.

The second graph compares the system loop timing to the number of cells visited per minute.

The third graph compares the number of cells visited per test to the duration of the test in minutes.
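The per-test statistics described above (cumulative, minimum, and maximum cells visited over the trips in one test) could be compiled along the following lines. This is a minimal sketch, not the code in maze.cpp: the names `TestSummary`, `summarizeTest`, and `cellsPerTrip` are hypothetical, and it assumes the number of cells visited on each trip has already been counted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Summary of one test, i.e. one batch of consecutive trips to the goal.
// (Hypothetical type; not taken from maze.cpp.)
struct TestSummary {
    long cumulative;  // total cells visited across all trips in the test
    int minimum;      // fewest cells visited on any single trip
    int maximum;      // most cells visited on any single trip
};

// cellsPerTrip[i] is the number of cells visited on trip i (assumed input).
TestSummary summarizeTest(const std::vector<int>& cellsPerTrip) {
    assert(!cellsPerTrip.empty());  // a test must contain at least one trip
    TestSummary s{0, cellsPerTrip.front(), cellsPerTrip.front()};
    for (int cells : cellsPerTrip) {
        s.cumulative += cells;
        s.minimum = std::min(s.minimum, cells);
        s.maximum = std::max(s.maximum, cells);
    }
    return s;
}
```

Calling `summarizeTest` once per test of ten trips would yield one record per test, and 100 such records would correspond to the 1000 trips mentioned above.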

Thus, the Q-learning algorithm, which performs more slowly than the Q-Lambda algorithm, was deemed too slow for our purposes. We therefore decided to employ the Q-Lambda algorithm in our learning module, because it converges on the optimum route from the start to the goal more quickly than Q-learning does.

The Q-Lambda learning algorithm describes an *agent* (the robot) learning over time through interaction with an *environment* (the maze). With each *action* the agent takes, it finds itself in a new *state* with a set of available actions; these may be described as (*state, action*) pairs. Reinforcement is given when the agent reaches the goal state, at which time the agent updates its *policy* with regard to each (*state, action*) pair, *Q*(*state, action*). The agent's objective is to optimize its policy to maximize the total reinforcement received over time. Thus, the agent learns through experience.
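To make the (*state, action*) bookkeeping concrete, here is a minimal sketch of a tabular Q-Lambda update using eligibility traces. It is not Lin's implementation: it shows the simple ("naive") variant that never cuts traces, and the names `QLambdaTable`, `update`, and `kActions`, the four-action layout, and the parameters `alpha` (learning rate), `gamma` (discount), and `lambda` (trace decay) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

constexpr int kActions = 4;  // assumed action set: north, east, south, west

// Hypothetical tabular Q-Lambda learner (naive variant: traces are never cut).
struct QLambdaTable {
    double alpha, gamma, lambda;
    std::vector<std::array<double, kActions>> q;      // Q(state, action)
    std::vector<std::array<double, kActions>> trace;  // eligibility traces

    QLambdaTable(int states, double a, double g, double l)
        : alpha(a), gamma(g), lambda(l),
          q(states, std::array<double, kActions>{}),
          trace(states, std::array<double, kActions>{}) {}

    // Highest Q value available from the given state.
    double bestValue(int state) const {
        return *std::max_element(q[state].begin(), q[state].end());
    }

    // Apply one observed transition (state, action) -> nextState with reward.
    void update(int state, int action, double reward, int nextState, bool nextIsGoal) {
        assert(action >= 0 && action < kActions);
        // The goal is terminal, so nothing is bootstrapped from it.
        double target = reward + (nextIsGoal ? 0.0 : gamma * bestValue(nextState));
        double delta = target - q[state][action];
        trace[state][action] += 1.0;  // mark this pair as just visited
        for (std::size_t s = 0; s < q.size(); ++s) {
            for (int a = 0; a < kActions; ++a) {
                q[s][a] += alpha * delta * trace[s][a];  // credit recent pairs
                trace[s][a] *= gamma * lambda;           // decay eligibility
            }
        }
    }

    // Reset traces at the start of each new trip from the start cell.
    void clearTraces() {
        for (auto& row : trace) row.fill(0.0);
    }
};
```

The traces are what distinguish this from plain Q-learning: when the goal reward arrives, every recently visited (*state, action*) pair is updated in the same step rather than only the last one, which is one reason fewer trips may be needed before a short route emerges.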

Lin's program, incorporating the Q-Lambda learning algorithm, was then modified to produce a return value that determines the robot's next movement.
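One way such a return value might be produced is sketched below. The `Move` enum, the `chooseNextMove` function, and the `open` array marking directions not blocked by an obstacle are hypothetical names, not taken from the modified program. The sketch picks the open direction with the highest Q value and assumes purely greedy selection, since the source does not say whether any exploration was used.

```cpp
#include <array>
#include <cassert>

// Hypothetical encoding of the robot's possible movements.
enum class Move { North = 0, East, South, West };

// Return the open direction with the highest Q value for the current cell.
// qValues[a] is Q(current state, a); open[a] is true if direction a is not blocked.
Move chooseNextMove(const std::array<double, 4>& qValues,
                    const std::array<bool, 4>& open) {
    int best = -1;
    for (int a = 0; a < 4; ++a) {
        if (!open[a]) continue;  // skip directions blocked by obstacles or walls
        if (best < 0 || qValues[a] > qValues[best]) best = a;
    }
    assert(best >= 0);  // assumes at least one direction is open
    return static_cast<Move>(best);
}
```

Ties go to the first direction in enum order, a simplification that keeps the function deterministic.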

This final modification became the learning module and was then incorporated into the Interface. The learning module makes all decisions concerning the robot's movement during testing runs from the start to the goal.
