Learning movement sequences with a delayed reward signal in a hierarchical model of motor function.
Stringer SM., Rolls ET., Taylor P.
A key problem in reinforcement learning is how an animal can learn a sequence of movements when the reward signal occurs only at the end of the sequence. We describe how a hierarchical dynamical model of motor function solves this problem of delayed reward in learning movement sequences using associative (Hebbian) learning. At the lowest level, the motor system encodes simple movements, or primitives, while at higher levels the system encodes sequences of primitives. During training, the network learns a high-level motor program composed of a specific temporal sequence of motor primitives. It does so even though the reward signal, which indicates whether the desired motor program has been performed correctly, is received only at the end of each trial. A continuous attractor network within the architecture enables the model to generate the continuous motor outputs needed to execute the movement sequence.
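As a rough illustration of the delayed-reward idea only, the following toy sketch learns a sequence of primitives with a Hebbian update that is gated by a reward delivered at the end of each trial. It is not the model described here: it has no continuous attractor network and no neural dynamics, and the co-activity trace, the random exploration, and all names and parameters (`train`, `recall`, `lr`, `n_trials`) are assumptions introduced for the example.

```python
import random


def train(target, n_primitives, n_trials=2000, lr=0.1, seed=0):
    """Toy reward-gated Hebbian learning of a sequence of primitives.

    Assumption: one high-level cell per sequence step and one low-level cell
    per primitive. weights[t][p] is the synapse from step cell t to primitive
    cell p. This is an illustrative simplification, not the published model.
    """
    rng = random.Random(seed)
    n_steps = len(target)
    weights = [[0.0] * n_primitives for _ in range(n_steps)]

    for _ in range(n_trials):
        # Exploration: perform a random primitive at every step of the trial.
        performed = [rng.randrange(n_primitives) for _ in range(n_steps)]

        # Hebbian co-activity trace held until the trial ends:
        # 1 where step cell t and primitive cell p were active together.
        trace = [
            [1.0 if performed[t] == p else 0.0 for p in range(n_primitives)]
            for t in range(n_steps)
        ]

        # The reward arrives only after the whole sequence has been performed.
        reward = 1.0 if performed == list(target) else 0.0

        # Associative update, gated by the delayed reward.
        for t in range(n_steps):
            for p in range(n_primitives):
                weights[t][p] += lr * reward * trace[t][p]

    return weights


def recall(weights):
    """Replay the learned program: the strongest primitive at each step."""
    return [max(range(len(row)), key=row.__getitem__) for row in weights]


target = [2, 0, 1]  # hypothetical motor program: three primitives in order
weights = train(target, n_primitives=3)
learned = recall(weights)
```

Because only fully correct trials are rewarded, only the synapses linking each step to its correct primitive are ever strengthened, so `recall` replays the target sequence once at least one rewarded trial has occurred.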