Imagery and Mental Simulation

A neural network model for acquiring neonatal facial imitation

Body image is achieved by acquiring a self-body schema in the brain. The body schema of the face can be supposed to be privileged among all others: although newborns cannot visually observe their own faces, they are able to imitate adults’ facial movements. This interesting behavior of newborns leads us to consider that they possess a visual schema of the face. How do newborns acquire such a schema? The fact that neonates only nine minutes old preferentially look at faces suggests that they have already acquired the visual schema of the face. However, the mechanism by which the schema of faces is learned during the fetal period is still unknown. Based on the hypothesis that there is a connection between the tactile and visual areas in newborns, we proposed a neural network model that can acquire a visual schema of faces via “double-touch” between fingers and face, which is frequently observed in fetuses without visual input [Song2009].

This model consists of three major functions: (1) the transmission of topological information about which part of the face is touched to the visual area through the ventral intraparietal area (VIP) (location pathway), realized by a two-layered self-organizing map (SOM) with Gaussian-like local connections; (2) the coding of shape information with Gabor filters during “double-touch” in the primary somatosensory area (SI) (shape pathway); and (3) the integration of the two kinds of information (i.e., topological and shape) using a Hebbian connection between SI and the visual area. The sum of the Hebbian activation appears to be similar to the representation of faces in polar coordinates. Thus, the acquired facial image forms a neural representation similar to the visual representation of faces without any visual input.
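To make the Hebbian integration in function (3) concrete, the following minimal sketch pairs a tactile location bump in SI with the topological bump relayed to the visual area through the location pathway and accumulates their outer product. The map size, Gaussian width, learning rate, and the omission of the Gabor-based shape pathway are simplifying assumptions of this sketch, not details of the model in [Song2009].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical map sizes: a coarse 10 x 10 face map in SI and in the visual area.
side = 10
grid = np.stack(np.meshgrid(np.linspace(0, 1, side),
                            np.linspace(0, 1, side)), axis=-1).reshape(-1, 2)
n_units = side * side

# Gaussian-like local activation: touching face position p activates a bump of units.
def gaussian_bump(p, sigma=0.1):
    d2 = np.sum((grid - p) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Hebbian weights from SI to the visual area, learned during "double-touch".
W = np.zeros((n_units, n_units))
eta = 0.05

for _ in range(2000):
    p = rng.random(2)                 # a random double-touch location on the face
    s_si = gaussian_bump(p)           # SI activation (touch location)
    s_vis = gaussian_bump(p)          # topological input relayed to the visual area via VIP
    W += eta * np.outer(s_vis, s_si)  # co-active units strengthen their connection

# After learning, a purely tactile input evokes a face-map-like visual activation.
touch = gaussian_bump(np.array([0.3, 0.7]))
visual_image = (W @ touch).reshape(side, side)
print(np.unravel_index(visual_image.argmax(), visual_image.shape))
```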

A developmental model of the predictive mechanism of body movements, reaching movements, and pointing

The prediction mechanism in the brain, which is necessary for mental imagery and mental simulation, plays an important role in body movement control. How does the ability to control body movements using such a prediction mechanism develop? We proposed a neural network model that learns forward and inverse sensorimotor transformations and acquires reaching control using these transformations [Takemura2009a in Japanese]. The network coded sensory inputs and motor commands in population codes with neurons that have broad sensitivities; the transformations were learned through motor babbling. This control mechanism is a basic part of the predictive motor control model (see Figure 2). The model can learn a network that performs both forward and inverse transformations only by observing the randomly moving hand. Even though no targets were presented during learning, when a novel visual position (not a hand position of the self) is input into the input layer of the inverse transformation pathway, the learned network simulates reaching toward targets located within and outside of reaching distance, using the reciprocal connections for forward-inverse transformations [Takemura2009b in Japanese; Takemura2009c]. The ability of the proposed model to perform reaching toward unreachable targets might be a basis for pointing and reaching behaviors that contain communicative intentions [Takemura2010a; Takemura2010b in Japanese].
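The sketch below illustrates the general idea of learning forward and inverse transformations over population codes from motor babbling. The two-link arm, the Gaussian tuning curves, the number of units, and the regularized least-squares fit (used here as a stand-in for the paper's learning rule) are all assumptions of this sketch rather than details of [Takemura2009a].

```python
import numpy as np

rng = np.random.default_rng(1)

# A two-link arm stands in for the body (an assumption of this sketch).
L1, L2 = 0.3, 0.25
def forward_kinematics(theta):
    x = L1 * np.cos(theta[0]) + L2 * np.cos(theta[0] + theta[1])
    y = L1 * np.sin(theta[0]) + L2 * np.sin(theta[0] + theta[1])
    return np.array([x, y])

# Population coding: broadly tuned units for joint angles and for hand positions.
def population_code(value, centers, sigma):
    d2 = np.sum((centers - value) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

motor_centers = rng.uniform([0.0, 0.0], [np.pi, np.pi], size=(60, 2))
visual_centers = rng.uniform([-0.6, -0.6], [0.6, 0.6], size=(60, 2))

# Motor babbling: random joint angles and the hand positions they produce.
thetas = rng.uniform([0.0, 0.0], [np.pi, np.pi], size=(3000, 2))
hands = np.array([forward_kinematics(t) for t in thetas])
M = np.array([population_code(t, motor_centers, 0.5) for t in thetas])
V = np.array([population_code(h, visual_centers, 0.1) for h in hands])

# Forward (motor code -> visual code) and inverse (visual -> motor) transformations.
def fit(X, Y, lam=1e-3):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

W_fwd, W_inv = fit(M, V), fit(V, M)

# A novel visual target drives the inverse pathway; the forward pathway then
# simulates where that command would bring the hand, without executing it.
target = np.array([0.2, 0.35])
motor_code = population_code(target, visual_centers, 0.1) @ W_inv
theta_hat = motor_centers[np.argmax(motor_code)]   # crude winner-take-all decoding
simulated = visual_centers[np.argmax(population_code(theta_hat, motor_centers, 0.5) @ W_fwd)]
print("target:", target, "simulated hand position:", simulated.round(2))
```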

Many studies have shown that infants are sensitive to situations in which their behavior is followed in time by a stimulus event. Hand regard can be considered a kind of attentional process directed at stimuli with high contingency. However, at approximately three months of age, the preferential target setting of the contingency detection device is “switched” from seeking out perfect response-stimulus contingencies toward a bias for high, but imperfect, degrees of response-contingent stimulation. We discussed the relationships between this model and changes in the densities of neuromodulators. Sumioka et al. (2010) proposed an algorithm that learns repertoires of basic movements by learning movements with high contingency, based on the calculation of the degree of salience of contingency.
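As a rough illustration of contingency-based selection of movements, the following sketch scores candidate movements by a simple response-stimulus contingency index, P(stimulus | action) − P(stimulus | no action), and keeps the most contingent one. This generic index and the simulated feedback probabilities are assumptions of the sketch; they are not necessarily the salience measure used by Sumioka et al. (2010).

```python
import numpy as np

rng = np.random.default_rng(2)

def contingency_index(actions, stimuli):
    """Generic response-stimulus contingency: P(stim | action) - P(stim | no action)."""
    actions = np.asarray(actions, dtype=bool)
    stimuli = np.asarray(stimuli, dtype=bool)
    p_given_act = stimuli[actions].mean() if actions.any() else 0.0
    p_given_no = stimuli[~actions].mean() if (~actions).any() else 0.0
    return p_given_act - p_given_no

# Simulated infant: three candidate movements, each followed by a stimulus
# with a different (imperfect) probability; the most contingent one is kept.
T = 1000
movements = {name: rng.random(T) < 0.5 for name in ["hand", "leg", "head"]}
feedback = {
    "hand": movements["hand"] & (rng.random(T) < 0.9),  # high but imperfect contingency
    "leg":  movements["leg"] & (rng.random(T) < 0.3),   # low contingency
    "head": rng.random(T) < 0.5,                        # uncorrelated stimulation
}

scores = {k: contingency_index(movements[k], feedback[k]) for k in movements}
best = max(scores, key=scores.get)
print(scores, "-> add to repertoire:", best)
```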

Investigation of the brain mechanism involved in the predictive control and imagery of hands and objects

As we noted above, object permanence is related to the development of the functions of imagery and movement prediction. This prediction mechanism can be assumed to develop along with the acquisition of goal-directed movements (such as reach-to-grasp movements) during infancy.
Based on this background, [Ogawa2006 in Japanese] [Ogawa2007] used functional magnetic resonance imaging to investigate differences in brain activity for the internal monitoring of self-generated vs. externally generated movements during visual occlusion. Participants tracked a sinusoidally moving target with a mouse cursor. In some trials, the vision of either the target’s (externally generated) or the cursor’s (self-generated) movement was transiently occluded, during which time subjects continued tracking by estimating the current position of the invisible target or cursor on the screen. Analyses revealed that both occlusion conditions were associated with increased activity in the presupplementary motor area (pre-SMA) relative to a control condition with no occlusion. Moreover, the right and left posterior parietal cortices (PPC) showed greater activation during occlusion of the target’s and the cursor’s movements, respectively. This study suggests that the pre-SMA is involved in the internal imagery of visual movements regardless of whether the movements are internally or externally generated, and that the lateralization of the PPC is associated with the internal monitoring of internally vs. externally generated movements. These results are fully consistent with the findings of our previous imaging studies on the imitation of arm and finger movements and the imagery of tool use, as well as with the results of our behavioral and simulation experiments on reach-to-grasp movements (see Fig. 1).

Fig. 1 Activated regions observed in a direct comparison between target occlusion and cursor occlusion during the occlusion period of the tracking task [Ogawa2007].

A model of predictive control of body movements and mental simulation

When visual information is occluded during a reach-to-grasp movement, the grip aperture becomes larger. Although reach-to-grasp movements are controlled online, the output motor command that controls the grip aperture is transmitted to the end effector (the hand) with delay and noise, and this changes the grip aperture. At the same time, the efference copy of the motor command is transmitted to the state estimator without delay, and the body state (grip aperture) is estimated (or predicted) by the forward model. The estimated aperture is compared to the sensory feedback on the actual grip aperture derived from vision and proprioception; the estimation error is then used to correct the body state estimate. The next motor command is generated based on the estimated (predicted) body state and the variability (variance) of its estimation error. The target object’s size and its variability (variance), observed by vision, are also used to calculate the motor command.

We have proposed a computational model of reach-to-grasp movement in which the state of the hand is estimated and predicted by Kalman filters. A motor command is generated in a stochastic manner so as to establish a target grip aperture that is sufficiently large in relation to the target object’s size. Simulations of the model reproduced the effect of visual occlusion during grasping. Online control of the movement would therefore require the internal prediction of the future states of the body and their variability, and the generation of motor commands based on these predictions and task constraints. Our results suggest that the predictive control mechanism plays an important role in body movement control.
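The following minimal sketch shows how such a Kalman-filter-based scheme can reproduce the larger grip aperture under occlusion: the aperture is predicted from the efference copy, corrected by visual feedback when available, and the commanded target aperture includes a safety margin that grows with the estimation variance. The one-dimensional plant, the noise levels, the margin rule, and all numerical values are assumptions of this sketch, not parameters of our model.

```python
import numpy as np

rng = np.random.default_rng(3)

dt, steps = 0.01, 150
object_size = 0.06           # 6 cm target object (assumed)
Q, R = 1e-6, 1e-5            # process / visual-measurement noise variances (assumed)

def simulate(occluded):
    x_true = 0.0             # actual grip aperture
    x_hat, P = 0.0, 1e-4     # estimated aperture and its variance
    apertures = []
    for _ in range(steps):
        # Motor command: open the hand toward a target aperture that exceeds the
        # object size by a margin scaled by the current estimation uncertainty.
        margin = 0.02 + 3.0 * np.sqrt(P)
        u = 2.0 * (object_size + margin - x_hat) * dt

        # Plant: the command reaches the hand with noise.
        x_true += u + rng.normal(0, np.sqrt(Q))

        # Forward model (prediction step) driven by the efference copy.
        x_hat += u
        P += Q

        # Correction step with visual feedback, skipped under occlusion.
        if not occluded:
            z = x_true + rng.normal(0, np.sqrt(R))
            K = P / (P + R)
            x_hat += K * (z - x_hat)
            P *= (1 - K)
        apertures.append(x_true)
    return max(apertures)

print("peak aperture, vision available:", round(simulate(False), 3))
print("peak aperture, vision occluded :", round(simulate(True), 3))
```

Because the estimation variance accumulates when no visual correction is available, the safety margin and hence the peak grip aperture come out larger in the occluded run, qualitatively matching the behavioral effect described above.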

When subjects track a moving point and trace a curved line, a motor command is generated in the motor area and cerebellum by comparing the current state of the effector with the state of the target. The motor command is transmitted to the effector and generates the actual movement. At the same time, the efference copy of the motor command is transmitted to the parietal cortex, where the state of the effector is estimated using this motor command and the estimate of the effector’s current state as inputs. Ogawa and Inui’s (2007) findings suggest that the left inferior parietal lobule (IPL) is involved in this estimation of the current state of the effector, as well as in state estimation based on proprioceptive feedback from the somatosensory area. Our previous studies suggested that the right intraparietal sulcus (IPS) is involved in the evaluation of prediction errors between this state estimate and visual feedback, while the right temporo-parietal junction (TPJ) is involved in the integration of visual feedback errors and the state estimate. When externally generated movements are estimated, the right superior and inferior parietal lobules are related to the prediction and estimation of movements via past visuo-spatial information. The pre-SMA is related to the imagery of visual movements, regardless of the movements’ agents [Ogawa2007].

Fig. 2  Brain mechanism of state prediction for motor control.

Figure 2  shows the relationship between the predictive control mechanism, which estimates the current body state using a motor command, and other motor control mechanisms, along with the results of imaging studies using a tracking task.


Nabeshima and Kuniyoshi (2004) hypothesized that visuo-tactile integration is important for acquiring the body image, as suggested by the phenomenon known as the “rubber hand illusion.” Based on this hypothesis, they proposed a model of mutual association between the visual information available just before hand-object contact and the tactile information caused by the contact. Since tactile information is input only for a very short period, this model learns spatio-temporal associative memories separately, which permits the mutual association of visual and tactile information. Nabeshima et al. (2006) extended this model and proposed an algorithm that enables robots to use tools. This model consists of a kinematic controller that takes the visuo-tactile mutual associative memory, the efference copy, a target’s position, and joint angles as inputs and outputs motor commands. We supposed that the multi-sensory associative memory involved in tool-object contact exists in the supramarginal gyrus, located near the left IPS. It can also be suggested that the mutual associative memory pertaining to self-body and spatial information exists in the right TPJ, which is the basis of the whole-body schema.
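To illustrate the idea of a mutual (bidirectional) visuo-tactile associative memory, the sketch below stores pairs of pre-contact visual patterns and contact tactile patterns in a Hebbian hetero-associative matrix and recalls each modality from the other. The bipolar codes, pattern sizes, and outer-product learning rule are assumptions of this sketch, not the spatio-temporal memory architecture of Nabeshima and Kuniyoshi (2004).

```python
import numpy as np

rng = np.random.default_rng(4)

# Hetero-associative memory linking the visual pattern seen just before
# hand-object contact with the tactile pattern caused by the contact.
N_VIS, N_TAC, N_PAIRS = 64, 32, 5

visual = rng.choice([-1, 1], size=(N_PAIRS, N_VIS))   # pre-contact visual codes
tactile = rng.choice([-1, 1], size=(N_PAIRS, N_TAC))  # contact tactile codes

# Hebbian outer-product learning; W maps visual -> tactile, W.T maps tactile -> visual.
W = tactile.T @ visual / N_VIS

def recall_tactile(v):
    return np.sign(W @ v)

def recall_visual(t):
    return np.sign(W.T @ t)

# Cue with a degraded visual pattern (20% of bits flipped) and check mutual recall.
cue = visual[0].copy()
flip = rng.choice(N_VIS, size=N_VIS // 5, replace=False)
cue[flip] *= -1

t_hat = recall_tactile(cue)
print("tactile recall accuracy:", (t_hat == tactile[0]).mean())
print("visual recall accuracy :", (recall_visual(t_hat) == visual[0]).mean())
```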