3.4. Simulation vs. reasoning accounts of action understanding

The two main results of the present study (the interaction between prior expectations and perceptual information, and the modulation of this interaction by the type of intention) may help reconcile the two major accounts of action understanding developed over the last decade. On the simulation account, we understand our conspecifics’ intentions by literally simulating their actions through the activation of our own motor planning system; this process of internal replication results in the selection, within the observer’s own repertoire, of the intention that would have caused the very same action. This type of explanation mostly stresses the role of sensory information derived from the kinematics of the action (Rizzolatti et al., 2004). By contrast, the “theory theory” account postulates that action understanding relies on specialized inferential processes. It mostly emphasizes the contribution of context-related prior knowledge derived, on the one hand, from our intuitive theories of human behaviour and, on the other, from the subject’s past experiences and the rules she has drawn from them (Leslie, 1987; Gopnik & Wellman, 1992, 1994).

A wealth of empirical data and theoretical work now converges on the idea that these two major classes of mechanisms play complementary roles in intention inference (Mitchell, 2005; Keysers & Gazzola, 2007; Brass, Schmitt, Spengler, & Gergely, 2007; de Lange et al., 2008). The results of the present study support these observations. By suggesting that intentional judgments rely on a balance between bottom-up sensory information and top-down prior information, they argue in favor of a hybrid model of action understanding. In such a model, the observer would mobilize either low-level simulation or higher-order inferential mechanisms, depending on whether the current sensory evidence is reliable enough to elicit simulation from observation.
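The balance between bottom-up sensory and top-down prior information invoked above can be illustrated with the standard Bayesian rule for combining two Gaussian sources of information, in which each source is weighted by its precision (inverse variance). The sketch below is a minimal illustration under that assumption, not the analysis used in the present study; the one-dimensional latent variable, the `Gaussian` helper and the precision values are hypothetical. As sensory precision falls, the weight given to the prior rises.

```python
"""Minimal sketch of precision-weighted combination of a prior and sensory evidence.

Illustrative assumption only: both the prior expectation and the sensory
evidence are one-dimensional Gaussians over a hypothetical latent variable.
"""
from dataclasses import dataclass


@dataclass
class Gaussian:
    mean: float
    precision: float  # inverse variance; higher means more reliable


def combine(prior: Gaussian, sensory: Gaussian) -> tuple[Gaussian, float]:
    """Return the Gaussian posterior and the weight given to the prior."""
    total = prior.precision + sensory.precision
    w_prior = prior.precision / total
    mean = w_prior * prior.mean + (1.0 - w_prior) * sensory.mean
    # The posterior precision is the sum of the two precisions.
    return Gaussian(mean, total), w_prior


if __name__ == "__main__":
    prior = Gaussian(mean=0.0, precision=1.0)
    # Sensory evidence points to 1.0; its precision stands in (hypothetically)
    # for the amount of visual information available from the action scene.
    for sensory_precision in (0.1, 1.0, 10.0):
        post, w = combine(prior, Gaussian(mean=1.0, precision=sensory_precision))
        print(f"sensory precision={sensory_precision:>5}: "
              f"prior weight={w:.2f}, posterior mean={post.mean:.2f}")
```

With a prior precision of 1, the prior weight is 0.5 when sensory precision is also 1 and approaches 1 as sensory precision drops toward 0, so the estimate collapses onto the prior when the evidence is unreliable.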

Recently, Kilner and colleagues proposed a theoretical framework that attempts to specify how these two classes of mechanisms may interact to enable our understanding of other people’s intentions. This framework relies on a hierarchical architecture of action representations ranging from the intention level to the kinematics level (see also Grafton & Hamilton, 2007). In this architecture, the selection of a given action representation would result from solving the inverse problem at each level of the hierarchy. Each level uses a model to generate a prediction of the representation at the level below. This prediction is then compared with the representation at the subordinate level, and the prediction errors arising from that comparison are returned to the higher level to adjust its representation. The same process operates at every level of the hierarchy (intention, motor command and kinematics). The most likely cause of the observed action is then inferred by minimising the prediction error at all levels of the hierarchy (Kilner et al., 2007a, 2007b). Given the observed kinematics, goal expectations are first generated; from these goal representations, motor commands are then predicted; and from these motor commands, kinematics are in turn predicted. In this framework, top-down influences are therefore generated dynamically, since the estimates produced at higher levels become prior expectations for the lower levels.
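The inference scheme summarised above can be sketched as gradient descent on precision-weighted prediction errors, in the spirit of predictive coding. The toy model below is an illustrative assumption rather than the implementation of Kilner et al.: each level holds a single scalar estimate, each level predicts the one below through a hypothetical linear gain (`g1`, `g2`), and the precisions (`pi_y`, `pi_1`, `pi_2`) are fixed, hypothetical parameters. Lowering the precision of the kinematic evidence leaves the intention estimate close to its prior, which mirrors the reliance on priors reported in the present study when visual information is scarce.

```python
"""Toy sketch of hierarchical prediction-error minimisation (predictive coding).

Illustrative assumptions, not the model of Kilner et al.: scalar estimates,
fixed linear gains between levels, Gaussian noise with fixed precisions.
Levels: intention (x2) -> motor command (x1) -> observed kinematics (y).
"""


def infer(y, prior_intention=0.0, g1=1.0, g2=1.0,
          pi_y=1.0, pi_1=1.0, pi_2=1.0, lr=0.1, steps=2000):
    """Return (motor_command, intention) estimates minimising the
    precision-weighted sum of squared prediction errors at all levels."""
    x1, x2 = 0.0, prior_intention  # start the intention estimate at the prior
    for _ in range(steps):
        e_y = y - g1 * x1            # kinematics error: observed vs predicted from x1
        e_1 = x1 - g2 * x2           # motor error: x1 vs prediction from x2
        e_2 = x2 - prior_intention   # intention error: x2 vs prior expectation
        # Each estimate is pushed by the error from the level below (bottom-up)
        # and pulled toward the prediction from the level above (top-down).
        x1 += lr * (pi_y * g1 * e_y - pi_1 * e_1)
        x2 += lr * (pi_1 * g2 * e_1 - pi_2 * e_2)
    return x1, x2


if __name__ == "__main__":
    # Reliable kinematics: the intention estimate moves away from the prior (0).
    print(infer(y=1.0, pi_y=1.0))
    # Unreliable kinematics (low sensory precision): it stays near the prior.
    print(infer(y=1.0, pi_y=0.01))
```

With unit gains and precisions, a prior of 0 and an observation `y = 1.0`, the estimates settle at a motor command of 2/3 and an intention of 1/3; with `pi_y = 0.01`, the intention estimate stays below 0.01.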

Our results can be consistently interpreted in light of Kilner’s hierarchical model. A motor intention can be directly predicted from the observation of the current motor act, provided the available visual information is sufficient to allow comparison with the kinematics expected at higher levels. In this case, participants’ performance depends strongly on minimising the prediction error that arises from this comparison. However, this comparison also depends closely on the reliability of the current movement kinematics: when the amount of visual information is too low, the comparison cannot be made and, as a result, the representations at subordinate levels cannot be reconciled with the estimates at higher levels of the hierarchy. We observed that, when this comparison could not be carried out, participants consistently fell back on their prior knowledge. In a hierarchical model of action representations, such an over-reliance on priors could be mediated by short-circuiting recursive loops between subordinate and higher levels of the cortical hierarchy. These recursive loops would be mobilized when data are sparse, bypassing the automatic comparison between observed and expected movement kinematics. Importantly, the engagement of this mechanism depended on the amount of visual information available from the action scene but not on the scope and target of the intention, since it operated at the lowest levels of visual information in each of the four experimental conditions.

Notably, the engagement of these recursive loops is also sensitive to variations in the relationship between the observed action and its goal. In the superordinate conditions, participants relied more heavily on their prior expectations, even when the amount of visual information increased significantly, to a moderate level (non-social) or even a high level (social). This greater dependence on prior expectations can be explained by the fact that, in superordinate conditions, many competing intentions are congruent with the visual information conveyed by the current motor act. Thus, whereas minimising the prediction error between expected and current kinematics may be sufficient to predict the agent’s single act (e.g. to rotate), it may not suffice to infer unambiguously which of the multiple superordinate intentions (e.g. final shapes) this act contributes to. As a consequence, we found participants’ decisions to be mostly driven by their prior expectations, suggesting, in this situation of increased perceptual uncertainty, an early shortcut of the comparison process between levels of the action representation hierarchy. Crucially, this shortcut was independent of the amount of visual information, since it occurred even when the visual information was high enough for a participant to normally be confident about what she was seeing. This observation suggests that recursive loops of this kind may be recruited mostly in contexts where relying on one’s prior expectations better guarantees accurate inference, even if such expectations can occasionally go against perceptual evidence.
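The argument that a motor act compatible with many superordinate intentions leaves the decision to prior expectations can be made explicit with Bayes’ rule over a discrete set of candidate intentions. In the sketch below, the intention labels, prior probabilities and likelihoods are hypothetical values chosen for illustration, not data from the present study. When the observed act is equally likely under every candidate, the likelihood term cancels in the normalisation and the posterior reproduces the prior, however clearly the act itself is perceived.

```python
"""Sketch: why priors dominate when many intentions fit the same motor act.

Hypothetical example: intention labels, priors and likelihoods are
illustrative assumptions, not data from the present study.
"""


def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' rule over a discrete set of candidate intentions."""
    unnorm = {k: prior[k] * likelihood[k] for k in prior}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}


# Hypothetical prior expectations over three final shapes.
prior = {"shape_A": 0.6, "shape_B": 0.3, "shape_C": 0.1}

# Superordinate case: the observed act (e.g. a rotation) is equally compatible
# with every final shape, so the flat likelihood cannot shift the prior.
flat = {"shape_A": 0.8, "shape_B": 0.8, "shape_C": 0.8}

# Contrast: evidence that discriminates between candidates overrides the prior.
diagnostic = {"shape_A": 0.05, "shape_B": 0.1, "shape_C": 0.9}

if __name__ == "__main__":
    print(posterior(prior, flat))        # equal to the prior, up to rounding
    print(posterior(prior, diagnostic))  # probability mass moves toward shape_C
```

With the discriminating likelihoods, the posterior becomes 0.2, 0.2 and 0.6, so evidence that does distinguish the candidates can override even a strong prior; a flat likelihood, by contrast, leaves the decision entirely to the prior.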