Simulation of cognitive expectations with a connectionist model of tonal knowledge activation

To simulate a representation of listeners’ tonal knowledge and its role in automatic, cognitive expectancy formation in tonal music, Bharucha proposed a connectionist model (MUSACT) whose structure can be learned by exposure to musical material and can account for a range of empirical findings in music perception (Bharucha, 1987; Tillmann et al., 2000). The MUSACT model is a network of three interconnected layers corresponding respectively to tones, chords, and keys. Tone units are connected to the chords to which they belong, and chord units are connected to the keys to which they belong. The knowledge of Western tonal music regularities is not stored explicitly, but emerges from spreading activation and reverberation in the network. The tones of a musical piece activate the corresponding tone units, and the activation spreads to related chord and key units. Activation then reverberates from key units to chords and tones, and this process continues until equilibrium is reached. MUSACT incorporates a temporal memory decay (to account for short-term memory) so that an event’s activation can depend on an entire sequence of events, weighted according to recency. The total activation, a_{i,e}, of a unit i (a tone, a chord, or a key) is the sum of three activation components: (1) the bottom-up activation caused directly by the event e, (2) the indirect activation received from other units in response to event e (i.e., the phasic activation spreading in the system), and (3) the decayed activation caused by previous events (see Bharucha, 1987, and Tillmann et al., 2000, for more details).
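The spreading, reverberation, and decay described above can be illustrated with a minimal sketch. It is not the published MUSACT implementation: it assumes only major chords and major keys, a key's chords limited to I, IV, and V, a single uniform link weight `W`, and a simple update order. The function names `reverberate` and `present_sequence` are hypothetical. The decay parameters `d` and `t` follow the notation used in this section, but how they enter the sum is a simplification.

```python
N = 12  # pitch classes; also the number of major chords and major keys here

# Membership tables (major chords and major keys only: a simplifying assumption).
CHORD_TONES = [[(root + i) % N for i in (0, 4, 7)] for root in range(N)]
KEY_CHORDS = [[(key + i) % N for i in (0, 5, 7)] for key in range(N)]  # I, IV, V

W = 0.1  # assumed uniform link weight, NOT the published MUSACT weights


def reverberate(tone_input, max_cycles=200, tol=1e-9):
    """Spread activation tones -> chords -> keys and back until it settles."""
    tones = list(tone_input)
    chords = [0.0] * N
    keys = [0.0] * N
    for _ in range(max_cycles):
        new_chords = []
        for c in range(N):
            bottom_up = sum(tones[p] for p in CHORD_TONES[c])
            top_down = sum(keys[k] for k in range(N) if c in KEY_CHORDS[k])
            new_chords.append(W * (bottom_up + top_down))
        new_keys = [W * sum(new_chords[c] for c in KEY_CHORDS[k]) for k in range(N)]
        new_tones = [
            tone_input[p]
            + W * sum(new_chords[c] for c in range(N) if p in CHORD_TONES[c])
            for p in range(N)
        ]
        change = max(
            abs(new - old)
            for new, old in zip(new_tones + new_chords + new_keys,
                                tones + chords + keys)
        )
        tones, chords, keys = new_tones, new_chords, new_keys
        if change < tol:  # equilibrium reached
            break
    return tones, chords, keys


def present_sequence(events, d=0.2, t=1.0):
    """Sum each event's settled activation with the decayed trace of earlier events."""
    total = ([0.0] * N, [0.0] * N, [0.0] * N)
    for tone_input in events:
        current = reverberate(tone_input)
        total = tuple(
            [now + before * (1 - d) ** t for now, before in zip(layer, trace)]
            for layer, trace in zip(current, total)
        )
    return total


# Usage: present the tones C, E, G (pitch classes 0, 4, 7) as one event.
triad = [1.0 if p in (0, 4, 7) else 0.0 for p in range(N)]
tones, chords, keys = present_sequence([triad])
print(max(range(N), key=lambda c: chords[c]))  # index of the most active chord unit
```

With these assumed weights the iteration converges, because each reverberation cycle scales the remaining change by a factor well below 1. Different weights would need their own convergence check.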

This connectionist model has simulated harmonic priming data (e.g., Bharucha, 1987; Bigand et al., 1999; Tillmann et al., 2000). The prime context is presented to the model and the activation levels of the chord units representing the targets are read out: a higher activation level is interpreted as a stronger expectation for this chord to occur next and predicts faster response times. The model succeeded in simulating response time patterns for tonic versus out-of-key as well as for tonic versus subdominant target chords (e.g., Bharucha, 1987; Bigand et al., 1999; Tillmann & Bigand, 2001; Bigand et al., 2003). The differences in activation patterns, which reflect differences in tonal stabilities and relations between the chords, arise from activation spreading in the network, taking advantage of top-down activation from the key units.

For the present study, we ran simulations for all melodic sequences with a MATLAB implementation of Bharucha’s model (see Bigand et al., 1999). Parameters for the memory decay were d = .2 and t = 1. To account for differences in the tones’ durations, the input vector of each tone was multiplied by a constant reflecting its duration as a multiple of the shortest duration (i.e., a sixteenth note). For each melody, the prime context (i.e., the melody without its target) was presented to the model and the relative activation of the tone unit representing the target tone was read out. Activation values of the target tone units were compared between the related and the less-related conditions of the 12 melodic pairs with a two-tailed paired t-test. Target tone units were significantly more activated in the related than in the less-related conditions (t(11) = 2.55; p < .05).
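The duration weighting and the paired comparison can be sketched as follows. This is an illustration under stated assumptions, not the MATLAB code used for the simulations: the helper names `tone_input` and `paired_t` are hypothetical, durations are assumed to be given in sixteenth notes, and the activation values in the usage lines are made-up placeholders, not simulation results.

```python
import math
import statistics


def tone_input(pitch_class, duration_in_sixteenths):
    """Input vector for one tone, scaled by its duration in sixteenth notes."""
    vector = [0.0] * 12
    vector[pitch_class % 12] = float(duration_in_sixteenths)
    return vector


def paired_t(related, less_related):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [r - l for r, l in zip(related, less_related)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# A quarter-note G (pitch class 7) counts four times as much as a sixteenth note.
quarter_note_g = tone_input(7, 4)

# Placeholder activations for three hypothetical melody pairs (not real data).
t_value, df = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t_value, 3), df)
```

For 12 pairs the test has 11 degrees of freedom, and the two-tailed critical value at the .05 level is 2.201, so the reported t(11) = 2.55 exceeds it.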

This difference in activations of target tone units shows that our priming data can be accounted for by a purely cognitive model, which represents activation of tonal knowledge and does not take acoustical information into account. Because activation reverberates from key to chord units, and from chord to tone units, the network can account for top-down influences of the key context on perception and expectation for individual tones. It is important to note that the model codes pitch classes only and thus does not include pitch height or contour information. In addition, pitch classes are coded by a sparse input coding: a tone unit is activated if the tone to which it is tuned occurs in the stimulus, and is set to 0 otherwise. Consequently, in contrast to Leman’s model, the spectral richness of the sounds is not taken into consideration. In a proposed extension of MUSACT (Tillmann et al., 2000), a model based on a richer input coding, which includes harmonic and subharmonic virtual pitches (based on the psychoacoustic model of Parncutt, 1988), resulted in activation patterns that correlated highly with those of the sparse input coding model. After reverberation, the model’s activation pattern does not reflect the difference in the sensory richness of the input: the top-down processes driven by the more abstract learned knowledge of tonal relations impose a pattern of activation that is similar regardless of the richness of the input stimulation. The simulations with this connectionist model thus mimic listeners’ behavior in our priming experiments: the difference between tonal and less-related conditions emerges independently of the chosen sound complexity. The MUSACT model thus appears to have a larger scope in accounting for tonal expectations than Leman’s auditory model (see Note 10).
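A small sketch can make the sparse pitch-class coding concrete. It assumes tones are given as MIDI note numbers, which is a convenience for illustration and not something specified for the simulations. The function name `sparse_code` is hypothetical.

```python
def sparse_code(midi_notes):
    """Sparse pitch-class coding: 1 if the pitch class occurs, 0 otherwise."""
    vector = [0.0] * 12
    for note in midi_notes:
        vector[note % 12] = 1.0  # octave (pitch height) is discarded here
    return vector


# C4-E4-G4 ascending and G5-E3-C3 descending yield the same input vector,
# because octave and order of the tones within the event are not represented.
print(sparse_code([60, 64, 67]) == sparse_code([79, 52, 48]))
```

Timbre never enters this coding either, since only the presence of a pitch class is recorded.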

Notes
10. Following the criteria of Cutting, Bruno, Brady, and Moore (1992) to judge the quality of a cognitive model, “scope” refers to the degree to which a theory accounts for a broad range of experimental data elicited in a variety of contexts (see also Pearce & Wiggins, 2006).